Re: more NET_LOCK in pppx
ok. > On 28 Mar 2018, at 10:07, Jonathan Matthewwrote: > > Reading through if_pppx.c I spotted a couple of places we need NET_LOCK(). > > ok? > > Index: if_pppx.c > === > RCS file: /cvs/src/sys/net/if_pppx.c,v > retrieving revision 1.63 > diff -u -p -u -p -r1.63 if_pppx.c > --- if_pppx.c 12 Aug 2017 20:27:28 - 1.63 > +++ if_pppx.c 27 Mar 2018 23:56:56 - > @@ -392,6 +392,8 @@ pppxwrite(dev_t dev, struct uio *uio, in > proto = ntohl(*(uint32_t *)(th + 1)); > m_adj(top, sizeof(uint32_t)); > > + NET_LOCK(); > + > switch (proto) { > case AF_INET: > ipv4_input(>pxi_if, top); > @@ -403,9 +405,12 @@ pppxwrite(dev_t dev, struct uio *uio, in > #endif > default: > m_freem(top); > - return (EAFNOSUPPORT); > + error = EAFNOSUPPORT; > + break; > } > > + NET_UNLOCK(); > + > return (error); > } > > @@ -583,8 +588,10 @@ pppxclose(dev_t dev, int flags, int mode > pxd = pppx_dev_lookup(dev); > > /* XXX */ > + NET_LOCK(); > while ((pxi = LIST_FIRST(>pxd_pxis))) > pppx_if_destroy(pxd, pxi); > + NET_UNLOCK(); > > LIST_REMOVE(pxd, pxd_entry); > >
Re: interface queue transmit mitigation (again)
On Tue, Mar 27, 2018 at 9:30 PM David Gwynnewrote: > On Thu, Mar 15, 2018 at 03:25:46PM +0100, Martin Pieuchot wrote: > > On 14/03/18(Wed) 13:00, David Gwynne wrote: > > > this adds transmit mitigation back to the tree. > > > > > > it is basically the same diff as last time. the big difference this > > > time is that all the tunnel drivers all defer ip_output calls, which > > > avoids having to play games with NET_LOCK in the ifq transmit paths. > > > > Comments inline. > > > > > + if (ifq_len(ifq) >= min(4, ifq->ifq_maxlen)) { > > > > Why 4? DragonFly recently bumped `ifsq_stage_cntmax' to 16. Did you > > try other values? They also have an XXX comment that this value should > > be per-interface. Why? > > their default was 4, and they'd done some research on it. if they > moved to 16 there would be a reason for it. Would it be this commit? https://marc.info/?l=dragonfly-commits=151401707632544=2 Comments include test data. Michael
Re: interface queue transmit mitigation (again)
On Thu, Mar 15, 2018 at 03:25:46PM +0100, Martin Pieuchot wrote: > On 14/03/18(Wed) 13:00, David Gwynne wrote: > > this adds transmit mitigation back to the tree. > > > > it is basically the same diff as last time. the big difference this > > time is that all the tunnel drivers all defer ip_output calls, which > > avoids having to play games with NET_LOCK in the ifq transmit paths. > > Comments inline. > > > + if (ifq_len(ifq) >= min(4, ifq->ifq_maxlen)) { > > Why 4? DragonFly recently bumped `ifsq_stage_cntmax' to 16. Did you > try other values? They also have an XXX comment that this value should > be per-interface. Why? their default was 4, and they'd done some research on it. if they moved to 16 there would be a reason for it. > In any case I'd be happier with a define. i've taken your advice on board and made it per interface, defaulted to 16 with a macro. after this i can add ioctls to report it (under hwfeatures maybe) and change it with ifconfig. > > ifq_barrier(struct ifqueue *ifq) > > { > > struct cond c = COND_INITIALIZER(); > > struct task t = TASK_INITIALIZER(ifq_barrier_task, ); > > > > + NET_ASSERT_UNLOCKED(); > > Since you assert here... > > > + > > + if (!task_del(ifq->ifq_softnet, >ifq_bundle)) { > > + int netlocked = (rw_status() == RW_WRITE); > ^ > You can remove this code. i can't get rid of the assert :( ifq_barrier is called from lots of places, eg, ifq_destroy and ix_down. the former does not hold the net lock, but the latter does. putting manual NET_UNLOCK calls around places like the latter is too much work imo. Index: if.c === RCS file: /cvs/src/sys/net/if.c,v retrieving revision 1.548 diff -u -p -r1.548 if.c --- if.c2 Mar 2018 15:52:11 - 1.548 +++ if.c28 Mar 2018 01:28:03 - @@ -607,6 +607,7 @@ if_attach_common(struct ifnet *ifp) ifp->if_snd.ifq_ifqs[0] = >if_snd; ifp->if_ifqs = ifp->if_snd.ifq_ifqs; ifp->if_nifqs = 1; + ifp->if_sndmit = IF_SNDMIT_DEFAULT; ifiq_init(>if_rcv, ifp, 0); Index: if_var.h === RCS file: /cvs/src/sys/net/if_var.h,v retrieving revision 1.89 diff -u -p -r1.89 if_var.h --- if_var.h10 Jan 2018 23:50:39 - 1.89 +++ if_var.h28 Mar 2018 01:28:03 - @@ -170,6 +170,7 @@ struct ifnet { /* and the entries */ struct ifqueue **if_ifqs; /* [I] pointer to an array of sndqs */ void(*if_qstart)(struct ifqueue *); unsigned int if_nifqs; /* [I] number of output queues */ + unsigned int if_sndmit; /* [c] tx mitigation amount */ struct ifiqueue if_rcv;/* rx/input queue */ struct ifiqueue **if_iqs; /* [I] pointer to the array of iqs */ @@ -279,6 +280,9 @@ do { \ #defineIFQ_LEN(ifq)ifq_len(ifq) #defineIFQ_IS_EMPTY(ifq) ifq_empty(ifq) #defineIFQ_SET_MAXLEN(ifq, len)ifq_set_maxlen(ifq, len) + +#define IF_SNDMIT_MIN 1 +#define IF_SNDMIT_DEFAULT 16 /* default interface priorities */ #define IF_WIRED_DEFAULT_PRIORITY 0 Index: ifq.c === RCS file: /cvs/src/sys/net/ifq.c,v retrieving revision 1.22 diff -u -p -r1.22 ifq.c --- ifq.c 25 Jan 2018 14:04:36 - 1.22 +++ ifq.c 28 Mar 2018 01:28:03 - @@ -70,9 +70,16 @@ struct priq { void ifq_start_task(void *); void ifq_restart_task(void *); void ifq_barrier_task(void *); +void ifq_bundle_task(void *); #define TASK_ONQUEUE 0x1 +static inline void +ifq_run_start(struct ifqueue *ifq) +{ + ifq_serialize(ifq, >ifq_start); +} + void ifq_serialize(struct ifqueue *ifq, struct task *t) { @@ -114,6 +121,16 @@ ifq_is_serialized(struct ifqueue *ifq) } void +ifq_start(struct ifqueue *ifq) +{ + if (ifq_len(ifq) >= min(ifq->ifq_if->if_sndmit, ifq->ifq_maxlen)) { + task_del(ifq->ifq_softnet, >ifq_bundle); + ifq_run_start(ifq); + } else + task_add(ifq->ifq_softnet, >ifq_bundle); +} + +void ifq_start_task(void *p) { struct ifqueue *ifq = p; @@ -137,11 +154,31 @@ ifq_restart_task(void *p) } void +ifq_bundle_task(void *p) +{ + struct ifqueue *ifq = p; + + ifq_run_start(ifq); +} + +void ifq_barrier(struct ifqueue *ifq) { struct cond c = COND_INITIALIZER(); struct task t = TASK_INITIALIZER(ifq_barrier_task, ); + if (!task_del(ifq->ifq_softnet, >ifq_bundle)) { + int netlocked = (rw_status() == RW_WRITE); + + if (netlocked) /* XXXSMP breaks atomicity */ + NET_UNLOCK(); + + taskq_barrier(ifq->ifq_softnet);
make tcpdump translate vlan priorities
the priority field in vlan headers needs some translation before it's displayed because the values of 0 and 1 are swapped when coming off the wire. ie, priority 0 on the wire is 1 in the system, and 1 on the wire is priority 0. 2 through 7 map to 2 to 7. the goal is to make the value printed with tcpdump the same as what you configure in ifconfig with llprio, and the prio you set in pf, and so on. ok? Index: print-ether.c === RCS file: /cvs/src/usr.sbin/tcpdump/print-ether.c,v retrieving revision 1.31 diff -u -p -r1.31 print-ether.c --- print-ether.c 11 Jul 2016 00:27:50 - 1.31 +++ print-ether.c 25 Mar 2018 06:06:23 - @@ -182,6 +182,7 @@ int ether_encap_print(u_short ethertype, const u_char *p, u_int length, u_int caplen) { + uint16_t vlan, pri, vid; recurse: extracted_ethertype = ethertype; @@ -221,10 +222,17 @@ recurse: case ETHERTYPE_QINQ: if (ethertype == ETHERTYPE_QINQ) printf("QinQ s"); - printf("vid %d pri %d%s", - ntohs(*(unsigned short*)p)&0xFFF, - ntohs(*(unsigned short*)p)>>13, - (ntohs(*(unsigned short*)p)&0x1000) ? " cfi " : " "); + + /* XXX caplen check */ + + vlan = ntohs(*(unsigned short*)p); + vid = vlan & 0xfff; + pri = vlan >> 13; + if (pri <= 1) + pri = !pri; + + printf("vid %d pri %d%s", vid, pri, + vlan & 0x1000 ? " cfi " : " "); ethertype = ntohs(*(unsigned short*)(p+2)); p += 4; length -= 4;
use link0 on vlan(4) to force the vlan priority to llprio
at the moment, vlan(4) takes the priority from mbufs and maps it to the vlan priority field on the wire when sending packets out. however, some isps now use pppoe on a vlan for connectivity, but only accept vlan packets with a specific priority level set, usually 1 (which is 0 on the wire). to force pppoe to use priority 1, you can ifconfig pppoe0 llprio 1. to force data packets to use prio 1, you must use pf to do it. for non-ip packets, eg arp and dhcp packets from bpfwrite, you need to set the vlan interface's llprio to 1. there is a caveat to all this in that flattening the prio on mbufs also makes priority queueing on interfaces impossible. because everything get's mapped to prio 1, there's no way to make pppoe control frames higher priority than data, and tcp syn packets higher priority than data packets and so on. i recently got moved to an isp like this, and i found having to configure this knowledge in 2 different places for pppoe to work was tedious and felt brittle. so because of that, and the loss of priority queueing, i came up with the following diff. it makes vlan(4) use it's llprio value for the value on the wire if link0 is set. if link0 is not set, the prio is taken from the mbuf like now. ok? Index: if_vlan.c === RCS file: /cvs/src/sys/net/if_vlan.c,v retrieving revision 1.176 diff -u -p -r1.176 if_vlan.c --- if_vlan.c 19 Feb 2018 08:59:52 - 1.176 +++ if_vlan.c 25 Mar 2018 05:12:05 - @@ -261,9 +261,10 @@ vlan_start(struct ifqueue *ifq) bpf_mtap_ether(ifp->if_bpf, m, BPF_DIRECTION_OUT); #endif /* NBPFILTER > 0 */ + prio = ISSET(ifp->if_flags, IFF_LINK0) ? + ifp->if_llprio : m->m_pkthdr.pf.prio; /* IEEE 802.1p has prio 0 and 1 swapped */ - prio = m->m_pkthdr.pf.prio; if (prio <= 1) prio = !prio;
more NET_LOCK in pppx
Reading through if_pppx.c I spotted a couple of places we need NET_LOCK(). ok? Index: if_pppx.c === RCS file: /cvs/src/sys/net/if_pppx.c,v retrieving revision 1.63 diff -u -p -u -p -r1.63 if_pppx.c --- if_pppx.c 12 Aug 2017 20:27:28 - 1.63 +++ if_pppx.c 27 Mar 2018 23:56:56 - @@ -392,6 +392,8 @@ pppxwrite(dev_t dev, struct uio *uio, in proto = ntohl(*(uint32_t *)(th + 1)); m_adj(top, sizeof(uint32_t)); + NET_LOCK(); + switch (proto) { case AF_INET: ipv4_input(>pxi_if, top); @@ -403,9 +405,12 @@ pppxwrite(dev_t dev, struct uio *uio, in #endif default: m_freem(top); - return (EAFNOSUPPORT); + error = EAFNOSUPPORT; + break; } + NET_UNLOCK(); + return (error); } @@ -583,8 +588,10 @@ pppxclose(dev_t dev, int flags, int mode pxd = pppx_dev_lookup(dev); /* XXX */ + NET_LOCK(); while ((pxi = LIST_FIRST(>pxd_pxis))) pppx_if_destroy(pxd, pxi); + NET_UNLOCK(); LIST_REMOVE(pxd, pxd_entry);
Re: cut: Fix segmentation fault
Hi Tobias, Tobias Stoeckmann wrote on Tue, Mar 27, 2018 at 11:51:29PM +0200: > On Tue, Mar 27, 2018 at 11:47:27PM +0200, Ingo Schwarze wrote: >> See inline for one optional suggestion. >>> if (!stop || !start) >>> errx(1, "[-bcf] list: values may not include zero"); >> Consider deleting these two lines, too. >> >> You new function read_number() already makes sure that neither stop >> nor start can be 0 at this point. > Beware that you can actually reach that point if you call cut with an > empty list argument, like: > > $ cut -c '' - > cut: [-bcf] list: values may not include zero > > It's therefore still required. Ouch, you are right. But then, the code feels counter-intuitive and the error message confusing. Would it make sense to change the error message to: errx(1, "[-bcf] list: empty list value"); schwarze@isnote $ cut -c '2,-,4' cut: [-bcf] list: empty list value schwarze@isnote $ cut -c '2,--,4' cut: [-bcf] list: empty list value schwarze@isnote $ cut -c '2,---,4' cut: [-bcf] list: illegal list value > Thinking about it, this might be a good regression candidate as well. Sounds reasonable. Yours, Ingo
Re: cut: Fix segmentation fault
On Tue, Mar 27, 2018 at 11:47:27PM +0200, Ingo Schwarze wrote: > See inline for one optional suggestion. > > > if (!stop || !start) > > errx(1, "[-bcf] list: values may not include zero"); > > Consider deleting these two lines, too. > > You new function read_number() already makes sure that neither stop > nor start can be 0 at this point. Beware that you can actually reach that point if you call cut with an empty list argument, like: $ cut -c '' - cut: [-bcf] list: values may not include zero It's therefore still required. Thinking about it, this might be a good regression candidate as well.
Re: cut: Fix segmentation fault
Hi Tobias, Tobias Stoeckmann wrote on Tue, Mar 27, 2018 at 01:30:03PM +0200: > This patch fixes an out of boundary write in cut: > > $ cut -c 2147483648 - > Segmentation fault (core dumped) > > The supplied number can be parsed by strtol, but the result is casted > into a signed int, therefore turning negative. Afterwards, it is used > as an offset into a char array to write data at the given position. > This leads to the shown segmentation fault. > > I have changed the strtol call to strtonum, making sure that range > specifications still work. Our cut regress tests pass. Also it has a > nicer error message now, thanks to strtonum. > > While at it, I replaced a manual for-loop with memset. See inline for one optional suggestion. OK schwarze@ either way. Ingo > Index: usr.bin/cut/cut.c > === > RCS file: /cvs/src/usr.bin/cut/cut.c,v > retrieving revision 1.23 > diff -u -p -u -p -r1.23 cut.c > --- usr.bin/cut/cut.c 2 Dec 2015 00:56:46 - 1.23 > +++ usr.bin/cut/cut.c 27 Mar 2018 11:24:31 - > @@ -154,11 +154,32 @@ int autostart, autostop, maxval; > > char positions[_POSIX2_LINE_MAX + 1]; > > +int > +read_number(char **p) > +{ > + size_t pos; > + int dash, n; > + const char *errstr; > + char *q; > + > + q = *p + strcspn(*p, "-"); > + dash = *q == '-'; > + *q = '\0'; > + n = strtonum(*p, 1, _POSIX2_LINE_MAX, ); > + if (errstr != NULL) > + errx(1, "[-bcf] list: %s %s (allowed 1-%d)", *p, errstr, > + _POSIX2_LINE_MAX); > + if (dash) > + *q = '-'; > + *p = q; > + > + return n; > +} > + > void > get_list(char *list) > { > int setautostart, start, stop; > - char *pos; > char *p; > > /* > @@ -176,13 +197,15 @@ get_list(char *list) > setautostart = 1; > } > if (isdigit((unsigned char)*p)) { > - start = stop = strtol(p, , 10); > + start = stop = read_number(); > if (setautostart && start > autostart) > autostart = start; > } > if (*p == '-') { > - if (isdigit((unsigned char)p[1])) > - stop = strtol(p + 1, , 10); > + if (isdigit((unsigned char)p[1])) { > + ++p; > + stop = read_number(); > + } > if (*p == '-') { > ++p; > if (!autostop || autostop > stop) > @@ -193,13 +216,10 @@ get_list(char *list) > errx(1, "[-bcf] list: illegal list value"); > if (!stop || !start) > errx(1, "[-bcf] list: values may not include zero"); Consider deleting these two lines, too. You new function read_number() already makes sure that neither stop nor start can be 0 at this point. > - if (stop > _POSIX2_LINE_MAX) > - errx(1, "[-bcf] list: %d too large (max %d)", > - stop, _POSIX2_LINE_MAX); > if (maxval < stop) > maxval = stop; > - for (pos = positions + start; start++ <= stop; *pos++ = 1) > - ; > + if (start <= stop) > + memset(positions + start, 1, stop - start + 1); > } > > /* overlapping ranges */
Re: test(1): Fix integer overflows
On Tue, Mar 27, 2018 at 10:48:51PM +0200, Tobias Stoeckmann wrote: > On Tue, Mar 27, 2018 at 10:36:17PM +0200, Ingo Schwarze wrote: > > > + int sig; > > > > Drop this variable, it does no good but only harm. > > Yeah, too bad that I didn't notice this left-over from a previous > development step. Strange enough that regression tests didn't > choke on it... > > > > + /* skip leading zeros */ > > > + while (*p == '0' && isdigit((unsigned char)p[1])) > > > + p++; > > > + > > > + /* turn 0 always positive */ > > > + if (*p == '0' && !isdigit((unsigned char)p[1])) > > > + sig = 1; > > As tb@ pointed out, the !isdigit check can be removed as well. > > > > + c = strncmp(p1, p2, len1); > > > > That's another bug: > > > > schwarze@isnote $ /bin/test -1 -gt -2 ; echo $? > > 1 > > schwarze@isnote $ /bin/test 1 -gt 2 ; echo $? > > 1 > > It's really great to have people who do extensive reviews. Thanks > a lot for spotting this. I have added it to the regression tests. > > > With these changes, OK schwarze@. See below for the patch that > > i tested. > > Modified with tb@'s remark. Also I changed the error message > "bad number" to "invalid", so we stay fully in sync with strtonum's > error messages. > > > Given that breaking test(1) might cause hard-to-find breakage > > deep down in build systems and scripts, you might with a to > > have a second OK for commit, though. > > Definitely, I'm not out to rush this one in. ok tb > > > Index: bin/test/test.c > === > RCS file: /cvs/src/bin/test/test.c,v > retrieving revision 1.18 > diff -u -p -u -p -r1.18 test.c > --- bin/test/test.c 24 Jul 2017 22:15:52 - 1.18 > +++ bin/test/test.c 27 Mar 2018 20:43:38 - > @@ -16,6 +16,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -145,6 +146,8 @@ static int aexpr(enum token n); > static int nexpr(enum token n); > static int binop(void); > static int primary(enum token n); > +static const char *getnstr(const char *, int *, size_t *); > +static int intcmp(const char *, const char *); > static int filstat(char *nm, enum token mode); > static int getn(const char *s); > static int newerf(const char *, const char *); > @@ -290,6 +293,70 @@ primary(enum token n) > return strlen(*t_wp) > 0; > } > > +static const char * > +getnstr(const char *s, int *signum, size_t *len) > +{ > + const char *p, *start; > + > + /* skip leading whitespaces */ > + p = s; > + while (isspace((unsigned char)*p)) > + p++; > + > + /* accept optional sign */ > + if (*p == '-') { > + *signum = -1; > + p++; > + } else { > + *signum = 1; > + if (*p == '+') > + p++; > + } > + > + /* skip leading zeros */ > + while (*p == '0' && isdigit((unsigned char)p[1])) > + p++; > + > + /* turn 0 always positive */ > + if (*p == '0') > + *signum = 1; > + > + start = p; > + while (isdigit((unsigned char)*p)) > + p++; > + *len = p - start; > + > + /* allow trailing whitespaces */ > + while (isspace((unsigned char)*p)) > + p++; > + > + /* validate number */ > + if (*p != '\0' || *start == '\0') > + errx(2, "%s: invalid", s); > + > + return start; > +} > + > +static int > +intcmp(const char *opnd1, const char *opnd2) > +{ > + const char *p1, *p2; > + size_t len1, len2; > + int c, sig1, sig2; > + > + p1 = getnstr(opnd1, , ); > + p2 = getnstr(opnd2, , ); > + > + if (sig1 != sig2) > + c = sig1; > + else if (len1 != len2) > + c = (len1 < len2) ? -sig1 : sig1; > + else > + c = strncmp(p1, p2, len1) * sig1; > + > + return c; > +} > + > static int > binop(void) > { > @@ -313,17 +380,17 @@ binop(void) > case STRGT: > return strcmp(opnd1, opnd2) > 0; > case INTEQ: > - return getn(opnd1) == getn(opnd2); > + return intcmp(opnd1, opnd2) == 0; > case INTNE: > - return getn(opnd1) != getn(opnd2); > + return intcmp(opnd1, opnd2) != 0; > case INTGE: > - return getn(opnd1) >= getn(opnd2); > + return intcmp(opnd1, opnd2) >= 0; > case INTGT: > - return getn(opnd1) > getn(opnd2); > + return intcmp(opnd1, opnd2) > 0; > case INTLE: > - return getn(opnd1) <= getn(opnd2); > + return intcmp(opnd1, opnd2) <= 0; > case INTLT: > - return getn(opnd1) < getn(opnd2); > + return intcmp(opnd1, opnd2) < 0; > case FILNT: > return newerf(opnd1, opnd2); > case FILOT: > @@ -455,22 +522,26 @@ t_lex(char *s) > static int > getn(const char *s) > { > - char *p; > - long r; > - > - errno = 0; > - r = strtol(s,
Re: test(1): Fix integer overflows
On Tue, Mar 27, 2018 at 10:36:17PM +0200, Ingo Schwarze wrote: > > + int sig; > > Drop this variable, it does no good but only harm. Yeah, too bad that I didn't notice this left-over from a previous development step. Strange enough that regression tests didn't choke on it... > > + /* skip leading zeros */ > > + while (*p == '0' && isdigit((unsigned char)p[1])) > > + p++; > > + > > + /* turn 0 always positive */ > > + if (*p == '0' && !isdigit((unsigned char)p[1])) > > + sig = 1; As tb@ pointed out, the !isdigit check can be removed as well. > > + c = strncmp(p1, p2, len1); > > That's another bug: > > schwarze@isnote $ /bin/test -1 -gt -2 ; echo $? > 1 > schwarze@isnote $ /bin/test 1 -gt 2 ; echo $? > 1 It's really great to have people who do extensive reviews. Thanks a lot for spotting this. I have added it to the regression tests. > With these changes, OK schwarze@. See below for the patch that > i tested. Modified with tb@'s remark. Also I changed the error message "bad number" to "invalid", so we stay fully in sync with strtonum's error messages. > Given that breaking test(1) might cause hard-to-find breakage > deep down in build systems and scripts, you might with a to > have a second OK for commit, though. Definitely, I'm not out to rush this one in. Index: bin/test/test.c === RCS file: /cvs/src/bin/test/test.c,v retrieving revision 1.18 diff -u -p -u -p -r1.18 test.c --- bin/test/test.c 24 Jul 2017 22:15:52 - 1.18 +++ bin/test/test.c 27 Mar 2018 20:43:38 - @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -145,6 +146,8 @@ static int aexpr(enum token n); static int nexpr(enum token n); static int binop(void); static int primary(enum token n); +static const char *getnstr(const char *, int *, size_t *); +static int intcmp(const char *, const char *); static int filstat(char *nm, enum token mode); static int getn(const char *s); static int newerf(const char *, const char *); @@ -290,6 +293,70 @@ primary(enum token n) return strlen(*t_wp) > 0; } +static const char * +getnstr(const char *s, int *signum, size_t *len) +{ + const char *p, *start; + + /* skip leading whitespaces */ + p = s; + while (isspace((unsigned char)*p)) + p++; + + /* accept optional sign */ + if (*p == '-') { + *signum = -1; + p++; + } else { + *signum = 1; + if (*p == '+') + p++; + } + + /* skip leading zeros */ + while (*p == '0' && isdigit((unsigned char)p[1])) + p++; + + /* turn 0 always positive */ + if (*p == '0') + *signum = 1; + + start = p; + while (isdigit((unsigned char)*p)) + p++; + *len = p - start; + + /* allow trailing whitespaces */ + while (isspace((unsigned char)*p)) + p++; + + /* validate number */ + if (*p != '\0' || *start == '\0') + errx(2, "%s: invalid", s); + + return start; +} + +static int +intcmp(const char *opnd1, const char *opnd2) +{ + const char *p1, *p2; + size_t len1, len2; + int c, sig1, sig2; + + p1 = getnstr(opnd1, , ); + p2 = getnstr(opnd2, , ); + + if (sig1 != sig2) + c = sig1; + else if (len1 != len2) + c = (len1 < len2) ? -sig1 : sig1; + else + c = strncmp(p1, p2, len1) * sig1; + + return c; +} + static int binop(void) { @@ -313,17 +380,17 @@ binop(void) case STRGT: return strcmp(opnd1, opnd2) > 0; case INTEQ: - return getn(opnd1) == getn(opnd2); + return intcmp(opnd1, opnd2) == 0; case INTNE: - return getn(opnd1) != getn(opnd2); + return intcmp(opnd1, opnd2) != 0; case INTGE: - return getn(opnd1) >= getn(opnd2); + return intcmp(opnd1, opnd2) >= 0; case INTGT: - return getn(opnd1) > getn(opnd2); + return intcmp(opnd1, opnd2) > 0; case INTLE: - return getn(opnd1) <= getn(opnd2); + return intcmp(opnd1, opnd2) <= 0; case INTLT: - return getn(opnd1) < getn(opnd2); + return intcmp(opnd1, opnd2) < 0; case FILNT: return newerf(opnd1, opnd2); case FILOT: @@ -455,22 +522,26 @@ t_lex(char *s) static int getn(const char *s) { - char *p; - long r; - - errno = 0; - r = strtol(s, , 10); - - if (errno != 0) - errx(2, "%s: out of range", s); - - while (isspace((unsigned char)*p)) - p++; + char buf[32]; + const char *errstr, *p; + size_t len; + int r, sig; + + p =
Re: FREF() earlier in sys_pread() & sys_pwrite()
On Tue, Mar 27, 2018 at 11:28:03AM +0200, Martin Pieuchot wrote: > Here's a diff to move FREF() just after fd_getfile_mode() in > sys_pread(), sys_preadv(), sys_pwrite() and sys_pwritev(). > > As explained recently [0], I'd like to make sure all operations > manipulating a 'struct file *' do so with a properly refcounted > element. > > [0] https://marc.info/?l=openbsd-tech=152214234530708=2 > > Ok? OK bluhm@ > Index: kern/vfs_syscalls.c > === > RCS file: /cvs/src/sys/kern/vfs_syscalls.c,v > retrieving revision 1.276 > diff -u -p -r1.276 vfs_syscalls.c > --- kern/vfs_syscalls.c 19 Feb 2018 08:59:52 - 1.276 > +++ kern/vfs_syscalls.c 27 Mar 2018 09:24:51 - > @@ -2930,23 +2930,25 @@ sys_pread(struct proc *p, void *v, regis > off_t offset; > int fd = SCARG(uap, fd); > > + iov.iov_base = SCARG(uap, buf); > + iov.iov_len = SCARG(uap, nbyte); > + > if ((fp = fd_getfile_mode(fdp, fd, FREAD)) == NULL) > return (EBADF); > + FREF(fp); > > vp = fp->f_data; > if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || > (vp->v_flag & VISTTY)) { > + FRELE(fp, p); > return (ESPIPE); > } > > - iov.iov_base = SCARG(uap, buf); > - iov.iov_len = SCARG(uap, nbyte); > - > offset = SCARG(uap, offset); > - if (offset < 0 && vp->v_type != VCHR) > + if (offset < 0 && vp->v_type != VCHR) { > + FRELE(fp, p); > return (EINVAL); > - > - FREF(fp); > + } > > /* dofilereadv() will FRELE the descriptor for us */ > return (dofilereadv(p, fd, fp, , 1, 0, , retval)); > @@ -2973,18 +2975,20 @@ sys_preadv(struct proc *p, void *v, regi > > if ((fp = fd_getfile_mode(fdp, fd, FREAD)) == NULL) > return (EBADF); > + FREF(fp); > > vp = fp->f_data; > if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || > (vp->v_flag & VISTTY)) { > + FRELE(fp, p); > return (ESPIPE); > } > > offset = SCARG(uap, offset); > - if (offset < 0 && vp->v_type != VCHR) > + if (offset < 0 && vp->v_type != VCHR) { > + FRELE(fp, p); > return (EINVAL); > - > - FREF(fp); > + } > > /* dofilereadv() will FRELE the descriptor for us */ > return (dofilereadv(p, fd, fp, SCARG(uap, iovp), SCARG(uap, iovcnt), 1, > @@ -3011,23 +3015,25 @@ sys_pwrite(struct proc *p, void *v, regi > off_t offset; > int fd = SCARG(uap, fd); > > + iov.iov_base = (void *)SCARG(uap, buf); > + iov.iov_len = SCARG(uap, nbyte); > + > if ((fp = fd_getfile_mode(fdp, fd, FWRITE)) == NULL) > return (EBADF); > + FREF(fp); > > vp = fp->f_data; > if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || > (vp->v_flag & VISTTY)) { > + FRELE(fp, p); > return (ESPIPE); > } > > - iov.iov_base = (void *)SCARG(uap, buf); > - iov.iov_len = SCARG(uap, nbyte); > - > offset = SCARG(uap, offset); > - if (offset < 0 && vp->v_type != VCHR) > + if (offset < 0 && vp->v_type != VCHR) { > + FRELE(fp, p); > return (EINVAL); > - > - FREF(fp); > + } > > /* dofilewritev() will FRELE the descriptor for us */ > return (dofilewritev(p, fd, fp, , 1, 0, , retval)); > @@ -3054,18 +3060,20 @@ sys_pwritev(struct proc *p, void *v, reg > > if ((fp = fd_getfile_mode(fdp, fd, FWRITE)) == NULL) > return (EBADF); > + FREF(fp); > > vp = fp->f_data; > if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || > (vp->v_flag & VISTTY)) { > + FRELE(fp, p); > return (ESPIPE); > } > > offset = SCARG(uap, offset); > - if (offset < 0 && vp->v_type != VCHR) > + if (offset < 0 && vp->v_type != VCHR) { > + FRELE(fp, p); > return (EINVAL); > - > - FREF(fp); > + } > > /* dofilewritev() will FRELE the descriptor for us */ > return (dofilewritev(p, fd, fp, SCARG(uap, iovp), SCARG(uap, iovcnt),
Re: FREF() earlier in getsock()
On Tue, Mar 27, 2018 at 11:24:00AM +0200, Martin Pieuchot wrote: > Here's a diff to move FREF() just after fd_getfile() in getsock(). > > As explained recently [0], I'd like to make sure all operations > manipulating a 'struct file *' do so with a properly refcounted > element. > > [0] https://marc.info/?l=openbsd-tech=152214234530708=2 > > Ok? OK bluhm@ > Index: kern/uipc_syscalls.c > === > RCS file: /cvs/src/sys/kern/uipc_syscalls.c,v > retrieving revision 1.167 > diff -u -p -r1.167 uipc_syscalls.c > --- kern/uipc_syscalls.c 21 Feb 2018 09:30:02 - 1.167 > +++ kern/uipc_syscalls.c 27 Mar 2018 09:19:37 - > @@ -1160,9 +1160,11 @@ getsock(struct proc *p, int fdes, struct > > if ((fp = fd_getfile(p->p_fd, fdes)) == NULL) > return (EBADF); > - if (fp->f_type != DTYPE_SOCKET) > - return (ENOTSOCK); > FREF(fp); > + if (fp->f_type != DTYPE_SOCKET) { > + FRELE(fp, p); > + return (ENOTSOCK); > + } > *fpp = fp; > > return (0);
Re: FREF() earlier in sys_flock()
On Tue, Mar 27, 2018 at 11:18:29AM +0200, Martin Pieuchot wrote: > FREF() is currently needed before any sleep point. That's because > there's almost no parallelism in the top half of the kernel. I'd > like to improve that. > > The first step is to make sure all operations manipulating a > 'struct file *' do so with a properly refcounted element. So here's > a diff moving FREF() just after fd_getfile() in sys_flock(). > > Ok? OK bluhm@ > Index: kern/kern_descrip.c > === > RCS file: /cvs/src/sys/kern/kern_descrip.c,v > retrieving revision 1.142 > diff -u -p -r1.142 kern_descrip.c > --- kern/kern_descrip.c 19 Feb 2018 08:59:52 - 1.142 > +++ kern/kern_descrip.c 27 Mar 2018 09:12:45 - > @@ -1205,9 +1205,11 @@ sys_flock(struct proc *p, void *v, regis > > if ((fp = fd_getfile(fdp, fd)) == NULL) > return (EBADF); > - if (fp->f_type != DTYPE_VNODE) > - return (EOPNOTSUPP); > FREF(fp); > + if (fp->f_type != DTYPE_VNODE) { > + error = EOPNOTSUPP; > + goto out; > + } > vp = fp->f_data; > lf.l_whence = SEEK_SET; > lf.l_start = 0;
Re: test(1): Fix integer overflows
Hi Tobias, Tobias Stoeckmann wrote on Tue, Mar 27, 2018 at 07:37:32PM +0200: > While at it, -t has a proper 0-INT_MAX limitation now as well. Looks correct. > Please note that we have overflows in pdksh's builtin test as well. > If bin/test is refactored, I'll try to unify both implementations > much closer to each other where possible. Makes sense to me, including doing one step at a time. > +static const char * > +getnstr(const char *s, int *signum, size_t *len) > +{ > + const char *p, *start; > + int sig; Drop this variable, it does no good but only harm. > + /* skip leading whitespaces */ > + p = s; > + while (isspace((unsigned char)*p)) > + p++; > + > + /* accept optional sign */ > + if (*p == '-') { > + *signum = -1; > + p++; > + } else { > + *signum = 1; > + if (*p == '+') > + p++; > + } > + > + /* skip leading zeros */ > + while (*p == '0' && isdigit((unsigned char)p[1])) > + p++; > + > + /* turn 0 always positive */ > + if (*p == '0' && !isdigit((unsigned char)p[1])) > + sig = 1; Make that: *signum = 1; > + start = p; > + while (isdigit((unsigned char)*p)) > + p++; > + *len = p - start; > + > + /* allow trailing whitespaces */ > + while (isspace((unsigned char)*p)) > + p++; > + > + /* validate number */ > + if (*p != '\0' || *start == '\0') > + errx(2, "%s: bad number", s); > + > + *signum = sig; That's bad, "sig" is usually uninitialized here. Just delete this line. > + return start; > +} > + > +static int > +intcmp(const char *opnd1, const char *opnd2) > +{ > + const char *p1, *p2; > + size_t len1, len2; > + int c, sig1, sig2; > + > + p1 = getnstr(opnd1, , ); > + p2 = getnstr(opnd2, , ); > + > + if (sig1 != sig2) > + c = sig1; > + else if (len1 != len2) > + c = (len1 < len2) ? -sig1 : sig1; > + else > + c = strncmp(p1, p2, len1); That's another bug: schwarze@isnote $ /bin/test -1 -gt -2 ; echo $? 1 schwarze@isnote $ /bin/test 1 -gt 2 ; echo $? 1 I think what you meant is: c = strncmp(p1, p2, len1) * sig1; With the above changes, i get the expected: schwarze@isnote $ /bin/test -1 -gt -2 ; echo $? 0 schwarze@isnote $ /obin/test 1 -gt 2 ; echo $? 1 With these changes, OK schwarze@. See below for the patch that i tested. Given that breaking test(1) might cause hard-to-find breakage deep down in build systems and scripts, you might with a to have a second OK for commit, though. Yours, Ingo Index: test.c === RCS file: /cvs/src/bin/test/test.c,v retrieving revision 1.18 diff -u -p -r1.18 test.c --- test.c 24 Jul 2017 22:15:52 - 1.18 +++ test.c 27 Mar 2018 20:30:47 - @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -145,6 +146,8 @@ static int aexpr(enum token n); static int nexpr(enum token n); static int binop(void); static int primary(enum token n); +static const char *getnstr(const char *, int *, size_t *); +static int intcmp(const char *, const char *); static int filstat(char *nm, enum token mode); static int getn(const char *s); static int newerf(const char *, const char *); @@ -290,6 +293,70 @@ primary(enum token n) return strlen(*t_wp) > 0; } +static const char * +getnstr(const char *s, int *signum, size_t *len) +{ + const char *p, *start; + + /* skip leading whitespaces */ + p = s; + while (isspace((unsigned char)*p)) + p++; + + /* accept optional sign */ + if (*p == '-') { + *signum = -1; + p++; + } else { + *signum = 1; + if (*p == '+') + p++; + } + + /* skip leading zeros */ + while (*p == '0' && isdigit((unsigned char)p[1])) + p++; + + /* turn 0 always positive */ + if (*p == '0' && !isdigit((unsigned char)p[1])) + *signum = 1; + + start = p; + while (isdigit((unsigned char)*p)) + p++; + *len = p - start; + + /* allow trailing whitespaces */ + while (isspace((unsigned char)*p)) + p++; + + /* validate number */ + if (*p != '\0' || *start == '\0') + errx(2, "%s: bad number", s); + + return start; +} + +static int +intcmp(const char *opnd1, const char *opnd2) +{ + const char *p1, *p2; + size_t len1, len2; + int c, sig1, sig2; + + p1 = getnstr(opnd1, , ); + p2 = getnstr(opnd2, , ); + + if (sig1 != sig2) + c = sig1; + else if (len1 != len2) + c = (len1 < len2) ? -sig1 : sig1; + else + c = strncmp(p1, p2, len1) *
Re: test(1): Fix integer overflows
Hi, On Tue, Mar 27, 2018 at 09:05:12PM +0200, Klemens Nanni wrote: > FWIW, see "test: Catch integer overflow, fail on trailing whitespaces" > from 24.07.2017 on tech@: > > https://marc.info/?l=openbsd-tech=150091968903042 sorry I didn't intend to ignore your mail. I didn't follow tech@ closely the last months. Your proposed patch is actually close to what I came up first -- never thought about dropping this whitespace support. Just as Ingo did, I have checked the specs here closely again and found this one: http://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu_chap02.html You can find in there this section: For the commands: x=010; echo $((x += 1)) the output must be 9. For the commands: x=' 1'; echo $((x += 1)) the results are unspecified. So, POSIX would allow us to do what we want, which means that your proposal could be done as well. I tried to see what ksh does, and if there is any way to trigger a situation in which ' 1' wouldn't be considered a number. But even this one works: $ x=' 1 ' $ echo $((x+1)) 2 $ _ Here are my two cents and feedback to your mail: If test(1) stops accepting ' 1 ' as a number, it's in conflict with ksh. I would prefer if test(1) and built-in test get closer to each other, not splitting their way of functioning further. But it's no strong opinion of mine. Having a very strict POSIX test support sounds appealing as well. Tobias
Re: test(1): Fix integer overflows
Hi Tobias, one quick remark regarding POSIX: The relevant text is http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html section 12.1 Utility Argument Syntax, number 6. Together with http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html we have the following consequences: * The minimum range we must support is 0 to 2147483647. * We are allowed to support a larger range if we want to. * Behaviour for integers out of the supported range appears to be unspecified. If i read that correctly, both our current implementation, and the new syntax and semantics with unlimited range that you propose, are allowed by POSIX. I agree that supporting an unlimited range is better and safer, in particular if it can be done without excessive complication of the code. > - Numbers are always base 10, regardless of starting with 0 Yes, that is definitely required. > - A number can start with arbitrary amounts of whitespaces > - Then a sign char is allowed (POSIX only mentions -) I don't think also accepting '+' does any harm - quite to the contrary, given that we currently accept it, we should continue doing so. > - An arbitrary amount of digits follows > - Then an arbitrary amount of whitespaces can follow I agree that discarding leading and trailing whitespace can do no harm. When unquoted, such whitespace is already stripped by argument parsing, so it seems less surprising to also ignore it when quoted. Note that i did not inspect your patch yet. Yours, Ingo
Re: test(1): Fix integer overflows
FWIW, see "test: Catch integer overflow, fail on trailing whitespaces" from 24.07.2017 on tech@: https://marc.info/?l=openbsd-tech=150091968903042
Re: use hashfree on pcb hash tables
Ignore for now. A lovely regress/ test exposes a problem in my diff. :) On Tue, Mar 27, 2018 at 01:11:19PM -0400, David Hill wrote: > Hello - > > The hash tables are allocated with hashinit, so free them with hashfree. > This gives the bonus of calling free() with a size as well. >
Re: disklabel,fsck_ffs: division by zero on invalid frag sizes
On Tue, Mar 27, 2018 at 12:46:03PM +0200, Tobias Stoeckmann wrote: > Resending this old diff. Adjusted to apply with -current source: > > Illegal fragmentation block sizes can trigger division by zero in the > disklabel and fsck_ffs tools. > > See this sequence of commands to reproduce: > > # dd if=/dev/zero of=nofrag.iso bs=1M count=1 > # vnconfig vnd0 nofrag.iso > # disklabel -e vnd0 > > Create 'a' and set fsize = bsize = 1. A line like this one will do > a: 20480 4.2BSD 1 1 1 > > # fsck_ffs vnd0a > ** /dev/vnd0a (vnd0a) > BAD SUPER BLOCK: MAGIC NUMBER WRONG > Floating point exception (core dumped) > # disklabel -E vnd0 > Label editor (enter '?' for help at any prompt) > > m a > offset: [0] > size: [2048] > FS type: [4.2BSD] > Floating point exception (core dumped) > # vnconfig -u vnd0 > > A fragmentation (block) size smaller than a sector size is not valid > while using "disklabel -E", and really doesn't make sense. While > using "disklabel -e", not all validation checks are performed, which > makes it possible to enter invalid values. > > If "disklabel -E" is used without the expert mode, fragmentation sizes > cannot be changed and will be just accepted from the parsed disklabel, > resulting in a division by zero if they are too small. > > And the same happens in fsck_ffs. Instead of coming up with a guessed > value in fsck_ffs, I think it's better to simply fail and let the user > fix the disklabel. After all, it shouldn't be fsck_ffs's duty to fix > faulty values outside the filesystem. This looks from reading the diff, but "fragmentation size" is not the correct term, see inline. -Otto > > Index: sbin/disklabel/disklabel.c > === > RCS file: /cvs/src/sbin/disklabel/disklabel.c,v > retrieving revision 1.227 > diff -u -p -u -p -r1.227 disklabel.c > --- sbin/disklabel/disklabel.c25 Feb 2018 17:24:44 - 1.227 > +++ sbin/disklabel/disklabel.c27 Mar 2018 10:42:59 - > @@ -1100,9 +1100,24 @@ gottype: > > case FS_BSDFFS: > NXTNUM(fsize, fsize, ); > - if (fsize == 0) > + if (fsize < lp->d_secsize || > + (fsize % lp->d_secsize) != 0) { > + warnx("line %d: " > + "bad fragmentation size: %s", Should be "fragment size" > + lineno, cp); > + errors++; > break; > + } > NXTNUM(v, v, ); > + if (v < fsize || (fsize != v && > + fsize * 2 != v && fsize * 4 != v && > + fsize * 8 != v)) { > + warnx("line %d: " > + "bad block size: %s", > + lineno, cp); > + errors++; > + break; > + } > pp->p_fragblock = > DISKLABELV1_FFS_FRAGBLOCK(fsize, v / fsize); > NXTNUM(pp->p_cpg, pp->p_cpg, ); > Index: sbin/disklabel/editor.c > === > RCS file: /cvs/src/sbin/disklabel/editor.c,v > retrieving revision 1.327 > diff -u -p -u -p -r1.327 editor.c > --- sbin/disklabel/editor.c 8 Mar 2018 22:05:17 - 1.327 > +++ sbin/disklabel/editor.c 27 Mar 2018 10:42:59 - > @@ -2069,16 +2069,16 @@ get_bsize(struct disklabel *lp, int part > if (pp->p_fstype != FS_BSDFFS) > return (0); > > + frag = DISKLABELV1_FFS_FRAG(pp->p_fragblock); > + fsize = DISKLABELV1_FFS_FSIZE(pp->p_fragblock); > + > /* Avoid dividing by zero... */ > - if (pp->p_fragblock == 0) > - return(1); > + if (frag * fsize < lp->d_secsize) > + return (1); > > if (!expert) > goto align; > > - fsize = DISKLABELV1_FFS_FSIZE(pp->p_fragblock); > - frag = DISKLABELV1_FFS_FRAG(pp->p_fragblock); > - > for (;;) { > ui = getuint64(lp, "block size", > "Size of ffs blocks. 1, 2, 4 or 8 times ffs fragment size.", > @@ -2119,8 +2119,7 @@ align: > orig_size = DL_GETPSIZE(pp); > orig_offset = DL_GETPOFFSET(pp); > > - bsize = (DISKLABELV1_FFS_FRAG(pp->p_fragblock) * > - DISKLABELV1_FFS_FSIZE(pp->p_fragblock)) / lp->d_secsize; > + bsize = (frag * fsize) / lp->d_secsize; > if (DL_GETPOFFSET(pp) != starting_sector) { > /* Can't change offset of first partition. */ > adj = bsize - (DL_GETPOFFSET(pp) %
test(1): Fix integer overflows
The test(1) utility is prone to integer overflows on 64 bit systems. By using strtol and casting the result to an int, it is possible to run tests which succeed even though they should fail: $ arch OpenBSD.amd64 $ /bin/test 4294967296 -eq 0; echo $? 0 I could have chosen the way of FreeBSD, NetBSD, or built-in test of pdksh and extend the number to intmax_t, but I actually prefered the way of coreutils, i.e. validating the number as strings. This version allows numbers based on the rules which I figured out by reading strtol's man page and POSIX's rather vague specification: - Numbers are always base 10, regardless of starting with 0 - A number can start with arbitrary amounts of whitespaces - Then a sign char is allowed (POSIX only mentions -) - An arbitrary amount of digits follows - Then an arbitrary amount of whitespaces can follow Therefore, comparing numbers can be done in a human-understandable form as strings: 1. If signum differs, the smaller number is the one having a - 2. Leading zeros are skipped 3. If both numbers are positive, the one with more digits is larger 4. If both numbers are negative, the one with more digits is smaller 5. Equality is based on a simple string comparison I have extended the regression test suite. The last two additions fail with current implementation. While at it, -t has a proper 0-INT_MAX limitation now as well. This is quite a refactoring so tests are very appreciated. Please note that we have overflows in pdksh's builtin test as well. If bin/test is refactored, I'll try to unify both implementations much closer to each other where possible. Tobias Index: bin/test/test.c === RCS file: /cvs/src/bin/test/test.c,v retrieving revision 1.18 diff -u -p -u -p -r1.18 test.c --- bin/test/test.c 24 Jul 2017 22:15:52 - 1.18 +++ bin/test/test.c 27 Mar 2018 17:23:51 - @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -145,6 +146,8 @@ static int aexpr(enum token n); static int nexpr(enum token n); static int binop(void); static int primary(enum token n); +static const char *getnstr(const char *, int *, size_t *); +static int intcmp(const char *, const char *); static int filstat(char *nm, enum token mode); static int getn(const char *s); static int newerf(const char *, const char *); @@ -290,6 +293,72 @@ primary(enum token n) return strlen(*t_wp) > 0; } +static const char * +getnstr(const char *s, int *signum, size_t *len) +{ + const char *p, *start; + int sig; + + /* skip leading whitespaces */ + p = s; + while (isspace((unsigned char)*p)) + p++; + + /* accept optional sign */ + if (*p == '-') { + *signum = -1; + p++; + } else { + *signum = 1; + if (*p == '+') + p++; + } + + /* skip leading zeros */ + while (*p == '0' && isdigit((unsigned char)p[1])) + p++; + + /* turn 0 always positive */ + if (*p == '0' && !isdigit((unsigned char)p[1])) + sig = 1; + + start = p; + while (isdigit((unsigned char)*p)) + p++; + *len = p - start; + + /* allow trailing whitespaces */ + while (isspace((unsigned char)*p)) + p++; + + /* validate number */ + if (*p != '\0' || *start == '\0') + errx(2, "%s: bad number", s); + + *signum = sig; + return start; +} + +static int +intcmp(const char *opnd1, const char *opnd2) +{ + const char *p1, *p2; + size_t len1, len2; + int c, sig1, sig2; + + p1 = getnstr(opnd1, , ); + p2 = getnstr(opnd2, , ); + + if (sig1 != sig2) + c = sig1; + else if (len1 != len2) + c = (len1 < len2) ? -sig1 : sig1; + else + c = strncmp(p1, p2, len1); + + return c; +} + static int binop(void) { @@ -313,17 +382,17 @@ binop(void) case STRGT: return strcmp(opnd1, opnd2) > 0; case INTEQ: - return getn(opnd1) == getn(opnd2); + return intcmp(opnd1, opnd2) == 0; case INTNE: - return getn(opnd1) != getn(opnd2); + return intcmp(opnd1, opnd2) != 0; case INTGE: - return getn(opnd1) >= getn(opnd2); + return intcmp(opnd1, opnd2) >= 0; case INTGT: - return getn(opnd1) > getn(opnd2); + return intcmp(opnd1, opnd2) > 0; case INTLE: - return getn(opnd1) <= getn(opnd2); + return intcmp(opnd1, opnd2) <= 0; case INTLT: - return getn(opnd1) < getn(opnd2); + return intcmp(opnd1, opnd2) < 0; case FILNT: return newerf(opnd1, opnd2); case FILOT: @@ -455,22 +524,26 @@ t_lex(char *s) static int
use hashfree on pcb hash tables
Hello - The hash tables are allocated with hashinit, so free them with hashfree. This gives the bonus of calling free() with a size as well. OK? Index: in_pcb.c === RCS file: /cvs/src/sys/netinet/in_pcb.c,v retrieving revision 1.228 diff -u -p -r1.228 in_pcb.c --- in_pcb.c19 Feb 2018 08:59:53 - 1.228 +++ in_pcb.c27 Mar 2018 17:06:51 - @@ -997,20 +997,21 @@ in_pcbrehash(struct inpcb *inp) int in_pcbresize(struct inpcbtable *table, int hashsize) { - u_long nhash, nlhash; + u_long nhash, nlhash, ohash, olhash; void *nhashtbl, *nlhashtbl, *ohashtbl, *olhashtbl; struct inpcb *inp0, *inp1; ohashtbl = table->inpt_hashtbl; + ohash = table->inpt_hash; olhashtbl = table->inpt_lhashtbl; + olhash = table->inpt_lhash; nhashtbl = hashinit(hashsize, M_PCB, M_NOWAIT, ); + if (nhashtbl == NULL) + return (ENOBUFS); nlhashtbl = hashinit(hashsize, M_PCB, M_NOWAIT, ); - if (nhashtbl == NULL || nlhashtbl == NULL) { - if (nhashtbl != NULL) - free(nhashtbl, M_PCB, 0); - if (nlhashtbl != NULL) - free(nlhashtbl, M_PCB, 0); + if (nlhashtbl == NULL) { + hashfree(nhashtbl, nhash, M_PCB); return (ENOBUFS); } table->inpt_hashtbl = nhashtbl; @@ -1022,8 +1023,8 @@ in_pcbresize(struct inpcbtable *table, i TAILQ_FOREACH_SAFE(inp0, >inpt_queue, inp_queue, inp1) { in_pcbrehash(inp0); } - free(ohashtbl, M_PCB, 0); - free(olhashtbl, M_PCB, 0); + hashfree(ohashtbl, ohash, M_PCB); + hashfree(olhashtbl, olhash, M_PCB); return (0); }
netinet bcopy -> memcpy/memmove
Hello - A few bcopy conversions to memcpy where the memory does not overlap, otherwise memmove. OK? Index: netinet/ip_ah.c === RCS file: /cvs/src/sys/netinet/ip_ah.c,v retrieving revision 1.138 diff -u -p -r1.138 ip_ah.c --- netinet/ip_ah.c 14 Mar 2018 22:38:46 - 1.138 +++ netinet/ip_ah.c 27 Mar 2018 16:10:03 - @@ -899,8 +899,8 @@ ah_input_cb(struct cryptop *crp) * mbuf...do an overlapping copy of the * remainder of the mbuf over the ESP header. */ - bcopy(mtod(m1, u_char *) + roff + rplen + - ahx->authsize, mtod(m1, u_char *) + roff, + memmove(mtod(m1, u_char *) + roff, + mtod(m1, u_char *) + roff + rplen + ahx->authsize, m1->m_len - (roff + rplen + ahx->authsize)); m1->m_len -= rplen + ahx->authsize; m->m_pkthdr.len -= rplen + ahx->authsize; Index: netinet/tcp_subr.c === RCS file: /cvs/src/sys/netinet/tcp_subr.c,v retrieving revision 1.169 diff -u -p -r1.169 tcp_subr.c --- netinet/tcp_subr.c 18 Mar 2018 21:05:21 - 1.169 +++ netinet/tcp_subr.c 27 Mar 2018 16:10:03 - @@ -328,11 +328,11 @@ tcp_respond(struct tcpcb *tp, caddr_t te th = (struct tcphdr *)(ip6 + 1); tlen = sizeof(*ip6) + sizeof(*th); if (th0) { - bcopy(template, ip6, sizeof(*ip6)); - bcopy(th0, th, sizeof(*th)); + memcpy(ip6, template, sizeof(*ip6)); + memcpy(th, th0, sizeof(*th)); xchg(ip6->ip6_dst, ip6->ip6_src, struct in6_addr); } else { - bcopy(template, ip6, tlen); + memcpy(ip6, template, tlen); } break; #endif /* INET6 */ @@ -341,11 +341,11 @@ tcp_respond(struct tcpcb *tp, caddr_t te th = (struct tcphdr *)(ip + 1); tlen = sizeof(*ip) + sizeof(*th); if (th0) { - bcopy(template, ip, sizeof(*ip)); - bcopy(th0, th, sizeof(*th)); + memcpy(ip, template, sizeof(*ip)); + memcpy(th, th0, sizeof(*th)); xchg(ip->ip_dst.s_addr, ip->ip_src.s_addr, u_int32_t); } else { - bcopy(template, ip, tlen); + memcpy(ip, template, tlen); } break; } @@ -952,7 +952,7 @@ tcp_signature_tdb_init(struct tdb *tdbp, tdbp->tdb_amxkey = malloc(ii->ii_authkeylen, M_XDATA, M_NOWAIT); if (tdbp->tdb_amxkey == NULL) return (ENOMEM); - bcopy(ii->ii_authkey, tdbp->tdb_amxkey, ii->ii_authkeylen); + memcpy(tdbp->tdb_amxkey, ii->ii_authkey, ii->ii_authkeylen); tdbp->tdb_amxkeylen = ii->ii_authkeylen; return (0); Index: netinet/udp_usrreq.c === RCS file: /cvs/src/sys/netinet/udp_usrreq.c,v retrieving revision 1.245 diff -u -p -r1.245 udp_usrreq.c --- netinet/udp_usrreq.c1 Dec 2017 10:33:33 - 1.245 +++ netinet/udp_usrreq.c27 Mar 2018 16:10:03 - @@ -295,8 +295,8 @@ udp_input(struct mbuf **mp, int *offp, i } /* remove the UDP header */ - bcopy(mtod(m, u_char *), - mtod(m, u_char *) + sizeof(struct udphdr), iphlen); + memmove(mtod(m, u_char *) + sizeof(struct udphdr), + mtod(m, u_char *), iphlen); m_adj(m, sizeof(struct udphdr)); skip -= sizeof(struct udphdr);
Re: NFS vs NET_LOCK()
On Tue, Mar 27, 2018 at 11:00:20AM +0200, Martin Pieuchot wrote: > Diff below prevents a future lock ordering problem between NFSnode > locks (similar to Inode locks) and the NET_LOCK(). > > It ensures the NET_LOCK() is always locked *after* any NFSnode lock > by fixing the UVM fault case. So we have always have: > VFS -> NFS -> NFSnode lock -> socket -> NET_LOCK(). > > Ok? > > Index: uvm/uvm_vnode.c > === > RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v > retrieving revision 1.99 > diff -u -p -r1.99 uvm_vnode.c > --- uvm/uvm_vnode.c 8 Mar 2018 22:04:18 - 1.99 > +++ uvm/uvm_vnode.c 20 Mar 2018 12:58:17 - > @@ -1105,6 +1105,7 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t > off_t file_offset; > int waitf, result, mapinflags; > size_t got, wanted; > + int netlocked = 0; > > /* init values */ > waitf = (flags & PGO_SYNCIO) ? M_WAITOK : M_NOWAIT; > @@ -1174,18 +1175,24 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t >* Ideally, this kind of operation *should* work. >*/ > result = 0; > - if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) > - result = vn_lock(vn, LK_EXCLUSIVE | LK_RECURSEFAIL, curproc); > - > - if (result == 0) { > - int netlocked = (rw_status() == RW_WRITE); > - > + if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) { > /* >* This process may already have the NET_LOCK(), if we >* faulted in copyin() or copyout() in the network stack. >*/ > - if (netlocked) > + if (rw_status() == RW_WRITE) { > + netlocked = 1; > + NET_UNLOCK(); > + } > + > + result = vn_lock(vn, LK_EXCLUSIVE | LK_RECURSEFAIL, curproc); > + } > + > + if (result == 0) { > + if (!netlocked && (rw_status() == RW_WRITE)) { > + netlocked = 1; > NET_UNLOCK(); > + } > > /* NOTE: vnode now locked! */ > if (rw == UIO_READ) > @@ -1195,11 +1202,12 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t > (flags & PGO_PDFREECLUST) ? IO_NOCACHE : 0, > curproc->p_ucred); > > + if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) > + VOP_UNLOCK(vn, curproc); > + > if (netlocked) > NET_LOCK(); > > - if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) > - VOP_UNLOCK(vn, curproc); > } This can leave NET_LOCK() unrestored if vn_lock() returns non-zero.
disklabel,fsck_ffs: division by zero on invalid frag sizes
Resending this old diff. Adjusted to apply with -current source: Illegal fragmentation block sizes can trigger division by zero in the disklabel and fsck_ffs tools. See this sequence of commands to reproduce: # dd if=/dev/zero of=nofrag.iso bs=1M count=1 # vnconfig vnd0 nofrag.iso # disklabel -e vnd0 Create 'a' and set fsize = bsize = 1. A line like this one will do a: 20480 4.2BSD 1 1 1 # fsck_ffs vnd0a ** /dev/vnd0a (vnd0a) BAD SUPER BLOCK: MAGIC NUMBER WRONG Floating point exception (core dumped) # disklabel -E vnd0 Label editor (enter '?' for help at any prompt) > m a offset: [0] size: [2048] FS type: [4.2BSD] Floating point exception (core dumped) # vnconfig -u vnd0 A fragmentation (block) size smaller than a sector size is not valid while using "disklabel -E", and really doesn't make sense. While using "disklabel -e", not all validation checks are performed, which makes it possible to enter invalid values. If "disklabel -E" is used without the expert mode, fragmentation sizes cannot be changed and will be just accepted from the parsed disklabel, resulting in a division by zero if they are too small. And the same happens in fsck_ffs. Instead of coming up with a guessed value in fsck_ffs, I think it's better to simply fail and let the user fix the disklabel. After all, it shouldn't be fsck_ffs's duty to fix faulty values outside the filesystem. Index: sbin/disklabel/disklabel.c === RCS file: /cvs/src/sbin/disklabel/disklabel.c,v retrieving revision 1.227 diff -u -p -u -p -r1.227 disklabel.c --- sbin/disklabel/disklabel.c 25 Feb 2018 17:24:44 - 1.227 +++ sbin/disklabel/disklabel.c 27 Mar 2018 10:42:59 - @@ -1100,9 +1100,24 @@ gottype: case FS_BSDFFS: NXTNUM(fsize, fsize, ); - if (fsize == 0) + if (fsize < lp->d_secsize || + (fsize % lp->d_secsize) != 0) { + warnx("line %d: " + "bad fragmentation size: %s", + lineno, cp); + errors++; break; + } NXTNUM(v, v, ); + if (v < fsize || (fsize != v && + fsize * 2 != v && fsize * 4 != v && + fsize * 8 != v)) { + warnx("line %d: " + "bad block size: %s", + lineno, cp); + errors++; + break; + } pp->p_fragblock = DISKLABELV1_FFS_FRAGBLOCK(fsize, v / fsize); NXTNUM(pp->p_cpg, pp->p_cpg, ); Index: sbin/disklabel/editor.c === RCS file: /cvs/src/sbin/disklabel/editor.c,v retrieving revision 1.327 diff -u -p -u -p -r1.327 editor.c --- sbin/disklabel/editor.c 8 Mar 2018 22:05:17 - 1.327 +++ sbin/disklabel/editor.c 27 Mar 2018 10:42:59 - @@ -2069,16 +2069,16 @@ get_bsize(struct disklabel *lp, int part if (pp->p_fstype != FS_BSDFFS) return (0); + frag = DISKLABELV1_FFS_FRAG(pp->p_fragblock); + fsize = DISKLABELV1_FFS_FSIZE(pp->p_fragblock); + /* Avoid dividing by zero... */ - if (pp->p_fragblock == 0) - return(1); + if (frag * fsize < lp->d_secsize) + return (1); if (!expert) goto align; - fsize = DISKLABELV1_FFS_FSIZE(pp->p_fragblock); - frag = DISKLABELV1_FFS_FRAG(pp->p_fragblock); - for (;;) { ui = getuint64(lp, "block size", "Size of ffs blocks. 1, 2, 4 or 8 times ffs fragment size.", @@ -2119,8 +2119,7 @@ align: orig_size = DL_GETPSIZE(pp); orig_offset = DL_GETPOFFSET(pp); - bsize = (DISKLABELV1_FFS_FRAG(pp->p_fragblock) * - DISKLABELV1_FFS_FSIZE(pp->p_fragblock)) / lp->d_secsize; + bsize = (frag * fsize) / lp->d_secsize; if (DL_GETPOFFSET(pp) != starting_sector) { /* Can't change offset of first partition. */ adj = bsize - (DL_GETPOFFSET(pp) % bsize); Index: sbin/fsck_ffs/setup.c === RCS file: /cvs/src/sbin/fsck_ffs/setup.c,v retrieving revision 1.64 diff -u -p -u -p -r1.64 setup.c --- sbin/fsck_ffs/setup.c 5 Jan 2018 09:33:47 - 1.64 +++ sbin/fsck_ffs/setup.c 27 Mar 2018 10:42:59 - @@ -636,6
cut: Fix segmentation fault
This patch fixes an out of boundary write in cut: $ cut -c 2147483648 - Segmentation fault (core dumped) The supplied number can be parsed by strtol, but the result is casted into a signed int, therefore turning negative. Afterwards, it is used as an offset into a char array to write data at the given position. This leads to the shown segmentation fault. I have changed the strtol call to strtonum, making sure that range specifications still work. Our cut regress tests pass. Also it has a nicer error message now, thanks to strtonum. While at it, I replaced a manual for-loop with memset. Index: usr.bin/cut/cut.c === RCS file: /cvs/src/usr.bin/cut/cut.c,v retrieving revision 1.23 diff -u -p -u -p -r1.23 cut.c --- usr.bin/cut/cut.c 2 Dec 2015 00:56:46 - 1.23 +++ usr.bin/cut/cut.c 27 Mar 2018 11:24:31 - @@ -154,11 +154,32 @@ int autostart, autostop, maxval; char positions[_POSIX2_LINE_MAX + 1]; +int +read_number(char **p) +{ + size_t pos; + int dash, n; + const char *errstr; + char *q; + + q = *p + strcspn(*p, "-"); + dash = *q == '-'; + *q = '\0'; + n = strtonum(*p, 1, _POSIX2_LINE_MAX, ); + if (errstr != NULL) + errx(1, "[-bcf] list: %s %s (allowed 1-%d)", *p, errstr, + _POSIX2_LINE_MAX); + if (dash) + *q = '-'; + *p = q; + + return n; +} + void get_list(char *list) { int setautostart, start, stop; - char *pos; char *p; /* @@ -176,13 +197,15 @@ get_list(char *list) setautostart = 1; } if (isdigit((unsigned char)*p)) { - start = stop = strtol(p, , 10); + start = stop = read_number(); if (setautostart && start > autostart) autostart = start; } if (*p == '-') { - if (isdigit((unsigned char)p[1])) - stop = strtol(p + 1, , 10); + if (isdigit((unsigned char)p[1])) { + ++p; + stop = read_number(); + } if (*p == '-') { ++p; if (!autostop || autostop > stop) @@ -193,13 +216,10 @@ get_list(char *list) errx(1, "[-bcf] list: illegal list value"); if (!stop || !start) errx(1, "[-bcf] list: values may not include zero"); - if (stop > _POSIX2_LINE_MAX) - errx(1, "[-bcf] list: %d too large (max %d)", - stop, _POSIX2_LINE_MAX); if (maxval < stop) maxval = stop; - for (pos = positions + start; start++ <= stop; *pos++ = 1) - ; + if (start <= stop) + memset(positions + start, 1, stop - start + 1); } /* overlapping ranges */
-current: thunar cores when opening document
Hi, most of the time I've observed that thunar ends with a core if I start an application (in my case OpenOffice or envince) using the context menu or by clicking twice on the icon. core file has the size 0 (no write access?) Kind regards Felix OpenBSD 6.3 (GENERIC.MP) #102: Fri Mar 23 00:06:18 MDT 2018 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 6262734848 (5972MB) avail mem = 6065856512 (5784MB) mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdac05000 (48 entries) bios0: vendor LENOVO version "H4ET92WW (2.52 )" date 04/18/2013 bios0: LENOVO 3358AWG acpi0 at bios0: rev 2 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SLIC ASF! HPET APIC MCFG FPDT SSDT SSDT UEFI UEFI MSDM UEFI DBG2 acpi0: wakeup devices P0P1(S4) EHC1(S3) EHC2(S3) XHC_(S3) HDEF(S4) RP04(S4) PXSX(S4) RP06(S4) PXSX(S4) BLAN(S4) PEG0(S4) PEGP(S4) PEG1(S4) PEG2(S4) PEG3(S4) LID_(S4) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz, 1796.23 MHz cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,AVX,F16C,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache acpihpet0: recalibrated TSC frequency 1895693609 Hz cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz, 1795.92 MHz cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,AVX,F16C,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 1, core 0, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz, 1795.92 MHz cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,AVX,F16C,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 1, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz, 1795.92 MHz cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,XSAVE,AVX,F16C,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,IBRS,IBPB,STIBP,SENSOR,ARAT,MELTDOWN cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 1, core 1, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins acpimcfg0 at acpi0 addr 0xf800, bus 0-63 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (P0P1) acpiprt2 at acpi0: bus 2 (RP01) acpiprt3 at acpi0: bus 3 (RP02) acpiprt4 at acpi0: bus 4 (RP03) acpiprt5 at acpi0: bus -1 (RP04) acpiprt6 at acpi0: bus -1 (RP05) acpiprt7 at acpi0: bus 9 (RP06) acpiprt8 at acpi0: bus -1 (RP07) acpiprt9 at acpi0: bus -1 (RP08) acpiprt10 at acpi0: bus -1 (PEG0) acpiprt11 at acpi0: bus -1 (PEG1) acpiprt12 at acpi0: bus -1 (PEG2) acpiprt13 at acpi0: bus -1 (PEG3) acpiec0 at acpi0 acpicpu0 at acpi0: C2(500@59 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu1 at acpi0: C2(500@59 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu2 at acpi0: C2(500@59 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpicpu3 at acpi0: C2(500@59 mwait.1@0x10), C1(1000@1 mwait.1), PSS acpitz0 at acpi0: critical temperature is 99 degC "INT3F0D" at acpi0 not configured "MSF0001" at acpi0 not configured "LEN0052" at acpi0 not configured acpithinkpad0 at acpi0 acpiac0 at acpi0: AC unit online acpibat0 at acpi0: BAT1 model "45N1174" serial 912 type LION oem "SANYO" acpibtn0 at acpi0: LID_ "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured acpibtn1 at acpi0: PWRB acpivideo0 at acpi0: GFX0 acpivout at acpivideo0 not configured cpu0: Enhanced SpeedStep 1796 MHz: speeds: 1901, 1900, 1800, 1700, 1600, 1500, 1400, 1300, 1200, 1100, 1000, 900, 800, 779 MHz pci0 at mainbus0 bus 0 pchb0 at pci0 dev 0 function 0 "Intel Core 3G Host" rev 0x09 inteldrm0 at pci0 dev 2 function 0 "Intel HD Graphics 4000" rev 0x09 drm0 at inteldrm0
FREF() earlier in sys_pread() & sys_pwrite()
Here's a diff to move FREF() just after fd_getfile_mode() in sys_pread(), sys_preadv(), sys_pwrite() and sys_pwritev(). As explained recently [0], I'd like to make sure all operations manipulating a 'struct file *' do so with a properly refcounted element. [0] https://marc.info/?l=openbsd-tech=152214234530708=2 Ok? Index: kern/vfs_syscalls.c === RCS file: /cvs/src/sys/kern/vfs_syscalls.c,v retrieving revision 1.276 diff -u -p -r1.276 vfs_syscalls.c --- kern/vfs_syscalls.c 19 Feb 2018 08:59:52 - 1.276 +++ kern/vfs_syscalls.c 27 Mar 2018 09:24:51 - @@ -2930,23 +2930,25 @@ sys_pread(struct proc *p, void *v, regis off_t offset; int fd = SCARG(uap, fd); + iov.iov_base = SCARG(uap, buf); + iov.iov_len = SCARG(uap, nbyte); + if ((fp = fd_getfile_mode(fdp, fd, FREAD)) == NULL) return (EBADF); + FREF(fp); vp = fp->f_data; if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || (vp->v_flag & VISTTY)) { + FRELE(fp, p); return (ESPIPE); } - iov.iov_base = SCARG(uap, buf); - iov.iov_len = SCARG(uap, nbyte); - offset = SCARG(uap, offset); - if (offset < 0 && vp->v_type != VCHR) + if (offset < 0 && vp->v_type != VCHR) { + FRELE(fp, p); return (EINVAL); - - FREF(fp); + } /* dofilereadv() will FRELE the descriptor for us */ return (dofilereadv(p, fd, fp, , 1, 0, , retval)); @@ -2973,18 +2975,20 @@ sys_preadv(struct proc *p, void *v, regi if ((fp = fd_getfile_mode(fdp, fd, FREAD)) == NULL) return (EBADF); + FREF(fp); vp = fp->f_data; if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || (vp->v_flag & VISTTY)) { + FRELE(fp, p); return (ESPIPE); } offset = SCARG(uap, offset); - if (offset < 0 && vp->v_type != VCHR) + if (offset < 0 && vp->v_type != VCHR) { + FRELE(fp, p); return (EINVAL); - - FREF(fp); + } /* dofilereadv() will FRELE the descriptor for us */ return (dofilereadv(p, fd, fp, SCARG(uap, iovp), SCARG(uap, iovcnt), 1, @@ -3011,23 +3015,25 @@ sys_pwrite(struct proc *p, void *v, regi off_t offset; int fd = SCARG(uap, fd); + iov.iov_base = (void *)SCARG(uap, buf); + iov.iov_len = SCARG(uap, nbyte); + if ((fp = fd_getfile_mode(fdp, fd, FWRITE)) == NULL) return (EBADF); + FREF(fp); vp = fp->f_data; if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || (vp->v_flag & VISTTY)) { + FRELE(fp, p); return (ESPIPE); } - iov.iov_base = (void *)SCARG(uap, buf); - iov.iov_len = SCARG(uap, nbyte); - offset = SCARG(uap, offset); - if (offset < 0 && vp->v_type != VCHR) + if (offset < 0 && vp->v_type != VCHR) { + FRELE(fp, p); return (EINVAL); - - FREF(fp); + } /* dofilewritev() will FRELE the descriptor for us */ return (dofilewritev(p, fd, fp, , 1, 0, , retval)); @@ -3054,18 +3060,20 @@ sys_pwritev(struct proc *p, void *v, reg if ((fp = fd_getfile_mode(fdp, fd, FWRITE)) == NULL) return (EBADF); + FREF(fp); vp = fp->f_data; if (fp->f_type != DTYPE_VNODE || vp->v_type == VFIFO || (vp->v_flag & VISTTY)) { + FRELE(fp, p); return (ESPIPE); } offset = SCARG(uap, offset); - if (offset < 0 && vp->v_type != VCHR) + if (offset < 0 && vp->v_type != VCHR) { + FRELE(fp, p); return (EINVAL); - - FREF(fp); + } /* dofilewritev() will FRELE the descriptor for us */ return (dofilewritev(p, fd, fp, SCARG(uap, iovp), SCARG(uap, iovcnt),
FREF() earlier in getsock()
Here's a diff to move FREF() just after fd_getfile() in getsock(). As explained recently [0], I'd like to make sure all operations manipulating a 'struct file *' do so with a properly refcounted element. [0] https://marc.info/?l=openbsd-tech=152214234530708=2 Ok? Index: kern/uipc_syscalls.c === RCS file: /cvs/src/sys/kern/uipc_syscalls.c,v retrieving revision 1.167 diff -u -p -r1.167 uipc_syscalls.c --- kern/uipc_syscalls.c21 Feb 2018 09:30:02 - 1.167 +++ kern/uipc_syscalls.c27 Mar 2018 09:19:37 - @@ -1160,9 +1160,11 @@ getsock(struct proc *p, int fdes, struct if ((fp = fd_getfile(p->p_fd, fdes)) == NULL) return (EBADF); - if (fp->f_type != DTYPE_SOCKET) - return (ENOTSOCK); FREF(fp); + if (fp->f_type != DTYPE_SOCKET) { + FRELE(fp, p); + return (ENOTSOCK); + } *fpp = fp; return (0);
FREF() earlier in sys_flock()
FREF() is currently needed before any sleep point. That's because there's almost no parallelism in the top half of the kernel. I'd like to improve that. The first step is to make sure all operations manipulating a 'struct file *' do so with a properly refcounted element. So here's a diff moving FREF() just after fd_getfile() in sys_flock(). Ok? Index: kern/kern_descrip.c === RCS file: /cvs/src/sys/kern/kern_descrip.c,v retrieving revision 1.142 diff -u -p -r1.142 kern_descrip.c --- kern/kern_descrip.c 19 Feb 2018 08:59:52 - 1.142 +++ kern/kern_descrip.c 27 Mar 2018 09:12:45 - @@ -1205,9 +1205,11 @@ sys_flock(struct proc *p, void *v, regis if ((fp = fd_getfile(fdp, fd)) == NULL) return (EBADF); - if (fp->f_type != DTYPE_VNODE) - return (EOPNOTSUPP); FREF(fp); + if (fp->f_type != DTYPE_VNODE) { + error = EOPNOTSUPP; + goto out; + } vp = fp->f_data; lf.l_whence = SEEK_SET; lf.l_start = 0;
Kill nfs_hashlock
This lock is here to prevent a race between multiple threads wanting to insert and element in the RB-tree. This race is possible because both getnewvnode() and pool_get() can wait. However this lock adds some lock ordering problems as soon as we'll try to use proper NFSnode locks. I'd prefer not to have to deal with such problems, so change the code below to check for a race after sleeping. Ok? Index: nfs/nfs_node.c === RCS file: /cvs/src/sys/nfs/nfs_node.c,v retrieving revision 1.65 diff -u -p -r1.65 nfs_node.c --- nfs/nfs_node.c 27 Sep 2016 01:37:38 - 1.65 +++ nfs/nfs_node.c 27 Mar 2018 08:48:44 - @@ -58,8 +58,6 @@ struct pool nfs_node_pool; extern int prtactive; -struct rwlock nfs_hashlock = RWLOCK_INITIALIZER("nfshshlk"); - /* XXX */ extern struct vops nfs_vops; @@ -98,12 +96,10 @@ nfs_nget(struct mount *mnt, nfsfh_t *fh, nmp = VFSTONFS(mnt); loop: - rw_enter_write(_hashlock); find.n_fhp = fh; find.n_fhsize = fhsize; np = RBT_FIND(nfs_nodetree, >nm_ntree, ); if (np != NULL) { - rw_exit_write(_hashlock); vp = NFSTOV(np); error = vget(vp, LK_EXCLUSIVE, p); if (error) @@ -120,25 +116,24 @@ loop: * to see if this nfsnode has been added while we did not hold * the lock. */ - rw_exit_write(_hashlock); error = getnewvnode(VT_NFS, mnt, _vops, ); /* note that we don't have this vnode set up completely yet */ - rw_enter_write(_hashlock); if (error) { *npp = NULL; - rw_exit_write(_hashlock); return (error); } nvp->v_flag |= VLARVAL; - np = RBT_FIND(nfs_nodetree, >nm_ntree, ); - if (np != NULL) { + np = pool_get(_node_pool, PR_WAITOK | PR_ZERO); + /* +* getnewvnode() and pool_get() can sleep, check for race. +*/ + if (RBT_FIND(nfs_nodetree, >nm_ntree, ) != NULL) { + pool_put(_node_pool, np); vgone(nvp); - rw_exit_write(_hashlock); goto loop; } vp = nvp; - np = pool_get(_node_pool, PR_WAITOK | PR_ZERO); vp->v_data = np; /* we now have an nfsnode on this vnode */ vp->v_flag &= ~VLARVAL; @@ -162,7 +157,6 @@ loop: np2 = RBT_INSERT(nfs_nodetree, >nm_ntree, np); KASSERT(np2 == NULL); np->n_accstamp = -1; - rw_exit(_hashlock); *npp = np; return (0); @@ -239,9 +233,7 @@ nfs_reclaim(void *v) ap->a_vp); #endif nmp = VFSTONFS(vp->v_mount); - rw_enter_write(_hashlock); RBT_REMOVE(nfs_nodetree, >nm_ntree, np); - rw_exit_write(_hashlock); if (np->n_rcred) crfree(np->n_rcred);
NFS vs NET_LOCK()
Diff below prevents a future lock ordering problem between NFSnode locks (similar to Inode locks) and the NET_LOCK(). It ensures the NET_LOCK() is always locked *after* any NFSnode lock by fixing the UVM fault case. So we have always have: VFS -> NFS -> NFSnode lock -> socket -> NET_LOCK(). Ok? Index: uvm/uvm_vnode.c === RCS file: /cvs/src/sys/uvm/uvm_vnode.c,v retrieving revision 1.99 diff -u -p -r1.99 uvm_vnode.c --- uvm/uvm_vnode.c 8 Mar 2018 22:04:18 - 1.99 +++ uvm/uvm_vnode.c 20 Mar 2018 12:58:17 - @@ -1105,6 +1105,7 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t off_t file_offset; int waitf, result, mapinflags; size_t got, wanted; + int netlocked = 0; /* init values */ waitf = (flags & PGO_SYNCIO) ? M_WAITOK : M_NOWAIT; @@ -1174,18 +1175,24 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t * Ideally, this kind of operation *should* work. */ result = 0; - if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) - result = vn_lock(vn, LK_EXCLUSIVE | LK_RECURSEFAIL, curproc); - - if (result == 0) { - int netlocked = (rw_status() == RW_WRITE); - + if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) { /* * This process may already have the NET_LOCK(), if we * faulted in copyin() or copyout() in the network stack. */ - if (netlocked) + if (rw_status() == RW_WRITE) { + netlocked = 1; + NET_UNLOCK(); + } + + result = vn_lock(vn, LK_EXCLUSIVE | LK_RECURSEFAIL, curproc); + } + + if (result == 0) { + if (!netlocked && (rw_status() == RW_WRITE)) { + netlocked = 1; NET_UNLOCK(); + } /* NOTE: vnode now locked! */ if (rw == UIO_READ) @@ -1195,11 +1202,12 @@ uvn_io(struct uvm_vnode *uvn, vm_page_t (flags & PGO_PDFREECLUST) ? IO_NOCACHE : 0, curproc->p_ucred); + if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) + VOP_UNLOCK(vn, curproc); + if (netlocked) NET_LOCK(); - if ((uvn->u_flags & UVM_VNODE_VNISLOCKED) == 0) - VOP_UNLOCK(vn, curproc); } /* NOTE: vnode now unlocked (unless vnislocked) */
Re: [patch] autofs for OpenBSD
On 27/03/18(Tue) 13:02, Tomohiro Kusumi wrote: > 2018-03-27 0:57 GMT+09:00 Martin Pieuchot: > > On 25/03/18(Sun) 03:53, Tomohiro Kusumi wrote: > [...] > > - Since file(1) on OpenBSD is privilege separated and pledge(2)'d do we > > really need fstyp(8)? What does it buy us? > > Nothing really. I can make this a part of usr.sbin/autofs rather than as > a stand alone command if this bothers OpenBSD. While this command > is a generic command, some features are specifically needed by autofs. > fstyp(8) also supports multi-volumes filesystem such as HAMMER. > > By the way, I can drop exfat.c (a copy from below) if OpenBSD does not > prefer to have exFAT related code. > https://github.com/freebsd/freebsd/blob/master/usr.sbin/fstyp/exfat.c Having it part of autofs(1) would make more sense to me. You can leave exfat code around, hopefully we'll get support for it at some point ;) > > - Why do we need two daemons? Can we merge the functionalities of > > autounmountd(8) into automountd(8)? > > automountd(8) blocks at ioctl(2) via /dev/autofs waiting for mount requests. > autounmountd(8) blocks waiting for kqueue(2) events. > I recommend to have these separated for simplicity. It is simple for the developer to have two programs not for the user. Having separated processes for doing separate actions is a good thing and it will help pledging them. However it doesn't make sense to have to start 2 daemons to do one thing: using autofs. We have many privilege separated daemons that fork and do multiple things. You could look at them. I'm quite sure other BSDs would also welcome such change :) It reduce the number of scripts and keep most of the logic in one place. > >> [...] > >> 3. The "-media" map is currently unsupported due to > >>etc/autofs/special_media not being functional. The same script in > >>FreeBSD uses GEOM sysctl, whereas there seems to be no easy way to > >>achieve the same result on OpenBSD. See the bottom part of > >>etc/autofs/special_media for details. I'd apply if this is doable > >>on OpenBSD, which is to list all possible block devices and > >>partitions for fstyp to check filesystem type. > > > > That's a crucial point. If that's for removable media, how does the > > deamon know when a USB key for example, is inserted? Can't you use > > /dev/hotplug and disklabel(8)? > > The requirement here is to get the list of block devices rather than > catching hardware events, but if the kernel doesn't provide a way to > retrieve such information, it boils down to low level things like > hotplug, searching dmesg, etc. I don't understand when do you have to get such list? But what you're describing seems to be: $ sysctl hw.disknames hw.disknames=sd0:72519550243ad632 $ disklabel 72519550243ad632 # /dev/rsd0c: type: SCSI disk: SCSI disk label: SR CRYPTO duid: 72519550243ad631 flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 31129 total sectors: 500102858 boundstart: 64 boundend: 500087385 drivedata: 0 16 partitions: #size offset fstype [fsize bsize cpg] a: 2097152 64 4.2BSD 2048 16384 1 # / b: 16614720 2097216swap# none c:5001028580 unused d: 8388608 18711936 4.2BSD 2048 16384 1 # /tmp e: 8387040 27100544 4.2BSD 2048 16384 1 # /var f: 4194304 67670016 4.2BSD 2048 16384 1 # /usr g: 2097152 71864320 4.2BSD 2048 16384 1 # /usr/X11R6 h: 20971520 73961472 4.2BSD 2048 16384 1 # /usr/local i: 8397088 94932992 4.2BSD 2048 16384 1 # /usr/src j: 4209024103330080 4.2BSD 2048 16384 1 # /usr/xenocara k: 2088448107539104 4.2BSD 2048 16384 1 # /usr/www l: 8385920109627552 4.2BSD 2048 16384 1 # /usr/ports m: 12578912118013472 4.2BSD 2048 16384 12958 # /usr/obj n: 83875360130592384 4.2BSD 2048 16384 1 # /usr/hack o: 20964800214467744 4.2BSD 2048 16384 12958 # /cvs p:264654816235432544 4.2BSD 2048 16384 1 # /home