Re: rtfree(): "rt->rt_refcnt > 0" assertion
On Sat, Oct 02, 2021 at 11:27:48AM +0200, Martin Pieuchot wrote: > On 15/09/21(Wed) 01:23, Vitaliy Makkoveev wrote: > > We have weird `rt_refcnt' check in rtfree(): > > > > > 497 rtfree(struct rtentry *rt) > > 498 { > > ... > > 504 refcnt = (int)atomic_dec_int_nv(>rt_refcnt); > > 505 if (refcnt <= 0) { > > 506 KASSERT(!ISSET(rt->rt_flags, RTF_UP)); > > 507 KASSERT(!RT_ROOT(rt)); > > 508 atomic_dec_int(); > > 509 if (refcnt < 0) { > > 510 printf("rtfree: %p not freed (neg refs)\n", rt); > > 511 return; > > 512 } > > > > We underflow `rt_refcnt' when we missed to get reference to this `rt' or > > we did extra release. This is the bug which should be exposed. But > > according current code this `rt' is just leaked. Also it's easy to miss > > this error condition because we only print error message. > > Yes, all of this is intentional. If you screw up reference counting a > leak is better than a crash that you can't debug. At least people can > report there is a leak. > Sorry, I don't assume the leak more helpful for debug. This error message could be easily missed and not reported, but the panic could not be ignored. > > I propose to put "rt->rt_refcnt > 0" assertion before we decrement > > `rt_refcnt'. This makes reference counting errors more notable when they > > are. Also I changed `rt_refcnt' definition to unsigned integer. > > Such assert isn't safe because the value can be dereferenced by another > thread. You can only assert for the value of a private variable or if > you're under a critical section. > May be I'm wrong but the sequence: KASSERT(rt->rt_refcnt > 0); if (atomic_dec_int_nv(>rt_refcnt) == 0) { should not be reordered. This is the 'int' variable, so the 'read' (or 'load') and 'atomic decrement' commands could not interrupt each other. We could have the case when many rtfree(9) threads hit this assertion, but we could not have the situation when no one rtfree(9) thread hits this. The critical section is not required here. We have the same logic is other places, like closef(): 1230 closef(struct file *fp, struct proc *p) 1231 { 1232 struct filedesc *fdp; 1233 1234 if (fp == NULL) 1235 return (0); 1236 1237 KASSERTMSG(fp->f_count >= 2, "count (%u) < 2", fp->f_count); 1238 1239 atomic_dec_int(>f_count); > > I didn't find any "rtfree: ... not freed (neg refs)" report, so it > > looks like we can't hit this assertion, but I like to commit this > > diff after release. But nothing stops to test it and provide feedback :) > > We did hit it many many times in the past. Please do not change this. > I searched "rtfree: ... not freed (neg refs)" by google but didn't find anything. I propose to put this diff into the snaps before commit. We should instantly receive reports if we have refcounting issues. Also we should instantly receive reports if someone introduced them in the future. > > Index: sys/net/route.c > > === > > RCS file: /cvs/src/sys/net/route.c,v > > retrieving revision 1.399 > > diff -u -p -r1.399 route.c > > --- sys/net/route.c 25 May 2021 22:45:09 - 1.399 > > +++ sys/net/route.c 14 Sep 2021 21:47:11 - > > @@ -496,20 +496,15 @@ rtref(struct rtentry *rt) > > void > > rtfree(struct rtentry *rt) > > { > > - int refcnt; > > - > > if (rt == NULL) > > return; > > > > - refcnt = (int)atomic_dec_int_nv(>rt_refcnt); > > - if (refcnt <= 0) { > > + KASSERT(rt->rt_refcnt > 0); > > + > > + if (atomic_dec_int_nv(>rt_refcnt) == 0) { > > KASSERT(!ISSET(rt->rt_flags, RTF_UP)); > > KASSERT(!RT_ROOT(rt)); > > atomic_dec_int(); > > - if (refcnt < 0) { > > - printf("rtfree: %p not freed (neg refs)\n", rt); > > - return; > > - } > > > > KERNEL_LOCK(); > > rt_timer_remove_all(rt); > > Index: sys/net/route.h > > === > > RCS file: /cvs/src/sys/net/route.h,v > > retrieving revision 1.185 > > diff -u -p -r1.185 route.h > > --- sys/net/route.h 17 Mar 2021 09:05:42 - 1.185 > > +++ sys/net/route.h 14 Sep 2021 21:47:11 - > > @@ -113,7 +113,7 @@ struct rtentry { > > struct rt_kmetrics rt_rmx; /* metrics used by rx'ing protocols */ > > unsigned int rt_ifidx; /* the answer: interface to use */ > > unsigned int rt_flags; /* up/down?, host/net */ > > - int rt_refcnt; /* # held references */ > > + unsigned int rt_refcnt; /* # held references */ > > int rt_plen; /* prefix length */ > > uint16_t rt_labelid;/* route label ID */ > > uint8_t rt_priority; /* routing
Re: rtfree(): "rt->rt_refcnt > 0" assertion
On 15/09/21(Wed) 01:23, Vitaliy Makkoveev wrote: > We have weird `rt_refcnt' check in rtfree(): > > 497 rtfree(struct rtentry *rt) > 498 { > ... > 504 refcnt = (int)atomic_dec_int_nv(>rt_refcnt); > 505 if (refcnt <= 0) { > 506 KASSERT(!ISSET(rt->rt_flags, RTF_UP)); > 507 KASSERT(!RT_ROOT(rt)); > 508 atomic_dec_int(); > 509 if (refcnt < 0) { > 510 printf("rtfree: %p not freed (neg refs)\n", rt); > 511 return; > 512 } > > We underflow `rt_refcnt' when we missed to get reference to this `rt' or > we did extra release. This is the bug which should be exposed. But > according current code this `rt' is just leaked. Also it's easy to miss > this error condition because we only print error message. Yes, all of this is intentional. If you screw up reference counting a leak is better than a crash that you can't debug. At least people can report there is a leak. > I propose to put "rt->rt_refcnt > 0" assertion before we decrement > `rt_refcnt'. This makes reference counting errors more notable when they > are. Also I changed `rt_refcnt' definition to unsigned integer. Such assert isn't safe because the value can be dereferenced by another thread. You can only assert for the value of a private variable or if you're under a critical section. > I didn't find any "rtfree: ... not freed (neg refs)" report, so it > looks like we can't hit this assertion, but I like to commit this > diff after release. But nothing stops to test it and provide feedback :) We did hit it many many times in the past. Please do not change this. > Index: sys/net/route.c > === > RCS file: /cvs/src/sys/net/route.c,v > retrieving revision 1.399 > diff -u -p -r1.399 route.c > --- sys/net/route.c 25 May 2021 22:45:09 - 1.399 > +++ sys/net/route.c 14 Sep 2021 21:47:11 - > @@ -496,20 +496,15 @@ rtref(struct rtentry *rt) > void > rtfree(struct rtentry *rt) > { > - int refcnt; > - > if (rt == NULL) > return; > > - refcnt = (int)atomic_dec_int_nv(>rt_refcnt); > - if (refcnt <= 0) { > + KASSERT(rt->rt_refcnt > 0); > + > + if (atomic_dec_int_nv(>rt_refcnt) == 0) { > KASSERT(!ISSET(rt->rt_flags, RTF_UP)); > KASSERT(!RT_ROOT(rt)); > atomic_dec_int(); > - if (refcnt < 0) { > - printf("rtfree: %p not freed (neg refs)\n", rt); > - return; > - } > > KERNEL_LOCK(); > rt_timer_remove_all(rt); > Index: sys/net/route.h > === > RCS file: /cvs/src/sys/net/route.h,v > retrieving revision 1.185 > diff -u -p -r1.185 route.h > --- sys/net/route.h 17 Mar 2021 09:05:42 - 1.185 > +++ sys/net/route.h 14 Sep 2021 21:47:11 - > @@ -113,7 +113,7 @@ struct rtentry { > struct rt_kmetrics rt_rmx; /* metrics used by rx'ing protocols */ > unsigned int rt_ifidx; /* the answer: interface to use */ > unsigned int rt_flags; /* up/down?, host/net */ > - int rt_refcnt; /* # held references */ > + unsigned int rt_refcnt; /* # held references */ > int rt_plen; /* prefix length */ > uint16_t rt_labelid;/* route label ID */ > uint8_t rt_priority; /* routing priority to use */ >
rtfree(): "rt->rt_refcnt > 0" assertion
We have weird `rt_refcnt' check in rtfree(): 497 rtfree(struct rtentry *rt) 498 { ... 504 refcnt = (int)atomic_dec_int_nv(>rt_refcnt); 505 if (refcnt <= 0) { 506 KASSERT(!ISSET(rt->rt_flags, RTF_UP)); 507 KASSERT(!RT_ROOT(rt)); 508 atomic_dec_int(); 509 if (refcnt < 0) { 510 printf("rtfree: %p not freed (neg refs)\n", rt); 511 return; 512 } We underflow `rt_refcnt' when we missed to get reference to this `rt' or we did extra release. This is the bug which should be exposed. But according current code this `rt' is just leaked. Also it's easy to miss this error condition because we only print error message. I propose to put "rt->rt_refcnt > 0" assertion before we decrement `rt_refcnt'. This makes reference counting errors more notable when they are. Also I changed `rt_refcnt' definition to unsigned integer. I didn't find any "rtfree: ... not freed (neg refs)" report, so it looks like we can't hit this assertion, but I like to commit this diff after release. But nothing stops to test it and provide feedback :) Index: sys/net/route.c === RCS file: /cvs/src/sys/net/route.c,v retrieving revision 1.399 diff -u -p -r1.399 route.c --- sys/net/route.c 25 May 2021 22:45:09 - 1.399 +++ sys/net/route.c 14 Sep 2021 21:47:11 - @@ -496,20 +496,15 @@ rtref(struct rtentry *rt) void rtfree(struct rtentry *rt) { - int refcnt; - if (rt == NULL) return; - refcnt = (int)atomic_dec_int_nv(>rt_refcnt); - if (refcnt <= 0) { + KASSERT(rt->rt_refcnt > 0); + + if (atomic_dec_int_nv(>rt_refcnt) == 0) { KASSERT(!ISSET(rt->rt_flags, RTF_UP)); KASSERT(!RT_ROOT(rt)); atomic_dec_int(); - if (refcnt < 0) { - printf("rtfree: %p not freed (neg refs)\n", rt); - return; - } KERNEL_LOCK(); rt_timer_remove_all(rt); Index: sys/net/route.h === RCS file: /cvs/src/sys/net/route.h,v retrieving revision 1.185 diff -u -p -r1.185 route.h --- sys/net/route.h 17 Mar 2021 09:05:42 - 1.185 +++ sys/net/route.h 14 Sep 2021 21:47:11 - @@ -113,7 +113,7 @@ struct rtentry { struct rt_kmetrics rt_rmx; /* metrics used by rx'ing protocols */ unsigned int rt_ifidx; /* the answer: interface to use */ unsigned int rt_flags; /* up/down?, host/net */ - int rt_refcnt; /* # held references */ + unsigned int rt_refcnt; /* # held references */ int rt_plen; /* prefix length */ uint16_t rt_labelid;/* route label ID */ uint8_t rt_priority; /* routing priority to use */