Re: sendsyslog(2) man page
On Mon, Mar 21, 2016 at 5:02 PM, Alexander Bluhm wrote:
> there are several improvements for the sendsyslog(2) man page
> floating around.  I have put them into a single diff.  Do we want
> all of them?

Looks good to me; refinement can happen in tree.

ok guenther@
Re: bgpd: fix adding a new interface and network inet connected
On 21/03/2016 19:04, Peter Hessler wrote: On 2016 Mar 21 (Mon) at 16:22:53 +0100 (+0100), Claudio Jeker wrote: :On Mon, Mar 21, 2016 at 03:54:38PM +0100, Peter Hessler wrote: :> We ran into a situation where we accidentally blackholed traffic going to :> a new Internet Exchange. When we added the new vlans and new peers, the :> nexthop address on that vlan was *not* our neighbor's address, but :> instead used our own IP address on that new interface. "bgpctl show rib", :> showed the correct IP, but the wrong one was used when installing into :> the fib. :> :> Restarting bgpd installed the correct nexthop. :> :> Additionally, a user had reported that "network inet connected" didn't :> work any more. This fix also convinces bgpd to redistribute the network :> as soon as it is created. The underlying cause was a combination of :> missing the F_CONNECTED flag, as well as having an ifindex of 0, which :> is lo0's index. :> :> OK? : :I noticed this breakage as well some time ago but never managed to debug :it (too many production systems and not enough test systems). :This was caused by the change of cloning routes to show the IP address :instead of the link local one. :Now about the diff I'm not 100% sure. It for sure changes beheviour a bit. :ifindex is now always set and not only for connected routes. Not sure if :this is bad or not... :mpath should probably be set to 0 in the RTF_CONNECTED case to keep same :behaviour. On the other hand it is now possible to have multiple routes to :the same network even for connected routes (not sure if it is possible on :the same prio or not). : :IMO this is a more conservative version... : : case AF_INET: : case AF_INET6: : if (rtm->rtm_flags & RTF_CONNECTED) { : flags |= F_CONNECTED; : ifindex = rtm->rtm_index; : sa = NULL; : mpath = 0; : } : break; : :I think this is less dangerous or what you think? : As I think about this more, I think this should go in. We can make it perfect later. 
OK Happy to lend time and resource to this - I am in the process of building such a network on OpenBSD and will likely hit this, so any resource or testing is available should it be required. :> Index: kroute.c :> === :> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v :> retrieving revision 1.207 :> diff -u -p -u -p -r1.207 kroute.c :> --- kroute.c 5 Dec 2015 18:28:04 - 1.207 :> +++ kroute.c 21 Mar 2016 14:44:52 - :> @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt :> sa = NULL; :> mpath = 0; /* link local stuff can't be mpath */ :> break; :> + case AF_INET: :> + case AF_INET6: :> + if (rtm->rtm_flags & RTF_CONNECTED) :> + flags |= F_CONNECTED; :> + ifindex = rtm->rtm_index; :> + break; :> } :> :> if (rtm->rtm_type == RTM_DELETE) { :> :> :> :> :> -- :> If you are a fatalist, what can you do about it? :> -- Ann Edwards-Duff : :-- ::wq Claudio :
Re: arm: purge arm9e, too
On Mon, Mar 21, 2016 at 11:15:48AM +0100, Patrick Wildt wrote:
> Hi,
>
> I would like to get rid of even more unused CPUs, so we end up with only
> armish, zaurus (armv5) and armv7.  This diff removes ARM9E, but I also
> have diffs prepared to get rid of ARM10 and ARM11.

Tested here, and OK bmercer@.  Looking forward to the next two.
sendsyslog(2) man page
Hi, there are several improvements for the sendsyslog(2) man page floating around. I have put them into a single diff. Do we want all of them? bluhm Index: lib/libc/sys/sendsyslog.2 === RCS file: /data/mirror/openbsd/cvs/src/lib/libc/sys/sendsyslog.2,v retrieving revision 1.6 diff -u -p -r1.6 sendsyslog.2 --- lib/libc/sys/sendsyslog.2 21 Mar 2016 22:41:28 - 1.6 +++ lib/libc/sys/sendsyslog.2 21 Mar 2016 23:54:55 - @@ -28,34 +28,42 @@ .Sh DESCRIPTION The .Fn sendsyslog -functions is used to transmit a +function is used to transmit a .Xr syslog 3 -formatted message direct to +formatted message directly to .Xr syslogd 8 without requiring the allocation of a socket. -.Pp -The -.Fa flags -argument of -.Fn sendsyslog -accepts the -.Dv LOG_CONS -flag. If .Dv LOG_CONS -is specified, and +is specified in the +.Fa flags +argument, and .Xr syslogd 8 is not accepting messages, the message will be sent directly to the console. This is used internally by .Xr syslog_r 3 , so that messages can be sent during difficult situations. +If sending to +.Xr syslogd 8 +fails, dropped messages are counted. +When +.Xr syslogd 8 +works again, a warning with the counter and error number is logged. .Sh RETURN VALUES .Rv -std .Sh ERRORS .Fn sendsyslog can fail if: .Bl -tag -width Er +.It Bq Er EFAULT +An invalid user space address was specified for a parameter. +.It Bq Er EMSGSIZE +The socket requires that message be sent atomically, and the size +of the message to be sent made this impossible. +.It Bq Er ENOBUFS +The system was unable to allocate an internal buffer. +The operation may succeed when buffers become available. .It Bq Er ENOTCONN The message cannot be sent, likely because .Xr syslogd 8 @@ -69,3 +77,7 @@ The .Fn sendsyslog function call appeared in .Ox 5.6 . +The +.Fa flags +argument was added in +.Ox 6.0 .
armv7 unused vm_page_md members
Hi, compile tested. -Artturi Index: sys/arch/arm/include/pmap.h === RCS file: /cvs/src/sys/arch/arm/include/pmap.h,v retrieving revision 1.38 diff -u -p -u -r1.38 pmap.h --- sys/arch/arm/include/pmap.h 19 Mar 2016 09:36:57 - 1.38 +++ sys/arch/arm/include/pmap.h 21 Mar 2016 23:33:06 - @@ -800,6 +800,7 @@ extern uint32_t pmap_alias_bits; struct vm_page_md { struct pv_entry *pvh_list; /* pv_entry list */ int pvh_attrs; /* page attributes */ +#if !defined(CPU_ARMv7) u_int uro_mappings; u_int urw_mappings; union { @@ -809,8 +810,10 @@ struct vm_page_md { #definekro_mappingsk_u.s_mappings[0] #definekrw_mappingsk_u.s_mappings[1] #definek_mappings k_u.i_mappings +#endif /* !CPU_ARMv7 */ }; +#if !defined(CPU_ARMv7) #defineVM_MDPAGE_INIT(pg) \ do { \ (pg)->mdpage.pvh_list = NULL; \ @@ -819,6 +822,13 @@ do { \ (pg)->mdpage.urw_mappings = 0; \ (pg)->mdpage.k_mappings = 0;\ } while (/*CONSTCOND*/0) +#else +#defineVM_MDPAGE_INIT(pg) \ +do { \ + (pg)->mdpage.pvh_list = NULL; \ + (pg)->mdpage.pvh_attrs = 0; \ +} while (/*CONSTCOND*/0) +#endif /* CPU_ARMv7 */ #endif /* _LOCORE */ #endif /* _ARM_PMAP_H_ */
Re: uvm: shrinking amap kmem consumption
> Date: Mon, 21 Mar 2016 20:02:28 +0100 > From: Stefan Kempf> > Recently we found that amaps consume a good deal of kernel address space. > See this thread: https://marc.info/?l=openbsd-tech=145752756005014=2. > And we found a way to reduce kernel mem pressure for some architectures > at least. See the diffs in that thread. > > Besides that, it's possible to shrink the amap struct from 72 to 48 bytes > on 64 bit systems (or from 44 to 32 bytes on 32 bit systems). > > It's also possible to cut down the memory needed for slots roughly in half > on 64 bit architectures, and cut it up to a factor of 3 on 32 bit machines. > > Here's how amap slots are maintained currently: for every slot, the kernel > allocates one pointer to a vm_anon and two ints (16 bytes per slot on > 64 bit systems, 12 bytes for 32 CPUs). > > To reduce these memory requirements, we need three flavors of amaps: > > - Tiny amaps with only one slot store the pointer to the vm_anon in the > amap directly. The two ints are not needed. This was Theo's idea. > > - Small amaps with up to 32 slots need 8 instead of 16 bytes per slot > (or 4 bytes instead of 12 on 32 bit machines). > It's enough to store the array of anons. The two ints per slot are > not needed. > > Tiny and small amaps are the ones used most often. > > - Normal amaps with n >= 32 slots only need > 4 * sizeof(pointer) + n * sizeof(struct vm_anon *) + 12*n/32 bytes > to maintain amap slots. For large n that's also around 1.8 times > less memory for slots (or about 2.7 times less on 32 bit CPUs) compared > to the current implementation. > That memory is for the vm_anon array, and a header structure. The two > ints per slot in the current code are replaced by n/32 bitmaps. > > But that also means that the amap layer has to do some special-casing > when dealing with amaps. I tried to factor out the common code where > possible. > > How does it work? 
> Basically, the *current* slots in an amap consist
> of three arrays (for n slots, every array has size n):
> - am_anon: an array of vm_anon
> - am_slots: an array of indices that store which entries in am_anon
>   are not NULL.  This is for quickly traversing all occupied entries
>   in the amap
> - am_bckptr: maps a non-NULL entry in am_anon to an index in am_slots.
>   Needed to update am_slots when setting an entry in am_anon to NULL.
>
> We can compress am_slots and am_bckptr to bitmaps of 32 bits each,
> so we only need n/32 entries for am_slots and am_bckptr.
>
> For amaps with up to 32 slots, we do not need am_slots and am_bckptr:
> it's possible to store the bitmap of used slots in the amap directly.
>
> I see no difference in doing a make -j4 build with this on an amd64
> T430s (so things don't seem to get slower here with this).  Tests on
> other systems and architectures are much appreciated.
>
> The diff below contains a proof of concept, but I'll also post smaller
> diffs that are easier to review in the next days.
>
> Comments/thoughts about doing this?

Is it really worth having three flavours?  If having only two
(tiny+normal?) simplifies the code considerably and doesn't result
in much more memory being used, that may be preferable.

The amaps are one of the roadblocks on the way to making uvm more
mpsafe, and keeping the code simple will make that easier.
> Index: uvm/uvm_amap.c > === > RCS file: /cvs/src/sys/uvm/uvm_amap.c,v > retrieving revision 1.62 > diff -u -p -r1.62 uvm_amap.c > --- uvm/uvm_amap.c16 Mar 2016 16:53:43 - 1.62 > +++ uvm/uvm_amap.c21 Mar 2016 18:53:45 - > @@ -53,21 +53,31 @@ > struct pool uvm_amap_pool; > > /* Pools for amap slots for the most common amap slot sizes */ > -struct pool uvm_amap_slot_pools[UVM_AMAP_CHUNK]; > +struct pool uvm_amap_slot_pools[UVM_AMAP_CHUNK - 1]; > > LIST_HEAD(, vm_amap) amap_list; > > -static char amap_slot_pool_names[UVM_AMAP_CHUNK][13]; > - > -#define MALLOC_SLOT_UNIT (2 * sizeof(int) + sizeof(struct vm_anon *)) > +static char amap_slot_pool_names[UVM_AMAP_CHUNK - 1][13]; > > /* > * local functions > */ > > -static struct vm_amap *amap_alloc1(int, int, int); > +struct vm_anon **amap_slots_alloc(struct vm_amap *, u_int, int, > +struct vm_amap_meta **); > +static struct vm_amap *amap_alloc1(u_int, u_int, int); > static __inline void amap_list_insert(struct vm_amap *); > static __inline void amap_list_remove(struct vm_amap *); > +static __inline void amap_anon_release(struct vm_anon *); > +void amap_fill_slot(struct vm_amap *, u_int); > +void amap_normal_clear_slot(struct vm_amap *, u_int); > +void amap_clear_slot(struct vm_amap *, u_int); > +static __inline void amap_normal_wipe_slot(struct vm_amap *, u_int); > + > +void amap_wipeout_traverse(struct vm_anon **, u_int, u_int); > +int amap_cow_now_traverse(struct vm_anon **, u_int, u_int); > +int amap_swap_off_traverse(struct vm_amap *, struct vm_anon **,
uvm: shrinking amap kmem consumption
Recently we found that amaps consume a good deal of kernel address
space.  See this thread:
https://marc.info/?l=openbsd-tech=145752756005014=2.
And we found a way to reduce kernel mem pressure for some architectures
at least.  See the diffs in that thread.

Besides that, it's possible to shrink the amap struct from 72 to 48
bytes on 64 bit systems (or from 44 to 32 bytes on 32 bit systems).

It's also possible to cut down the memory needed for slots roughly in
half on 64 bit architectures, and cut it up to a factor of 3 on 32 bit
machines.

Here's how amap slots are maintained currently: for every slot, the
kernel allocates one pointer to a vm_anon and two ints (16 bytes per
slot on 64 bit systems, 12 bytes on 32 bit CPUs).

To reduce these memory requirements, we need three flavors of amaps:

- Tiny amaps with only one slot store the pointer to the vm_anon in
  the amap directly.  The two ints are not needed.  This was Theo's
  idea.

- Small amaps with up to 32 slots need 8 instead of 16 bytes per slot
  (or 4 bytes instead of 12 on 32 bit machines).
  It's enough to store the array of anons.  The two ints per slot are
  not needed.

  Tiny and small amaps are the ones used most often.

- Normal amaps with n >= 32 slots only need
  4 * sizeof(pointer) + n * sizeof(struct vm_anon *) + 12*n/32 bytes
  to maintain amap slots.  For large n that's also around 1.8 times
  less memory for slots (or about 2.7 times less on 32 bit CPUs)
  compared to the current implementation.
  That memory is for the vm_anon array, and a header structure.  The
  two ints per slot in the current code are replaced by n/32 bitmaps.

But that also means that the amap layer has to do some special-casing
when dealing with amaps.  I tried to factor out the common code where
possible.

How does it work?  Basically, the *current* slots in an amap consist
of three arrays (for n slots, every array has size n):

- am_anon: an array of vm_anon
- am_slots: an array of indices that store which entries in am_anon
  are not NULL.
This is for quickly traversing all occupied entries in the amap - am_bckptr: maps a non-NULL entry in am_anon to an index in am_slots Needed to update am_slots when setting an entry in am_anon to NULL. We can compress am_slots and am_bckptr to bitmaps of 32 bits each, so we only need n/32 entries for am_slots and am_bckptr. For amaps with up to 32 slots, we do not need am_slots and am_bckptr: it's possible to store the bitmap of used slots in the amap directly. I see no difference in doing a make -j4 build with this on an amd64 T430s (so things don't seem to get slower here with this). Tests on other systems and architectures are much appreciated. The diff below contains a proof of concept, but I'll also post smaller diffs that are easier to review in the next days. Comments/thoughts about doing this? Index: uvm/uvm_amap.c === RCS file: /cvs/src/sys/uvm/uvm_amap.c,v retrieving revision 1.62 diff -u -p -r1.62 uvm_amap.c --- uvm/uvm_amap.c 16 Mar 2016 16:53:43 - 1.62 +++ uvm/uvm_amap.c 21 Mar 2016 18:53:45 - @@ -53,21 +53,31 @@ struct pool uvm_amap_pool; /* Pools for amap slots for the most common amap slot sizes */ -struct pool uvm_amap_slot_pools[UVM_AMAP_CHUNK]; +struct pool uvm_amap_slot_pools[UVM_AMAP_CHUNK - 1]; LIST_HEAD(, vm_amap) amap_list; -static char amap_slot_pool_names[UVM_AMAP_CHUNK][13]; - -#define MALLOC_SLOT_UNIT (2 * sizeof(int) + sizeof(struct vm_anon *)) +static char amap_slot_pool_names[UVM_AMAP_CHUNK - 1][13]; /* * local functions */ -static struct vm_amap *amap_alloc1(int, int, int); +struct vm_anon **amap_slots_alloc(struct vm_amap *, u_int, int, +struct vm_amap_meta **); +static struct vm_amap *amap_alloc1(u_int, u_int, int); static __inline void amap_list_insert(struct vm_amap *); static __inline void amap_list_remove(struct vm_amap *); +static __inline void amap_anon_release(struct vm_anon *); +void amap_fill_slot(struct vm_amap *, u_int); +void amap_normal_clear_slot(struct vm_amap *, u_int); +void amap_clear_slot(struct vm_amap *, 
u_int); +static __inline void amap_normal_wipe_slot(struct vm_amap *, u_int); + +void amap_wipeout_traverse(struct vm_anon **, u_int, u_int); +int amap_cow_now_traverse(struct vm_anon **, u_int, u_int); +int amap_swap_off_traverse(struct vm_amap *, struct vm_anon **, u_int, u_int, +int, int); static __inline void amap_list_insert(struct vm_amap *amap) @@ -81,6 +91,99 @@ amap_list_remove(struct vm_amap *amap) LIST_REMOVE(amap, am_list); } +static __inline void +amap_anon_release(struct vm_anon *anon) +{ + int refs; + + refs = --anon->an_ref; + if (refs == 0) { + /* we had the last reference to a vm_anon. free it. */ + uvm_anfree(anon); + } +} + +void +amap_fill_slot(struct vm_amap *amap, u_int slot) +{ + u_int clust, ptr; + struct vm_amap_meta *meta; + struct
send window update - decrementing snd_wnd by bytes acked
Hi all,

We are contemplating this in FreeBSD land and thought I'd take your
opinion.
https://lists.freebsd.org/pipermail/freebsd-transport/2016-March/000100.html

Is there a reason for snd_wnd decrease here?

Cheers,
Hiren

ps: keep me cc'd as I am not subscribed.
Re: bgpd: fix adding a new interface and network inet connected
On 2016 Mar 21 (Mon) at 16:22:53 +0100 (+0100), Claudio Jeker wrote: :On Mon, Mar 21, 2016 at 03:54:38PM +0100, Peter Hessler wrote: :> We ran into a situation where we accidentally blackholed traffic going to :> a new Internet Exchange. When we added the new vlans and new peers, the :> nexthop address on that vlan was *not* our neighbor's address, but :> instead used our own IP address on that new interface. "bgpctl show rib", :> showed the correct IP, but the wrong one was used when installing into :> the fib. :> :> Restarting bgpd installed the correct nexthop. :> :> Additionally, a user had reported that "network inet connected" didn't :> work any more. This fix also convinces bgpd to redistribute the network :> as soon as it is created. The underlying cause was a combination of :> missing the F_CONNECTED flag, as well as having an ifindex of 0, which :> is lo0's index. :> :> OK? : :I noticed this breakage as well some time ago but never managed to debug :it (too many production systems and not enough test systems). :This was caused by the change of cloning routes to show the IP address :instead of the link local one. :Now about the diff I'm not 100% sure. It for sure changes beheviour a bit. :ifindex is now always set and not only for connected routes. Not sure if :this is bad or not... :mpath should probably be set to 0 in the RTF_CONNECTED case to keep same :behaviour. On the other hand it is now possible to have multiple routes to :the same network even for connected routes (not sure if it is possible on :the same prio or not). : :IMO this is a more conservative version... : : case AF_INET: : case AF_INET6: : if (rtm->rtm_flags & RTF_CONNECTED) { : flags |= F_CONNECTED; : ifindex = rtm->rtm_index; : sa = NULL; : mpath = 0; : } : break; : :I think this is less dangerous or what you think? : As I think about this more, I think this should go in. We can make it perfect later. 
OK :> Index: kroute.c :> === :> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v :> retrieving revision 1.207 :> diff -u -p -u -p -r1.207 kroute.c :> --- kroute.c 5 Dec 2015 18:28:04 - 1.207 :> +++ kroute.c 21 Mar 2016 14:44:52 - :> @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt :> sa = NULL; :> mpath = 0; /* link local stuff can't be mpath */ :> break; :> +case AF_INET: :> +case AF_INET6: :> +if (rtm->rtm_flags & RTF_CONNECTED) :> +flags |= F_CONNECTED; :> +ifindex = rtm->rtm_index; :> +break; :> } :> :> if (rtm->rtm_type == RTM_DELETE) { :> :> :> :> :> -- :> If you are a fatalist, what can you do about it? :> -- Ann Edwards-Duff : :-- ::wq Claudio : -- This is a job for BOB VIOLENCE and SCUM, the INCREDIBLY STUPID MUTANT DOG. -- Bob Violence
Re: bgpd: fix adding a new interface and network inet connected
On Mon, Mar 21, 2016 at 05:11:04PM +0100, Peter Hessler wrote: > On 2016 Mar 21 (Mon) at 16:22:53 +0100 (+0100), Claudio Jeker wrote: > :On Mon, Mar 21, 2016 at 03:54:38PM +0100, Peter Hessler wrote: > :> We ran into a situation where we accidentally blackholed traffic going to > :> a new Internet Exchange. When we added the new vlans and new peers, the > :> nexthop address on that vlan was *not* our neighbor's address, but > :> instead used our own IP address on that new interface. "bgpctl show rib", > :> showed the correct IP, but the wrong one was used when installing into > :> the fib. > :> > :> Restarting bgpd installed the correct nexthop. > :> > :> Additionally, a user had reported that "network inet connected" didn't > :> work any more. This fix also convinces bgpd to redistribute the network > :> as soon as it is created. The underlying cause was a combination of > :> missing the F_CONNECTED flag, as well as having an ifindex of 0, which > :> is lo0's index. > :> > :> OK? > : > :I noticed this breakage as well some time ago but never managed to debug > :it (too many production systems and not enough test systems). > :This was caused by the change of cloning routes to show the IP address > :instead of the link local one. > :Now about the diff I'm not 100% sure. It for sure changes beheviour a bit. > :ifindex is now always set and not only for connected routes. Not sure if > :this is bad or not... > :mpath should probably be set to 0 in the RTF_CONNECTED case to keep same > :behaviour. On the other hand it is now possible to have multiple routes to > :the same network even for connected routes (not sure if it is possible on > :the same prio or not). > : > :IMO this is a more conservative version... > : > : case AF_INET: > : case AF_INET6: > : if (rtm->rtm_flags & RTF_CONNECTED) { > : flags |= F_CONNECTED; > : ifindex = rtm->rtm_index; > : sa = NULL; > : mpath = 0; > : } > : break; > : > :I think this is less dangerous or what you think? 
> setting ifindex should always be safe, because that is the ifindex that
> is valid for that route.  Setting it to zero (lo0) is Wrong(tm).

This is where I'm not sure, since it is quite possible that kroute
treats 0 as "not directly connected".  I just fear that having a
non-zero ifindex on non-connected routes may change behaviour somewhere
else; this is why I'm being somewhat conservative at the moment.  In
the long run you are right: ifindex should always be set and correct.

> I'm not 100% convinced about mpath=0, or about setting sa=NULL, so I
> left those alone for this version.  I did see that we only used sa when
> we already had a route, not when we are creating one.  I don't have
> super-strong feelings about them, though.

The idea here is to emulate the old code as much as possible.  If we
set sa, may we end up with a nexthop route to the wrong (as in our own)
IP, which is currently the case?

> Since mpi@ says that RTAX_GATEWAY now returns the af, I could also simply
> add those as FALLTHROUGH for the main switch statement...

You need to add the RTF_CONNECTED case because any other route in your
system will have an AF_INET or AF_INET6 nexthop.  This is more or less
what my code snippet is trying to do.

> :
> :> Index: kroute.c
> :> ===
> :> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v
> :> retrieving revision 1.207
> :> diff -u -p -u -p -r1.207 kroute.c
> :> --- kroute.c	5 Dec 2015 18:28:04 -	1.207
> :> +++ kroute.c	21 Mar 2016 14:44:52 -
> :> @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt
> :> 		sa = NULL;
> :> 		mpath = 0;	/* link local stuff can't be mpath */
> :> 		break;
> :> +	case AF_INET:
> :> +	case AF_INET6:
> :> +		if (rtm->rtm_flags & RTF_CONNECTED)
> :> +			flags |= F_CONNECTED;
> :> +		ifindex = rtm->rtm_index;
> :> +		break;
> :> 	}
> :>
> :> 	if (rtm->rtm_type == RTM_DELETE) {
> :>
> :> --
> :> If you are a fatalist, what can you do about it?
> :>-- Ann Edwards-Duff > : > :-- > ::wq Claudio > > -- > Death is nature's way of telling you to slow down. > -- :wq Claudio
Re: bgpd: fix adding a new interface and network inet connected
On 2016 Mar 21 (Mon) at 16:22:53 +0100 (+0100), Claudio Jeker wrote: :On Mon, Mar 21, 2016 at 03:54:38PM +0100, Peter Hessler wrote: :> We ran into a situation where we accidentally blackholed traffic going to :> a new Internet Exchange. When we added the new vlans and new peers, the :> nexthop address on that vlan was *not* our neighbor's address, but :> instead used our own IP address on that new interface. "bgpctl show rib", :> showed the correct IP, but the wrong one was used when installing into :> the fib. :> :> Restarting bgpd installed the correct nexthop. :> :> Additionally, a user had reported that "network inet connected" didn't :> work any more. This fix also convinces bgpd to redistribute the network :> as soon as it is created. The underlying cause was a combination of :> missing the F_CONNECTED flag, as well as having an ifindex of 0, which :> is lo0's index. :> :> OK? : :I noticed this breakage as well some time ago but never managed to debug :it (too many production systems and not enough test systems). :This was caused by the change of cloning routes to show the IP address :instead of the link local one. :Now about the diff I'm not 100% sure. It for sure changes beheviour a bit. :ifindex is now always set and not only for connected routes. Not sure if :this is bad or not... :mpath should probably be set to 0 in the RTF_CONNECTED case to keep same :behaviour. On the other hand it is now possible to have multiple routes to :the same network even for connected routes (not sure if it is possible on :the same prio or not). : :IMO this is a more conservative version... : : case AF_INET: : case AF_INET6: : if (rtm->rtm_flags & RTF_CONNECTED) { : flags |= F_CONNECTED; : ifindex = rtm->rtm_index; : sa = NULL; : mpath = 0; : } : break; : :I think this is less dangerous or what you think? setting ifindex should always be safe, because that is the ifindex that is valid for that route. Setting it to zero (lo0) is Wrong(tm). 
I'm not 100% convinced about mpath=0, or about setting sa=NULL, so I left those alone for this version. I did see that we only used sa when we already had a route, not when we are creating one. I don't have super-strong feelings about them, though. Since mpi@ says that RTAX_GATEWAY now returns the af, I could also simply add those as FALLTHROUGH for the main switch statement... : :> Index: kroute.c :> === :> RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v :> retrieving revision 1.207 :> diff -u -p -u -p -r1.207 kroute.c :> --- kroute.c 5 Dec 2015 18:28:04 - 1.207 :> +++ kroute.c 21 Mar 2016 14:44:52 - :> @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt :> sa = NULL; :> mpath = 0; /* link local stuff can't be mpath */ :> break; :> +case AF_INET: :> +case AF_INET6: :> +if (rtm->rtm_flags & RTF_CONNECTED) :> +flags |= F_CONNECTED; :> +ifindex = rtm->rtm_index; :> +break; :> } :> :> if (rtm->rtm_type == RTM_DELETE) { :> :> :> :> :> -- :> If you are a fatalist, what can you do about it? :> -- Ann Edwards-Duff : :-- ::wq Claudio -- Death is nature's way of telling you to slow down.
Re: Scheduler hack for multi-threaded processes
> Date: Sat, 19 Mar 2016 13:53:07 +0100
> From: Martin Pieuchot>
>
> Applications using multiple threads often call sched_yield(2) to
> indicate that one of the threads cannot make any progress because
> it is waiting for a resource held by another one.
>
> One example of this scenario is the _spinlock() implementation of
> our librthread.  But if you look on https://codesearch.debian.net
> you can find many more use cases, notably MySQL, PostgreSQL, JDK,
> libreoffice, etc.
>
> Now the problem with our current scheduler is that the priority of
> a thread decreases when it is the "curproc" of a CPU.  So the threads
> that don't run and sched_yield(2) end up having a higher priority than
> the thread holding the resource.  Which means that it's really hard
> for such multi-threaded applications to make progress, resulting in a
> lot of IPIs.
> That'd also explain why if you have more CPUs, let's say 4 instead
> of 2, your application will more likely make some progress and you'll
> see less stuttering/freezing.
>
> So what the diff below does is that it penalizes the threads from
> multi-threaded applications such that progress can be made.  It is
> inspired from the recent scheduler work done by Michal Mazurek on
> tech@.
>
> I experimented with various values for "p_priority" and this one is
> the one that generates fewer IPIs when watching a HD video on firefox.
> Because yes, with this diff, now I can.
>
> I'd like to know if dereferencing ``p_p'' is safe without holding the
> KERNEL_LOCK.
>
> I'm also interested in hearing from more people using multi-threaded
> applications.

This doesn't only change the sched_yield() behaviour, but also modifies
how in-kernel yield() calls behave for threaded processes.  That is
probably not right.
> Index: kern/sched_bsd.c > === > RCS file: /cvs/src/sys/kern/sched_bsd.c,v > retrieving revision 1.43 > diff -u -p -r1.43 sched_bsd.c > --- kern/sched_bsd.c 9 Mar 2016 13:38:50 - 1.43 > +++ kern/sched_bsd.c 19 Mar 2016 12:21:36 - > @@ -298,7 +298,16 @@ yield(void) > int s; > > SCHED_LOCK(s); > - p->p_priority = p->p_usrpri; > + /* > + * If one of the threads of a multi-threaded process called > + * sched_yield(2), drop its priority to ensure its siblings > + * can make some progress. > + */ > + if (TAILQ_FIRST(>p_p->ps_threads) == p && > + TAILQ_NEXT(p, p_thr_link) == NULL) > + p->p_priority = p->p_usrpri; > + else > + p->p_priority = min(MAXPRI, p->p_usrpri * 2); > p->p_stat = SRUN; > setrunqueue(p); > p->p_ru.ru_nvcsw++; > >
Re: bgpd: fix adding a new interface and network inet connected
On 21/03/16(Mon) 15:54, Peter Hessler wrote: > We ran into a situation where we accidentally blackholed traffic going to > a new Internet Exchange. When we added the new vlans and new peers, the > nexthop address on that vlan was *not* our neighbor's address, but > instead used our own IP address on that new interface. "bgpctl show rib", > showed the correct IP, but the wrong one was used when installing into > the fib. > > Restarting bgpd installed the correct nexthop. > > Additionally, a user had reported that "network inet connected" didn't > work any more. This fix also convinces bgpd to redistribute the network > as soon as it is created. The underlying cause was a combination of > missing the F_CONNECTED flag, as well as having an ifindex of 0, which > is lo0's index. > > OK? Yes, this is a fallout from the multiple cloning route support. Previously the RTAX_GATEWAY of a cloning route was an AF_LINK, now it contains the address associated to the route. Make sure you get the input from a bgpd hacker though. > === > RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v > retrieving revision 1.207 > diff -u -p -u -p -r1.207 kroute.c > --- kroute.c 5 Dec 2015 18:28:04 - 1.207 > +++ kroute.c 21 Mar 2016 14:44:52 - > @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt > sa = NULL; > mpath = 0; /* link local stuff can't be mpath */ > break; > + case AF_INET: > + case AF_INET6: > + if (rtm->rtm_flags & RTF_CONNECTED) > + flags |= F_CONNECTED; > + ifindex = rtm->rtm_index; > + break; > } > > if (rtm->rtm_type == RTM_DELETE) { > > > > > -- > If you are a fatalist, what can you do about it? > -- Ann Edwards-Duff >
Re: bgpd: fix adding a new interface and network inet connected
On Mon, Mar 21, 2016 at 03:54:38PM +0100, Peter Hessler wrote: > We ran into a situation where we accidentally blackholed traffic going to > a new Internet Exchange. When we added the new vlans and new peers, the > nexthop address on that vlan was *not* our neighbor's address, but > instead used our own IP address on that new interface. "bgpctl show rib", > showed the correct IP, but the wrong one was used when installing into > the fib. > > Restarting bgpd installed the correct nexthop. > > Additionally, a user had reported that "network inet connected" didn't > work any more. This fix also convinces bgpd to redistribute the network > as soon as it is created. The underlying cause was a combination of > missing the F_CONNECTED flag, as well as having an ifindex of 0, which > is lo0's index. > > OK? I noticed this breakage as well some time ago but never managed to debug it (too many production systems and not enough test systems). This was caused by the change of cloning routes to show the IP address instead of the link local one. Now about the diff I'm not 100% sure. It for sure changes beheviour a bit. ifindex is now always set and not only for connected routes. Not sure if this is bad or not... mpath should probably be set to 0 in the RTF_CONNECTED case to keep same behaviour. On the other hand it is now possible to have multiple routes to the same network even for connected routes (not sure if it is possible on the same prio or not). IMO this is a more conservative version... case AF_INET: case AF_INET6: if (rtm->rtm_flags & RTF_CONNECTED) { flags |= F_CONNECTED; ifindex = rtm->rtm_index; sa = NULL; mpath = 0; } break; I think this is less dangerous or what you think? 
> Index: kroute.c > === > RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v > retrieving revision 1.207 > diff -u -p -u -p -r1.207 kroute.c > --- kroute.c 5 Dec 2015 18:28:04 - 1.207 > +++ kroute.c 21 Mar 2016 14:44:52 - > @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt > sa = NULL; > mpath = 0; /* link local stuff can't be mpath */ > break; > + case AF_INET: > + case AF_INET6: > + if (rtm->rtm_flags & RTF_CONNECTED) > + flags |= F_CONNECTED; > + ifindex = rtm->rtm_index; > + break; > } > > if (rtm->rtm_type == RTM_DELETE) { > > > > > -- > If you are a fatalist, what can you do about it? > -- Ann Edwards-Duff -- :wq Claudio
bgpd: fix adding a new interface and network inet connected
We ran into a situation where we accidentally blackholed traffic going to a new Internet Exchange. When we added the new vlans and new peers, the nexthop address on that vlan was *not* our neighbor's address, but instead used our own IP address on that new interface. "bgpctl show rib", showed the correct IP, but the wrong one was used when installing into the fib. Restarting bgpd installed the correct nexthop. Additionally, a user had reported that "network inet connected" didn't work any more. This fix also convinces bgpd to redistribute the network as soon as it is created. The underlying cause was a combination of missing the F_CONNECTED flag, as well as having an ifindex of 0, which is lo0's index. OK? Index: kroute.c === RCS file: /cvs/src/usr.sbin/bgpd/kroute.c,v retrieving revision 1.207 diff -u -p -u -p -r1.207 kroute.c --- kroute.c5 Dec 2015 18:28:04 - 1.207 +++ kroute.c21 Mar 2016 14:44:52 - @@ -3178,6 +3178,12 @@ dispatch_rtmsg_addr(struct rt_msghdr *rt sa = NULL; mpath = 0; /* link local stuff can't be mpath */ break; + case AF_INET: + case AF_INET6: + if (rtm->rtm_flags & RTF_CONNECTED) + flags |= F_CONNECTED; + ifindex = rtm->rtm_index; + break; } if (rtm->rtm_type == RTM_DELETE) { -- If you are a fatalist, what can you do about it? -- Ann Edwards-Duff
Useless flag in arp(8)
When entries are displayed, the SIN_PROXY bit is never set, so remove this dead code. Proxy entries correspond to "published" ones, as explained in the manual. Index: arp.8 === RCS file: /cvs/src/usr.sbin/arp/arp.8,v retrieving revision 1.36 diff -u -p -r1.36 arp.8 --- arp.8 27 Jul 2015 17:28:39 - 1.36 +++ arp.8 21 Mar 2016 13:47:57 - @@ -101,8 +101,6 @@ it will never expire. Flags on the ARP entry, in a single letter. They are: local .Pq Sq l , -proxy -.Pq Sq P and published .Pq Sq p . .El Index: arp.c === RCS file: /cvs/src/usr.sbin/arp/arp.c,v retrieving revision 1.71 diff -u -p -r1.71 arp.c --- arp.c 26 Jan 2016 18:26:19 - 1.71 +++ arp.c 21 Mar 2016 13:46:53 - @@ -568,7 +568,6 @@ print_entry(struct sockaddr_dl *sdl, str printf(" %s%s%s\n", (rtm->rtm_flags & RTF_LOCAL) ? "l" : "", - (sin->sin_other & SIN_PROXY) ? "P" : "", (rtm->rtm_flags & RTF_ANNOUNCE) ? "p" : ""); }
Re: realpath additions
> Date: Mon, 21 Mar 2016 15:31:42 +0300 > From: Alexei Malinin> > Hello. > > I'm not sure but it seems to me that there are several missed things: > - checking path against NULL, POSIX says that we should return EINVAL in that case. > - setting errno to ENOMEM in case of malloc() failure, malloc already does that > - clarification in comments. More words don't clarify things. > --- src/lib/libc/stdlib/realpath.c.orig Tue Oct 13 23:55:37 2015 > +++ src/lib/libc/stdlib/realpath.c Mon Mar 21 15:16:52 2016 > @@ -41,8 +41,9 @@ > * char *realpath(const char *path, char resolved[PATH_MAX]); > * > * Find the real name of path, by removing all ".", ".." and symlink > - * components. Returns (resolved) on success, or (NULL) on failure, > - * in which case the path which caused trouble is left in (resolved). > + * components. Returns (resolved) on success; sets (errno) and returns > + * (NULL) on failure, in which case the path which caused trouble > + * is left in (resolved). > */ > char * > realpath(const char *path, char *resolved) > @@ -54,6 +55,11 @@ > int serrno, slen, mem_allocated; > char left[PATH_MAX], next_token[PATH_MAX], symlink[PATH_MAX]; > > + if (path == NULL) { > + errno = ENOENT; > + return (NULL); > + } > + > if (path[0] == '\0') { > errno = ENOENT; > return (NULL); > @@ -63,8 +69,10 @@ > > if (resolved == NULL) { > resolved = malloc(PATH_MAX); > - if (resolved == NULL) > + if (resolved == NULL) { > + errno = ENOMEM; > return (NULL); > + } > mem_allocated = 1; > } else > mem_allocated = 0; > > >
Re: realpath additions
On Mon, Mar 21, 2016 at 03:31:42PM +0300, Alexei Malinin wrote: > Hello. > > I'm not sure but it seems to me that there are several missed things: > - checking path against NULL, > - setting errno to ENOMEM in case of malloc() failure, > - clarification in comments. > > > -- > Alexei Malinin Some comments inline. Thanks. > --- src/lib/libc/stdlib/realpath.c.orig Tue Oct 13 23:55:37 2015 > +++ src/lib/libc/stdlib/realpath.c Mon Mar 21 15:16:52 2016 > @@ -41,8 +41,9 @@ > * char *realpath(const char *path, char resolved[PATH_MAX]); > * > * Find the real name of path, by removing all ".", ".." and symlink > - * components. Returns (resolved) on success, or (NULL) on failure, > - * in which case the path which caused trouble is left in (resolved). > + * components. Returns (resolved) on success; sets (errno) and returns > + * (NULL) on failure, in which case the path which caused trouble > + * is left in (resolved). > */ > char * > realpath(const char *path, char *resolved) > @@ -54,6 +55,11 @@ > int serrno, slen, mem_allocated; > char left[PATH_MAX], next_token[PATH_MAX], symlink[PATH_MAX]; > > + if (path == NULL) { > + errno = ENOENT; > + return (NULL); > + } > + according to http://pubs.opengroup.org/onlinepubs/9699919799/functions/realpath.html, it should be EINVAL instead of ENOENT. The realpath() function shall fail if: ... [EINVAL] The file_name argument is a null pointer. > if (path[0] == '\0') { > errno = ENOENT; > return (NULL); > @@ -63,8 +69,10 @@ > > if (resolved == NULL) { > resolved = malloc(PATH_MAX); > - if (resolved == NULL) > + if (resolved == NULL) { > + errno = ENOMEM; it is unneeded here: malloc(3) already set errno to ENOMEM on failure. > return (NULL); > + } > mem_allocated = 1; > } else > mem_allocated = 0; > > > -- Sebastien Marie
Improve ARP proxy regression
With the diff I just sent all the ARP regression tests suddenly pass. But I did not fix anything! This is because the actual proxy ARP test is incomplete. Diff below fixes it by adding *two* different entries for a given IP. One is "published" and should be returned when an echo request is received, the other one is for internal use. This test will now fail if a kernel with ART is used. This will be addressed shortly. ok? Index: Makefile === RCS file: /cvs/src/regress/sys/netinet/arp/Makefile,v retrieving revision 1.3 diff -u -p -r1.3 Makefile --- Makefile4 Dec 2015 23:43:04 - 1.3 +++ Makefile21 Mar 2016 13:21:43 - @@ -31,6 +31,7 @@ LOCAL_IF ?= LOCAL_MAC ?= REMOTE_MAC ?= FAKE_MAC ?= 12:34:56:78:9a:bc +PROXY_MAC ?= 00:90:27:bb:cc:dd REMOTE_SSH ?= LOCAL_ADDR ?= 10.188.70.17 @@ -271,9 +272,11 @@ TARGETS += arp-proxy run-regress-arp-proxy: addr.py @echo '\n $@ ' @echo Send ARP Request for fake address that is proxied + ssh -t ${REMOTE_SSH} ${SUDO} arp -s ${FAKE_ADDR} ${PROXY_MAC} ssh -t ${REMOTE_SSH} ${SUDO} arp -s ${FAKE_ADDR} ${FAKE_MAC} pub ${SUDO} ${PYTHON}arp_proxy.py ssh ${REMOTE_SSH} ${SUDO} arp -an >arp.log + ssh -t ${REMOTE_SSH} ${SUDO} arp -d ${FAKE_ADDR} ssh -t ${REMOTE_SSH} ${SUDO} arp -d ${FAKE_ADDR} grep '^${FAKE_ADDR} .* ${FAKE_MAC} .* static * p$$' arp.log Index: README === RCS file: /cvs/src/regress/sys/netinet/arp/README,v retrieving revision 1.2 diff -u -p -r1.2 README --- README 23 Dec 2015 14:27:50 - 1.2 +++ README 21 Mar 2016 13:20:44 - @@ -50,6 +50,7 @@ LOCAL_IF=tap0 LOCAL_MAC=fe:e1:ba:d0:d5:6d REMOTE_MAC=70:5f:ca:21:8d:70 FAKE_MAC=12:34:56:78:9a:bc +PROXY_MAC=00:90:27:bb:cc:dd REMOTE_SSH=q70 LOCAL_ADDR=10.188.70.17
Proxy ARP and published entry
When the caller of arplookup() asked for a proxy'd ARP entry, make sure the entry returned by rtalloc(9) is indeed "published". This is currently always true for ARP entries added with arp(8) but it is not the case if you add your own entry with the 33rd bit set but without setting RTF_ANNOUNCE. It is also not the case with ART. Since the 33rd bit trick is an implementation detail of the current routing table, which I'm doing differently with ART, let's check for the correct flag. ok? Index: netinet/if_ether.c === RCS file: /cvs/src/sys/netinet/if_ether.c,v retrieving revision 1.202 diff -u -p -r1.202 if_ether.c --- netinet/if_ether.c 7 Mar 2016 11:00:36 - 1.202 +++ netinet/if_ether.c 21 Mar 2016 13:22:33 - @@ -696,6 +696,12 @@ arplookup(u_int32_t addr, int create, in rtfree(rt); return (NULL); } + + if (proxy && !ISSET(rt->rt_flags, RTF_ANNOUNCE)) { + rtfree(rt); + return (NULL); + } + return (rt); }
realpath additions
Hello. I'm not sure but it seems to me that there are several missed things: - checking path against NULL, - setting errno to ENOMEM in case of malloc() failure, - clarification in comments. -- Alexei Malinin --- src/lib/libc/stdlib/realpath.c.orig Tue Oct 13 23:55:37 2015 +++ src/lib/libc/stdlib/realpath.c Mon Mar 21 15:16:52 2016 @@ -41,8 +41,9 @@ * char *realpath(const char *path, char resolved[PATH_MAX]); * * Find the real name of path, by removing all ".", ".." and symlink - * components. Returns (resolved) on success, or (NULL) on failure, - * in which case the path which caused trouble is left in (resolved). + * components. Returns (resolved) on success; sets (errno) and returns + * (NULL) on failure, in which case the path which caused trouble + * is left in (resolved). */ char * realpath(const char *path, char *resolved) @@ -54,6 +55,11 @@ int serrno, slen, mem_allocated; char left[PATH_MAX], next_token[PATH_MAX], symlink[PATH_MAX]; + if (path == NULL) { + errno = ENOENT; + return (NULL); + } + if (path[0] == '\0') { errno = ENOENT; return (NULL); @@ -63,8 +69,10 @@ if (resolved == NULL) { resolved = malloc(PATH_MAX); - if (resolved == NULL) + if (resolved == NULL) { + errno = ENOMEM; return (NULL); + } mem_allocated = 1; } else mem_allocated = 0;
[patch] www faq/current.html - validation nitpick
Sat, 19 Mar 2016 20:33:23 -0600 (MDT) Philip Guenther> CVSROOT: /cvs > Module name: www > Changes by: guent...@cvs.openbsd.org2016/03/19 20:33:23 > > Modified files: > faq: current.html > > Log message: > Need to build ld.so first Closing a href tag spotted in a validating HTML browser. Like previous nitpick in http://marc.info/?m=145606683202367 sorry for the extra info. $ diff -u current.html-1.1650 current.html --- current.html-1.1650 Sun Mar 20 18:28:25 2016 +++ current.htmlSun Mar 20 18:30:13 2016 @@ -50,7 +50,7 @@ 2016/01/05 - pf.conf(5) set debug syntax change 2016/01/06 - pledge(2) path whitelist disabled 2016/03/07 - lpd(8) spool directory change -2016/03/19 - csu and ld.so update +2016/03/19 - csu and ld.so update
Re: tcp syn cache random reseed
On Sun, Mar 20, 2016 at 07:28:45PM +0100, Alexander Bluhm wrote: > On Sat, Mar 19, 2016 at 10:41:06PM +0100, Alexander Bluhm wrote: > > Perhaps the tcps_sc_seedrandom counter with a netstat -s line should > > be commited anyway to show the problem. > > ok? OK claudio@ > bluhm > > Index: sys/netinet/tcp_input.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v > retrieving revision 1.314 > diff -u -p -r1.314 tcp_input.c > --- sys/netinet/tcp_input.c 7 Mar 2016 18:44:00 - 1.314 > +++ sys/netinet/tcp_input.c 19 Mar 2016 20:09:25 - > @@ -3371,8 +3371,10 @@ syn_cache_insert(struct syn_cache *sc, s >* If there are no entries in the hash table, reinitialize >* the hash secrets. >*/ > - if (tcp_syn_cache_count == 0) > + if (tcp_syn_cache_count == 0) { > arc4random_buf(tcp_syn_hash, sizeof(tcp_syn_hash)); > + tcpstat.tcps_sc_seedrandom++; > + } > > SYN_HASHALL(sc->sc_hash, >sc_src.sa, >sc_dst.sa); > sc->sc_bucketidx = sc->sc_hash % tcp_syn_cache_size; > Index: sys/netinet/tcp_var.h > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_var.h,v > retrieving revision 1.109 > diff -u -p -r1.109 tcp_var.h > --- sys/netinet/tcp_var.h 27 Aug 2015 20:56:16 - 1.109 > +++ sys/netinet/tcp_var.h 19 Mar 2016 20:53:39 - > @@ -440,6 +440,7 @@ structtcpstat { > u_int64_t tcps_sc_dropped; /* # of SYNs dropped (no route/mem) */ > u_int64_t tcps_sc_collisions; /* # of hash collisions */ > u_int64_t tcps_sc_retransmitted;/* # of retransmissions */ > + u_int64_t tcps_sc_seedrandom; /* # of syn cache seeds with random */ > > u_int64_t tcps_conndrained; /* # of connections drained */ > > Index: usr.bin/netstat/inet.c > === > RCS file: /data/mirror/openbsd/cvs/src/usr.bin/netstat/inet.c,v > retrieving revision 1.144 > diff -u -p -r1.144 inet.c > --- usr.bin/netstat/inet.c20 Aug 2015 22:32:41 - 1.144 > +++ usr.bin/netstat/inet.c20 Mar 2016 18:25:55 - > @@ -455,6 +455,7 @@ tcp_stats(char *name) > p(tcps_sc_dupesyn, "\t%qd duplicate SYN%s received for entries " > 
"already in the cache\n"); > p(tcps_sc_dropped, "\t%qd SYN%s dropped (no route or no space)\n"); > + p(tcps_sc_seedrandom, "\t%qd SYN cache seed%s with new random\n"); > > p(tcps_sack_recovery_episode, "\t%qd SACK recovery episode%s\n"); > p(tcps_sack_rexmits, > -- :wq Claudio
Re: tcp syn cache random reseed
On Mon, Mar 21, 2016 at 08:25:59PM +1000, David Gwynne wrote: > how can i judge if this is better than just using a single hash with a strong > function? The attack I see is that you can measure the bucket distribution by timing the SYN+ACK response. You can collect samples that end in the same bucket. After you have collected enough, start your DoS attack. I think that just collecting data is also possible with a strong hash function. With a weak function you may collect less and can start guessing early on top of that. But reseeding after a number of packets prevents collecting information over a long period. Unfortunately I have no analysis or practical experience with timing attacks. It is just a conclusion from reading the code. bluhm
use m_dup_pkt to realign bridge packets
could someone test this on a strict arch? ok? Index: if_bridge.c === RCS file: /cvs/src/sys/net/if_bridge.c,v retrieving revision 1.276 diff -u -p -r1.276 if_bridge.c --- if_bridge.c 8 Mar 2016 09:09:43 - 1.276 +++ if_bridge.c 21 Mar 2016 10:54:20 - @@ -137,7 +137,6 @@ int bridge_ipsec(struct bridge_softc *, int bridge_clone_create(struct if_clone *, int); intbridge_clone_destroy(struct ifnet *ifp); intbridge_delete(struct bridge_softc *, struct bridge_iflist *); -struct mbuf *bridge_m_dup(struct mbuf *); #defineETHERADDR_IS_IP_MCAST(a) \ /* struct etheraddr *a; */ \ @@ -800,7 +799,7 @@ bridge_output(struct ifnet *ifp, struct used = 1; mc = m; } else { - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_NOWAIT); if (mc == NULL) { sc->sc_if.if_oerrors++; continue; @@ -1090,7 +1089,7 @@ bridge_process(struct ifnet *ifp, struct (ifl->bif_state == BSTP_IFSTATE_DISCARDING)) goto reenqueue; - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_NOWAIT); if (mc == NULL) goto reenqueue; @@ -1227,7 +1226,7 @@ bridge_broadcast(struct bridge_softc *sc mc = m; used = 1; } else { - mc = bridge_m_dup(m); + mc = m_dup_pkt(m, ETHER_ALIGN, M_NOWAIT); if (mc == NULL) { sc->sc_if.if_oerrors++; continue; @@ -1277,7 +1276,7 @@ bridge_localbroadcast(struct bridge_soft return; } - m1 = bridge_m_dup(m); + m1 = m_dup_pkt(m, ETHER_ALIGN, M_NOWAIT); if (m1 == NULL) { sc->sc_if.if_oerrors++; return; @@ -2021,37 +2020,4 @@ bridge_copyaddr(struct sockaddr *src, st memcpy(dst, src, src->sa_len); else dst->sa_family = AF_UNSPEC; -} - -/* - * Specialized deep copy to ensure that the payload after the Ethernet - * header is nicely aligned. 
- */ -struct mbuf * -bridge_m_dup(struct mbuf *m) -{ - struct mbuf *m1, *m2, *mx; - - m1 = m_copym2(m, 0, ETHER_HDR_LEN, M_DONTWAIT); - if (m1 == NULL) { - return (NULL); - } - m2 = m_copym2(m, ETHER_HDR_LEN, M_COPYALL, M_DONTWAIT); - if (m2 == NULL) { - m_freem(m1); - return (NULL); - } - - for (mx = m1; mx->m_next != NULL; mx = mx->m_next) - /*EMPTY*/; - mx->m_next = m2; - - if (m1->m_flags & M_PKTHDR) { - int len = 0; - for (mx = m1; mx != NULL; mx = mx->m_next) - len += mx->m_len; - m1->m_pkthdr.len = len; - } - - return (m1); }
Re: tcp syn cache random reseed
> On 21 Mar 2016, at 4:28 AM, Alexander Bluhmwrote: > > On Sat, Mar 19, 2016 at 10:41:06PM +0100, Alexander Bluhm wrote: >> Perhaps the tcps_sc_seedrandom counter with a netstat -s line should >> be commited anyway to show the problem. > > ok? how can i judge if this is better than just using a single hash with a strong function?
arm: purge arm9e, too
Hi, I would like to get rid of even more unused CPUs, so we end up with only armish, zaurus (armv5) and armv7. This diff removes ARM9E, but I also have diffs prepared to get rid of ARM10 and ARM11. ok? Patrick diff --git sys/arch/arm/arm/cpu.c sys/arch/arm/arm/cpu.c index a3fe271..6fa6bc3 100644 --- sys/arch/arm/arm/cpu.c +++ sys/arch/arm/arm/cpu.c @@ -84,8 +84,6 @@ cpu_attach(struct device *dv) enum cpu_class { CPU_CLASS_NONE, - CPU_CLASS_ARM9ES, - CPU_CLASS_ARM9EJS, CPU_CLASS_ARM10E, CPU_CLASS_XSCALE, CPU_CLASS_ARM11J, @@ -147,15 +145,6 @@ struct cpuidtab { }; const struct cpuidtab cpuids[] = { - { CPU_ID_ARM926EJS, CPU_CLASS_ARM9EJS, "ARM926EJ-S", - generic_steppings }, - { CPU_ID_ARM946ES, CPU_CLASS_ARM9ES, "ARM946E-S", - generic_steppings }, - { CPU_ID_ARM966ES, CPU_CLASS_ARM9ES, "ARM966E-S", - generic_steppings }, - { CPU_ID_ARM966ESR1,CPU_CLASS_ARM9ES, "ARM966E-S", - generic_steppings }, - { CPU_ID_ARM1020E, CPU_CLASS_ARM10E, "ARM1020E", generic_steppings }, { CPU_ID_ARM1022ES, CPU_CLASS_ARM10E, "ARM1022E-S", @@ -255,8 +244,6 @@ struct cpu_classtab { const struct cpu_classtab cpu_classes[] = { { "unknown",NULL }, /* CPU_CLASS_NONE */ - { "ARM9E-S","CPU_ARM9E" }, /* CPU_CLASS_ARM9ES */ - { "ARM9EJ-S", "CPU_ARM9E" }, /* CPU_CLASS_ARM9EJS */ { "ARM10E", "CPU_ARM10" }, /* CPU_CLASS_ARM10E */ { "XScale", "CPU_XSCALE_..." 
}, /* CPU_CLASS_XSCALE */ { "ARM11J", "CPU_ARM11" }, /* CPU_CLASS_ARM11J */ @@ -323,8 +310,6 @@ identify_arm_cpu(struct device *dv, struct cpu_info *ci) printf("%s:", dv->dv_xname); switch (cpu_class) { - case CPU_CLASS_ARM9ES: - case CPU_CLASS_ARM9EJS: case CPU_CLASS_ARM10E: case CPU_CLASS_XSCALE: case CPU_CLASS_ARM11J: @@ -376,10 +361,6 @@ identify_arm_cpu(struct device *dv, struct cpu_info *ci) skip_pcache: switch (cpu_class) { -#ifdef CPU_ARM9E - case CPU_CLASS_ARM9ES: - case CPU_CLASS_ARM9EJS: -#endif #ifdef CPU_ARM10 case CPU_CLASS_ARM10E: #endif diff --git sys/arch/arm/arm/cpufunc.c sys/arch/arm/arm/cpufunc.c index b301e13..435be3a 100644 --- sys/arch/arm/arm/cpufunc.c +++ sys/arch/arm/arm/cpufunc.c @@ -87,7 +87,7 @@ int arm_dcache_align_mask; /* 1 == use cpu_sleep(), 0 == don't */ int cpu_do_powersave; -#if defined(CPU_ARM9E) || defined(CPU_ARM10) +#if defined(CPU_ARM10) struct cpu_functions armv5_ec_cpufuncs = { /* CPU functions */ @@ -143,7 +143,7 @@ struct cpu_functions armv5_ec_cpufuncs = { arm10_context_switch, /* context_switch */ arm9e_setup /* cpu setup*/ }; -#endif /* CPU_ARM9E || CPU_ARM10 */ +#endif /* CPU_ARM10 */ #ifdef CPU_ARM10 @@ -383,7 +383,7 @@ struct cpu_functions cpufuncs; u_int cputype; u_int cpu_reset_needs_v4_MMU_disable; /* flag used in locore.s */ -#if defined(CPU_ARM9E) || defined(CPU_ARM10) || defined(CPU_ARM11) || \ +#if defined(CPU_ARM10) || defined(CPU_ARM11) || \ defined(CPU_XSCALE_80321) || defined(CPU_XSCALE_PXA2X0) static void get_cachetype_cp15 (void); @@ -461,7 +461,7 @@ get_cachetype_cp15() out: arm_dcache_align_mask = arm_dcache_align - 1; } -#endif /* ARM7TDMI || ARM9 || XSCALE */ +#endif /* ARM7TDMI || XSCALE */ #ifdef CPU_ARMv7 void arm_get_cachetype_cp15v7 (void); @@ -627,16 +627,14 @@ set_cpufuncs() * CPU type where we want to use it by default, then we set it. 
*/ -#if defined(CPU_ARM9E) || defined(CPU_ARM10) - if (cputype == CPU_ID_ARM926EJS || cputype == CPU_ID_ARM1026EJS) { +#ifdef CPU_ARM10 + if (cputype == CPU_ID_ARM1026EJS) { cpufuncs = armv5_ec_cpufuncs; cpu_reset_needs_v4_MMU_disable = 1; /* V4 or higher */ get_cachetype_cp15(); pmap_pte_init_generic(); return 0; } -#endif /* CPU_ARM9E || CPU_ARM10 */ -#ifdef CPU_ARM10 if (/* cputype == CPU_ID_ARM1020T || */ cputype == CPU_ID_ARM1020E) { /* @@ -764,7 +762,7 @@ set_cpufuncs() * CPU Setup code */ -#if defined(CPU_ARM9E) || defined(CPU_ARM10) +#if defined(CPU_ARM10) void arm9e_setup() { @@ -797,9 +795,7 @@ arm9e_setup() /* And again. */ cpu_idcache_wbinv_all(); } -#endif /* CPU_ARM9E || CPU_ARM10 */ -#if defined(CPU_ARM10) void arm10_setup() { diff --git sys/arch/arm/conf/files.arm sys/arch/arm/conf/files.arm index 5258a09..f021384 100644 --- sys/arch/arm/conf/files.arm +++ sys/arch/arm/conf/files.arm @@ -39,12 +39,12 @@ filearch/arm/arm/bcopyinout.S file arch/arm/arm/copystr.S file arch/arm/arm/cpufunc.c file arch/arm/arm/cpufunc_asm.S -file
Re: tcp syn cache random reseed
On 20/03/16(Sun) 19:19, Alexander Bluhm wrote: > On Sat, Mar 19, 2016 at 10:41:06PM +0100, Alexander Bluhm wrote: > > The drawback is that the the cache lookup has to be done in two syn > > caches when an ACK arrives. > > This can be prevented most of the time. Switch the cache only after > 10 uses. So most of the time the passive cache is empty and > then no lookup is done. I like it. Do you think it could be useful to export the value of the current active cache set and/or the value of ``tcp_syn_use_limit''? Anyway, your diff is ok. > Index: netinet/tcp_input.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/tcp_input.c,v > retrieving revision 1.314 > diff -u -p -r1.314 tcp_input.c > --- netinet/tcp_input.c 7 Mar 2016 18:44:00 - 1.314 > +++ netinet/tcp_input.c 20 Mar 2016 17:47:08 - > @@ -3260,40 +3260,46 @@ tcp_mss_adv(struct mbuf *m, int af) > int tcp_syn_cache_size = TCP_SYN_HASH_SIZE; > int tcp_syn_cache_limit = TCP_SYN_HASH_SIZE*TCP_SYN_BUCKET_SIZE; > int tcp_syn_bucket_limit = 3*TCP_SYN_BUCKET_SIZE; > -int tcp_syn_cache_count; > -struct syn_cache_head tcp_syn_cache[TCP_SYN_HASH_SIZE]; > -u_int32_t tcp_syn_hash[5]; > - > -#define SYN_HASH(sa, sp, dp) \ > - (((sa)->s_addr ^ tcp_syn_hash[0]) * \ > - (u_int32_t)(dp))<<16) + ((u_int32_t)(sp))) ^ tcp_syn_hash[4])) > +int tcp_syn_use_limit = 10; > + > +struct syn_cache_set { > +struct syn_cache_head > scs_buckethead[TCP_SYN_HASH_SIZE]; > +int scs_count; > +int scs_use; > +u_int32_tscs_random[5]; > +} tcp_syn_cache[2]; > +int tcp_syn_cache_active; > + > +#define SYN_HASH(sa, sp, dp, rand) \ > + (((sa)->s_addr ^ (rand)[0]) * \ > + (u_int32_t)(dp))<<16) + ((u_int32_t)(sp))) ^ (rand)[4])) > #ifndef INET6 > -#define SYN_HASHALL(hash, src, dst) \ > +#define SYN_HASHALL(hash, src, dst, rand) \ > do { \ > hash = SYN_HASH((src)->sin_addr,\ > satosin(src)->sin_port, \ > - satosin(dst)->sin_port);\ > + satosin(dst)->sin_port, (rand));\ > } while (/*CONSTCOND*/ 0) > #else > -#define SYN_HASH6(sa, sp, dp) \ > - 
(((sa)->s6_addr32[0] ^ tcp_syn_hash[0]) * \ > - ((sa)->s6_addr32[1] ^ tcp_syn_hash[1]) *\ > - ((sa)->s6_addr32[2] ^ tcp_syn_hash[2]) *\ > - ((sa)->s6_addr32[3] ^ tcp_syn_hash[3]) *\ > - (u_int32_t)(dp))<<16) + ((u_int32_t)(sp))) ^ tcp_syn_hash[4])) > +#define SYN_HASH6(sa, sp, dp, rand) \ > + (((sa)->s6_addr32[0] ^ (rand)[0]) * \ > + ((sa)->s6_addr32[1] ^ (rand)[1]) * \ > + ((sa)->s6_addr32[2] ^ (rand)[2]) * \ > + ((sa)->s6_addr32[3] ^ (rand)[3]) * \ > + (u_int32_t)(dp))<<16) + ((u_int32_t)(sp))) ^ (rand)[4])) > > -#define SYN_HASHALL(hash, src, dst) \ > +#define SYN_HASHALL(hash, src, dst, rand) \ > do { \ > switch ((src)->sa_family) { \ > case AF_INET: \ > hash = SYN_HASH((src)->sin_addr,\ > satosin(src)->sin_port, \ > - satosin(dst)->sin_port);\ > + satosin(dst)->sin_port, (rand));\ > break; \ > case AF_INET6: \ > hash = SYN_HASH6((src)->sin6_addr, \ > satosin6(src)->sin6_port, \ > - satosin6(dst)->sin6_port); \ > + satosin6(dst)->sin6_port, (rand)); \ > break; \ > default:\ > hash = 0; \ > @@ -3305,13 +3311,12 @@ void > syn_cache_rm(struct syn_cache *sc) > { > sc->sc_flags |= SCF_DEAD; > - TAILQ_REMOVE(_syn_cache[sc->sc_bucketidx].sch_bucket, > - sc, sc_bucketq); > + TAILQ_REMOVE(>sc_buckethead->sch_bucket, sc, sc_bucketq); > sc->sc_tp = NULL; > LIST_REMOVE(sc, sc_tpq); > - tcp_syn_cache[sc->sc_bucketidx].sch_length--; > + sc->sc_buckethead->sch_length--; > timeout_del(>sc_timer); > - tcp_syn_cache_count--; > +
Switch to the new routing table backend: ART
The heart of our ongoing network stack work to take the IP forwarding path out of the kernel lock relies on an MP-safe routing table. Our plan is to use a lock-free lookup based on ART and SRPs. But before turning ART MP-safe I'd like to squash the remaining bugs. The only way to be sure to find them is to enable ART by default. ART currently does not support ARP proxy. That's the first item to fix on my list but I don't think this should prevent us from switching right now. So the diff below turns the ART define "on" and adapts the regression tests. I'd be happy to hear from people using non-trivial routing setups. If you encounter a network/routing table related regression, please make sure to include as much information as possible, including the dmesg and the outputs of "$ route -n show" with the new and old kernels. ok? Index: sys/conf/GENERIC === RCS file: /cvs/src/sys/conf/GENERIC,v retrieving revision 1.225 diff -u -p -r1.225 GENERIC --- sys/conf/GENERIC25 Feb 2016 20:27:16 - 1.225 +++ sys/conf/GENERIC16 Mar 2016 15:07:41 - @@ -50,6 +50,7 @@ optionTCP_ECN # Explicit Congestion N option TCP_SIGNATURE # TCP MD5 Signatures, for BGP routing sessions #optionTCP_FACK# Forward Acknowledgements for TCP +option ART # Allotment Routing Table option INET6 # IPv6 option IPSEC # IPsec #optionKEY # PF_KEY (implied by IPSEC) Index: regress/sbin/route/rttest1.ok === RCS file: /cvs/src/regress/sbin/route/rttest1.ok,v retrieving revision 1.5 diff -u -p -r1.5 rttest1.ok --- regress/sbin/route/rttest1.ok 9 Nov 2015 10:49:36 - 1.5 +++ regress/sbin/route/rttest1.ok 16 Mar 2016 16:44:08 - @@ -2,11 +2,11 @@ Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface -10.0/16192.0.2.4 UGS00 32768 8 lo10004 -10.0/10192.0.2.4 UGS00 32768 8 lo10004 10/8 192.0.2.1 UGS00 32768 8 lo10001 -10.8.0/24 192.0.2.1 UGS00 32768 8 lo10001 +10.0/10192.0.2.4 UGS00 32768 8 lo10004 +10.0/16192.0.2.4 UGS00 32768 8 lo10004 10.8/16192.0.2.3 UGS00 32768 8 lo10003 +10.8.0/24 192.0.2.1 UGS00 32768 8 
lo10001 10.8.1/24 192.0.2.2 UGS00 32768 8 lo10002 10.8.3/24 192.0.2.3 UGS00 32768 8 lo10003 10.8.4/24 192.0.2.4 UGS00 32768 8 lo10004 Index: regress/sbin/route/rttest19.ok === RCS file: /cvs/src/regress/sbin/route/rttest19.ok,v retrieving revision 1.4 diff -u -p -r1.4 rttest19.ok --- regress/sbin/route/rttest19.ok 9 Nov 2015 10:49:36 - 1.4 +++ regress/sbin/route/rttest19.ok 16 Mar 2016 16:43:33 - @@ -2,9 +2,9 @@ Routing tables Internet: DestinationGatewayFlags Refs Use Mtu Prio Iface -10.8.1.0/26192.0.2.2 UGS00 32768 8 lo10002 -10.8.1.0/25192.0.2.4 UGS00 32768 8 lo10004 10.8.1/24 192.0.2.1 UGS00 32768 8 lo10001 +10.8.1.0/25192.0.2.4 UGS00 32768 8 lo10004 +10.8.1.0/26192.0.2.2 UGS00 32768 8 lo10002 192.0.2.1 192.0.2.1 UHl00 32768 1 lo10001 192.0.2.2 192.0.2.2 UHl00 32768 1 lo10002 192.0.2.3 192.0.2.3 UHl00 32768 1 lo10003
Re: Scheduler hack for multi-threaded processes
On 03/21/16 at 01:29, Mark Kettenis wrote: > > No. It's a hack. It points out a problem that should be investigated > deeper. > Maybe that's not only thread related. This diff only makes a difference with multi-threaded processes. It may be worth considering looking at the process level as well.