Re: CVS commit: src/sys/net

2023-09-04 Thread Rin Okuyama

Thanks for kind words!

Apparently, we need some test to detect kernel and userland mismatch
for bpf(4) header...

I'd also like to fix problems by which ATF does not complete on ERL3.
Some CPU/memory consuming tests, e.g., lib/libc/regex/t_exhaust,
trigger watchdog.

I guess something wrong in interrupt handler for mips (IIRC, there
are similar kind of behaviors for hpcmips machines). Disabling
__HAVE_FAST_SOFTINT by hand does not work around the problem...

For n32 userland, most rump-based tests in /usr/tests/net crash.
But for n64 userland, on the other hand, they seem to work just fine.

This may be due to bugs for n32 kernel. IIUC, most kernel codes are
compiled for o32 or n64, and only the exception is n32 rump kernel for
mips64e[bl] userland.

Thanks,
rin

On 2023/08/26 9:10, David H. Gutteridge wrote:

Module Name:    src
Committed By:   rin
Date:   Wed Aug 23 13:21:17 UTC 2023

Modified Files:
   src/sys/net: bpf.h

Log Message:
bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64

It cannot fit within 18 bytes, of course ;)

As we had never provided working bpf(4) implementation for LP64
userland on mips, just use natural structure size here.


Thanks for fixing this. This means dhcpcd now works on Octeon hardware
using the n64 userland. Previously, it would fail with "ps_bpf_recvbpf:
Resource temporarily unavailable". (Perhaps you'd noticed this?) This
was reported a while ago[1] by an end user on port-mips@, so we should
presumably pull this up.

Where you say "we had never provided working bpf(4) implementation", I
had been looking into the dhcpcd issue myself, but hadn't thought to
look here. That is, I had run the tests under net/bpf and net/t_ip_reass
(the latter modified to open /dev/bpf directly, not via rump) on my
ERL3, and these all passed, giving me the impression we did indeed have
a working bpf(4) implementation on n64. Since we didn't, it seems we
need more test coverage here.

Thanks,

Dave

1. http://mail-index.netbsd.org/port-mips/2021/07/04/msg001112.html



Re: CVS commit: src/sys/net

2023-08-25 Thread David H. Gutteridge

Module Name:src
Committed By:   rin
Date:   Wed Aug 23 13:21:17 UTC 2023

Modified Files:
   src/sys/net: bpf.h

Log Message:
bpf: Fix SIZEOF_BPF_HDR (for LP64 userland) on mips64

It cannot fit within 18 bytes, of course ;)

As we had never provided working bpf(4) implementation for LP64
userland on mips, just use natural structure size here.


Thanks for fixing this. This means dhcpcd now works on Octeon hardware
using the n64 userland. Previously, it would fail with "ps_bpf_recvbpf:
Resource temporarily unavailable". (Perhaps you'd noticed this?) This
was reported a while ago[1] by an end user on port-mips@, so we should
presumably pull this up.

Where you say "we had never provided working bpf(4) implementation", I
had been looking into the dhcpcd issue myself, but hadn't thought to
look here. That is, I had run the tests under net/bpf and net/t_ip_reass
(the latter modified to open /dev/bpf directly, not via rump) on my
ERL3, and these all passed, giving me the impression we did indeed have
a working bpf(4) implementation on n64. Since we didn't, it seems we
need more test coverage here.

Thanks,

Dave

1. http://mail-index.netbsd.org/port-mips/2021/07/04/msg001112.html


re: CVS commit: src/sys/net

2023-01-05 Thread matthew green
"Jonathan A. Kollasch" writes:
> Module Name:  src
> Committed By: jakllsch
> Date: Thu Jan  5 02:38:51 UTC 2023
>
> Modified Files:
>   src/sys/net: if_wg.c
>
> Log Message:
> Check for authorization for SIOCSDRVSPEC and SIOCGDRVSPEC ioctls for wg(4).
>
> Addresses PR 57161.

might be nice to push this down for SIOCGDRVSPEC.  it sure seems
right for *set* operation, but perhaps for *get*, it can just
elide the sensitive portion in the output ioctl (either make it
empty or make it not present at all?)  it doesn't seem too hard,
just moving the check into wg_ioctl_get() for the problematic
parts...

the idea being to match "ifconfig" on eg, wifi, only showing the
configured passwrds to root.

thanks.


.mrg.


Re: CVS commit: src/sys/net

2022-10-25 Thread Masanobu SAITOH


On 2022/10/25 14:55, Masanobu SAITOH wrote:
> 
> 
> On 2022/10/25 14:51, matthew green wrote:
>> "SAITOH Masanobu" writes:
>>> Module Name:src
>>> Committed By:   msaitoh
>>> Date:   Mon Oct 24 07:45:44 UTC 2022
>>>
>>> Modified Files:
>>> src/sys/net: if_dl.h
>>>
>>> Log Message:
>>> Increase sdl_data so that more then IFNAMSIZ bytes are available.
>>>
>>>  - Increase the size of dl_data[] from 12 to 24.
>>>  - Same as OpenBSD.
>>
>> isn't this a binary compat issue?  eg, 'struct sockaddr_dl' changes,
>> and that, and things based upon it, are in user interfaces.  i had
>> a look and i believe it's a problem, but maybe i missed something.
>>
>> thanks.
>>
>>
>> .mrg.
> 
> struct dl_addr is at the end of struct sockaddr_dl.
> dl_data is at the end of struct dl_addr.
> So I think it's has no problem for old binaries.

I've roughly tested with old arp, ndp, ifconfig, ifmcstat and all /usr/tests
on a new kernel and I didn't saw any problem.

I think getifaddrs(3) has no problem. The routing message has no problem
because struct rtm_msglen has rtm_msglen and we can get the next message
using with it. I can't see any kernel data structure which has
struct sockaddr_dl foobadr[xxx] array.

One problem might be exist. An old userland program allocate buffer
with sizeof(struct sockaddr_dl), the size is smaller than the newer.
The old program might copy with memcpy(old_sized_buf, newdata, dla->sdl_len).

Should I backout the change until the COMPAT code is written?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2022-10-24 Thread Masanobu SAITOH



On 2022/10/25 14:51, matthew green wrote:
> "SAITOH Masanobu" writes:
>> Module Name: src
>> Committed By:msaitoh
>> Date:Mon Oct 24 07:45:44 UTC 2022
>>
>> Modified Files:
>>  src/sys/net: if_dl.h
>>
>> Log Message:
>> Increase sdl_data so that more then IFNAMSIZ bytes are available.
>>
>>  - Increase the size of dl_data[] from 12 to 24.
>>  - Same as OpenBSD.
> 
> isn't this a binary compat issue?  eg, 'struct sockaddr_dl' changes,
> and that, and things based upon it, are in user interfaces.  i had
> a look and i believe it's a problem, but maybe i missed something.
> 
> thanks.
> 
> 
> .mrg.

struct dl_addr is at the end of struct sockaddr_dl.
dl_data is at the end of struct dl_addr.
So I think it's has no problem for old binaries.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


re: CVS commit: src/sys/net

2022-10-24 Thread matthew green
"SAITOH Masanobu" writes:
> Module Name:  src
> Committed By: msaitoh
> Date: Mon Oct 24 07:45:44 UTC 2022
>
> Modified Files:
>   src/sys/net: if_dl.h
>
> Log Message:
> Increase sdl_data so that more then IFNAMSIZ bytes are available.
>
>  - Increase the size of dl_data[] from 12 to 24.
>  - Same as OpenBSD.

isn't this a binary compat issue?  eg, 'struct sockaddr_dl' changes,
and that, and things based upon it, are in user interfaces.  i had
a look and i believe it's a problem, but maybe i missed something.

thanks.


.mrg.


Re: CVS commit: src/sys/net

2021-12-03 Thread Tobias Nygren
On Mon, 15 Nov 2021 07:07:06 +
Shoichi YAMAGUCHI  wrote:

> Date: Mon Nov 15 07:07:06 UTC 2021
> 
> Modified Files:
>   src/sys/net: if_ether.h if_ethersubr.c if_vlan.c
>   src/sys/net/lagg: if_lagg.c
> 
> Log Message:
> introduced APIs to configure VLAN TAG to ethernet devices

Hello,

This change seems to have broke something MTU related when
more than two VLANs are configured on the same NIC.
The first VLAN inherits the parent's MTU but the subsequent
ones have 4 byte too small MTU causing network problems.

Kernel from 2021-11-15:

# ifconfig |grep mtu
wm0: flags=0x8843 mtu 1500
vlan100: flags=0x8843 mtu 1500
vlan101: flags=0x8843 mtu 1500
vlan102: flags=0x8843 mtu 1500
vlan103: flags=0x8843 mtu 1500
vlan104: flags=0x8843 mtu 1500
vlan105: flags=0x8843 mtu 1500
vlan106: flags=0x8843 mtu 1500

Kernel from 2021-11-16:

wm0: flags=0x8843 mtu 1500
vlan100: flags=0x8843 mtu 1500
vlan101: flags=0x8843 mtu 1496
vlan102: flags=0x8843 mtu 1496
vlan103: flags=0x8843 mtu 1496
vlan104: flags=0x8843 mtu 1496
vlan105: flags=0x8843 mtu 1496
vlan106: flags=0x8843 mtu 1496

I could not force increase the MTU either:

root@tinfoilhat-b:log> ifconfig vlan100 mtu 1500
root@tinfoilhat-b:log> ifconfig vlan101 mtu 1500
ifconfig: SIOCSIFMTU: Invalid argument

Kind regards,
-Tobias


Re: CVS commit: src/sys/net

2021-05-06 Thread Christos Zoulas
In article <20210506061816.94bb1f...@cvs.netbsd.org>,
Shoichi YAMAGUCHI  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  yamaguchi
>Date:  Thu May  6 06:18:16 UTC 2021
>
>Modified Files:
>   src/sys/net: if_spppsubr.c
>
>Log Message:
>do not clear destination address if there is no saved address
>and add initialization of saved_hisaddr for safety

You don't need to byte-swap in order to compare with 0... :-)

christos



re: CVS commit: src/sys/net

2021-04-29 Thread matthew green
> (I haven't investigated exactly what's going on leading to ~5 KB-byte
> stack frames, but this shuts gcc up, at least, and the hypotheses
> sound plausible to me!)

this matches what i saw when looking at it.  i was going to
test the exact same idea (noinline).


.mrg.


Re: CVS commit: src/sys/net

2021-02-14 Thread Roy Marples

On 13/02/2021 14:19, Jonathan A. Kollasch wrote:

On Sat, Feb 13, 2021 at 07:28:05AM +, Roy Marples wrote:

Module Name:src
Committed By:   roy
Date:   Sat Feb 13 07:28:05 UTC 2021

Modified Files:
src/sys/net: if_ether.h if_ethersubr.c

Log Message:
if_ether: Ensure that ether_header is aligned


To generate a diff of this commit:
cvs rdiff -u -r1.84 -r1.85 src/sys/net/if_ether.h
cvs rdiff -u -r1.289 -r1.290 src/sys/net/if_ethersubr.c


This appears to ensure the Ethernet header is naturally aligned on a
32-bit boundary.  The 16-bit ether_type field is the only thing in
ether_header that is wider than a uint8_t.

Many drivers will place the start of the receive buffer at an
ETHER_ALIGN (which is 2) offset to ensure the layer three header
is 32-bit aligned after the 14-byte Ethernet header.  Thus this
will result in always calling m_copyup() in ether_input() on strict
alignment platforms.


Reverted


Re: CVS commit: src/sys/net

2021-02-13 Thread Jonathan A. Kollasch
On Sat, Feb 13, 2021 at 07:28:05AM +, Roy Marples wrote:
> Module Name:  src
> Committed By: roy
> Date: Sat Feb 13 07:28:05 UTC 2021
> 
> Modified Files:
>   src/sys/net: if_ether.h if_ethersubr.c
> 
> Log Message:
> if_ether: Ensure that ether_header is aligned
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.84 -r1.85 src/sys/net/if_ether.h
> cvs rdiff -u -r1.289 -r1.290 src/sys/net/if_ethersubr.c

This appears to ensure the Ethernet header is naturally aligned on a
32-bit boundary.  The 16-bit ether_type field is the only thing in
ether_header that is wider than a uint8_t.

Many drivers will place the start of the receive buffer at an
ETHER_ALIGN (which is 2) offset to ensure the layer three header
is 32-bit aligned after the 14-byte Ethernet header.  Thus this
will result in always calling m_copyup() in ether_input() on strict
alignment platforms.


Re: CVS commit: src/sys/net

2021-01-31 Thread Kengo NAKAHARA

Hi,

I have one question about the following commit.

Why stoeplitz_hash_ip4() and stoeplitz_hash_ip4port() argument types
are different between toeplitz.c and toeplitz.h?  They have in_addr_t
and in_port_t argument types in toeplitz.c, howerver uint32_t and
uint16_t in toeplitz.h.

On 2021/01/31 6:23, Jared D. McNeill wrote:

Module Name:src
Committed By:   jmcneill
Date:   Sat Jan 30 21:23:08 UTC 2021

Modified Files:
src/sys/net: files.net
Added Files:
src/sys/net: toeplitz.c toeplitz.h

Log Message:
Add symmetric toeplitz implementation with integration for NICs, from OpenBSD.


To generate a diff of this commit:
cvs rdiff -u -r1.29 -r1.30 src/sys/net/files.net
cvs rdiff -u -r0 -r1.1 src/sys/net/toeplitz.c src/sys/net/toeplitz.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Thanks,


--
//
Internet Initiative Japan Inc.

Device Engineering Section,
Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2020-01-28 Thread Kamil Rytarowski
On 29.01.2020 07:40, Jason Thorpe wrote:
> 
>> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski  wrote:
>>
>> The distribution build breaks for me with:
> 
> Should be fixed with:
> 
>  /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v  <--  Makefile
>  new revision: 1.33; previous revision: 1.32
> 
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_init'
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_fini'
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_to_if_data'
>>
> 
> -- thorpej
> 

It works now!



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2020-01-28 Thread Jason Thorpe


> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski  wrote:
> 
> The distribution build breaks for me with:

Should be fixed with:

 /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v  <--  Makefile
 new revision: 1.33; previous revision: 1.32

> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_init'
> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_fini'
> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_to_if_data'
> 

-- thorpej



Re: CVS commit: src/sys/net

2020-01-28 Thread Kamil Rytarowski
On 29.01.2020 04:16, Jason R Thorpe wrote:
> Module Name:  src
> Committed By: thorpej
> Date: Wed Jan 29 03:16:28 UTC 2020
> 
> Modified Files:
>   src/sys/net: Makefile files.net if.c if.h
> Added Files:
>   src/sys/net: if_stats.c if_stats.h
> 
> Log Message:
> Add support for MP-safe network interface statistics by maintaining them
> in per-cpu storage, and collecting them for export in an if_data structure
> when user-space wants them.
> 
> The new if_stat API is structured to make a gradual transition to the
> new way in network drivers possible, and per-cpu stats are currently
> disabled (thus there is no kernel ABI change).  Once all drivers have
> been converted, the old ABI will be removed, and per-cpu stats will be
> enabled universally.
> 

The distribution build breaks for me with:

/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_init'
/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_fini'
/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_to_if_data'



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2019-12-04 Thread Masanobu SAITOH
On 2019/12/05 14:29, SAITOH Masanobu wrote:
> Module Name:  src
> Committed By: msaitoh
> Date: Thu Dec  5 05:29:28 UTC 2019
> 
> Modified Files:
>   src/sys/net: if_media.h
> 
> Log Message:
> Fix previous comment change for ifm_media. It was correct.
> 
>  The real problem is that some driver misuse ifm_media as the current active
> media. struct mii_data has the current active media(mii_media_active). If a
> driver use mii(4), it can be use mii->mii_media_active for this purpose.
> struct ifmedia has no entry for this purpose. Some drivers have an entry

s/Some drivers/Some drivers not using mii(4)/

> in their own softc to keep the value, but some other's don't have it and
> they mistakenly use ifm_media.
> 
>  We might add a new entry to struct ifmedia in future to avoid this confusion
> and for simplify.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.67 -r1.68 src/sys/net/if_media.h
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2019-09-23 Thread Ryota Ozaki
On Mon, Sep 23, 2019 at 4:14 PM Kamil Rytarowski  wrote:
>
> On 23.09.2019 06:53, Rin Okuyama wrote:
> > Hi,
> >
> > On 2019/09/22 18:30, Kamil Rytarowski wrote:
> >> On 12.04.2018 06:38, Ryota Ozaki wrote:
> >>> Module Name:src
> >>> Committed By:ozaki-r
> >>> Date:Thu Apr 12 04:38:13 UTC 2018
> >>>
> >>> Modified Files:
> >>> src/sys/net: if.h route.c route.h rtsock.c
> >>>
> >>> Log Message:
> >>> Resolve tangled lock dependencies in route.c
> >>>
> >>> This change sweeps remaining lock decisions based on if locked or not
> >>> by moving
> >>> utility functions of rtentry updates from rtsock.c and ensuring
> >>> holding the
> >>> rt_lock.  It also improves the atomicity of a update of a rtentry.
> >>>
> >>
> >>> +static struct ifaddr *
> >>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct
> >>> rtentry *rt,
> >>> +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
> >>> +{
> >>
> >>
> >> Do we need to pass info as a value? It is pretty large here (1024 bytes).
> >
> > Yeah, we were just discussing on this alert of LGTM bot.
> >
> > We can use const pointer here. I will commit the fix soon.
> >
> > Thanks,
> > rin

Thank you for the commit!

>
> Thanks for addressing it! I wonder whether there is performance impact
> here (is this hot-path code?).

The function is not used in any packet processing (hot paths)
and used only for route updates that are uncommon for most users.

  ozaki-r


Re: CVS commit: src/sys/net

2019-09-23 Thread Kamil Rytarowski
On 23.09.2019 06:53, Rin Okuyama wrote:
> Hi,
> 
> On 2019/09/22 18:30, Kamil Rytarowski wrote:
>> On 12.04.2018 06:38, Ryota Ozaki wrote:
>>> Module Name:    src
>>> Committed By:    ozaki-r
>>> Date:    Thu Apr 12 04:38:13 UTC 2018
>>>
>>> Modified Files:
>>> src/sys/net: if.h route.c route.h rtsock.c
>>>
>>> Log Message:
>>> Resolve tangled lock dependencies in route.c
>>>
>>> This change sweeps remaining lock decisions based on if locked or not
>>> by moving
>>> utility functions of rtentry updates from rtsock.c and ensuring
>>> holding the
>>> rt_lock.  It also improves the atomicity of a update of a rtentry.
>>>
>>
>>> +static struct ifaddr *
>>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct
>>> rtentry *rt,
>>> +    struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
>>> +{
>>
>>
>> Do we need to pass info as a value? It is pretty large here (1024 bytes).
> 
> Yeah, we were just discussing on this alert of LGTM bot.
> 
> We can use const pointer here. I will commit the fix soon.
> 
> Thanks,
> rin

Thanks for addressing it! I wonder whether there is performance impact
here (is this hot-path code?).



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2019-09-22 Thread Rin Okuyama

Hi,

On 2019/09/22 18:30, Kamil Rytarowski wrote:

On 12.04.2018 06:38, Ryota Ozaki wrote:

Module Name:src
Committed By:   ozaki-r
Date:   Thu Apr 12 04:38:13 UTC 2018

Modified Files:
src/sys/net: if.h route.c route.h rtsock.c

Log Message:
Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock.  It also improves the atomicity of a update of a rtentry.




+static struct ifaddr *
+rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt,
+struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
+{



Do we need to pass info as a value? It is pretty large here (1024 bytes).


Yeah, we were just discussing on this alert of LGTM bot.

We can use const pointer here. I will commit the fix soon.

Thanks,
rin


Re: CVS commit: src/sys/net

2019-09-22 Thread Kamil Rytarowski
On 12.04.2018 06:38, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Thu Apr 12 04:38:13 UTC 2018
> 
> Modified Files:
>   src/sys/net: if.h route.c route.h rtsock.c
> 
> Log Message:
> Resolve tangled lock dependencies in route.c
> 
> This change sweeps remaining lock decisions based on if locked or not by 
> moving
> utility functions of rtentry updates from rtsock.c and ensuring holding the
> rt_lock.  It also improves the atomicity of a update of a rtentry.
> 

> +static struct ifaddr *
> +rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt,
> +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
> +{


Do we need to pass info as a value? It is pretty large here (1024 bytes).



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Maxime Villard

Le 06/08/2019 à 12:31, Christos Zoulas a écrit :

I did not see any messages about it, and the fix is fine until rmind comes up 
with something better.


Yes turns out it was an off-list email


It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).


Given your second commit, I should understand that rmind is ok with this
change right?


christos


On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:

Le 06/08/2019 à 12:25, Christos Zoulas a écrit :

Module Name:src
Committed By:   christos
Date:   Tue Aug  6 10:25:13 UTC 2019
Modified Files:
src/sys/net/npf: npf_conn.c
Log Message:
Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.
To generate a diff of this commit:
cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Rmind said he had a fix and was testing it. Please revert this.




Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Christos Zoulas
Yes, rmind asked me to commit the change in private email.

christos

> On Aug 6, 2019, at 3:59 PM, Maxime Villard  wrote:
> 
> Le 06/08/2019 à 12:31, Christos Zoulas a écrit :
>> I did not see any messages about it, and the fix is fine until rmind comes 
>> up with something better.
> 
> Yes turns out it was an off-list email
> 
>> It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).
> 
> Given your second commit, I should understand that rmind is ok with this
> change right?
> 
>> christos
>>> On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:
>>> 
>>> Le 06/08/2019 à 12:25, Christos Zoulas a écrit :
 Module Name:   src
 Committed By:  christos
 Date:  Tue Aug  6 10:25:13 UTC 2019
 Modified Files:
src/sys/net/npf: npf_conn.c
 Log Message:
 Introduce an npf_conn_destroy_idx() that can handle partially constructed
 conn structures.
 To generate a diff of this commit:
 cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
>>> 
>>> Rmind said he had a fix and was testing it. Please revert this.



Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Christos Zoulas
I did not see any messages about it, and the fix is fine until rmind comes up 
with something better.
It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).

christos

> On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:
> 
> Le 06/08/2019 à 12:25, Christos Zoulas a écrit :
>> Module Name: src
>> Committed By:christos
>> Date:Tue Aug  6 10:25:13 UTC 2019
>> Modified Files:
>>  src/sys/net/npf: npf_conn.c
>> Log Message:
>> Introduce an npf_conn_destroy_idx() that can handle partially constructed
>> conn structures.
>> To generate a diff of this commit:
>> cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
>> Please note that diffs are not public domain; they are subject to the
>> copyright notices on the relevant files.
> 
> Rmind said he had a fix and was testing it. Please revert this.



Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Maxime Villard

Le 06/08/2019 à 12:25, Christos Zoulas a écrit :

Module Name:src
Committed By:   christos
Date:   Tue Aug  6 10:25:13 UTC 2019

Modified Files:
src/sys/net/npf: npf_conn.c

Log Message:
Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.


To generate a diff of this commit:
cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Rmind said he had a fix and was testing it. Please revert this.


Re: CVS commit: src/sys/net

2019-04-15 Thread Christos Zoulas
In article ,
Ryota Ozaki   wrote:
>On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas  wrote:
>>
>> Module Name:src
>> Committed By:   christos
>> Date:   Mon Apr 15 20:51:46 UTC 2019
>>
>> Modified Files:
>> src/sys/net: if.c
>>
>> Log Message:
>> Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
>> disclosure. From Andy Nguyen, many thanks!
>
>Should we apply the same fix to all compat variants of ifconf?

Yup, good catch! Also pullups :-)

christos



Re: CVS commit: src/sys/net

2019-04-15 Thread Ryota Ozaki
On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas  wrote:
>
> Module Name:src
> Committed By:   christos
> Date:   Mon Apr 15 20:51:46 UTC 2019
>
> Modified Files:
> src/sys/net: if.c
>
> Log Message:
> Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
> disclosure. From Andy Nguyen, many thanks!

Should we apply the same fix to all compat variants of ifconf?

  ozaki-r


Re: CVS commit: src/sys/net

2019-03-25 Thread Robert Elz
Date:Tue, 26 Mar 2019 00:23:32 +
From:"Paul Goyette" 
Message-ID:  <20190326002332.c0241f...@cvs.netbsd.org>

  | XXX Someone(tm) needs to update MAKEDEV to create the /dev/srtN device
  | nodes (with device-major 179)!

Yes, I know ...

That's easy - but as I mentioned in off list mail, my private
pre-requisite for doing it is making srt(4) exist first.

kre

ps: thanks for all the modularisation fixes.   I wonder how much
more exists that appears as if it could be made a module, but
actually cannot?



re: CVS commit: src/sys/net

2019-02-18 Thread matthew green
"Christos Zoulas" writes:
> Module Name:  src
> Committed By: christos
> Date: Mon Feb 18 23:13:14 UTC 2019
> 
> Modified Files:
>   src/sys/net: zlib.c
> 
> Log Message:
> add fallthrough's

i specific avoided changing upstream code where possible by adding
the relevant cflags.

where was this failing?

same with most of the drm changes (commited) and (posted) patch, too.

can you either revert these as unnecessary or go remove the flags
for them?  they're duplicated in multiple places (module/kernel).

thanks.


.mrg.


Re: CVS commit: src/sys/net

2018-12-14 Thread Rin Okuyama

Thank you for finding it out. I removed it in accident.

rin

On 2018/12/14 21:27, Martin Husemann wrote:

Module Name:src
Committed By:   martin
Date:   Fri Dec 14 12:27:22 UTC 2018

Modified Files:
src/sys/net: if_bridge.c

Log Message:
Need  for ip6_statinc() prototype.


To generate a diff of this commit:
cvs rdiff -u -r1.161 -r1.162 src/sys/net/if_bridge.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Re: CVS commit: src/sys/net

2018-12-13 Thread Jason Thorpe
Panic seems not optimal.  A rate limited log message at LOG_ERROR seems better.

-- thorpej
Sent from my iPhone.

> On Dec 13, 2018, at 12:54 PM, Rin Okuyama  wrote:
> 
> Module Name:src
> Committed By:rin
> Date:Thu Dec 13 20:54:50 UTC 2018
> 
> Modified Files:
>src/sys/net: ether_sw_offload.c
> 
> Log Message:
> Panic rather than silently dropping packets when TX offload options are
> enabled for unsupported frame types.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.3 -r1.4 src/sys/net/ether_sw_offload.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 



Re: CVS commit: src/sys/net

2018-07-10 Thread Martin Husemann
On Tue, Jul 10, 2018 at 05:02:27PM +, Christos Zoulas wrote:
> Should be pulled up to -8, with the other patch.

Already done ;-)


Re: CVS commit: src/sys/net

2018-07-10 Thread Christos Zoulas
In article <20180710110040.e80fcf...@cvs.netbsd.org>,
Robert Elz  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  kre
>Date:  Tue Jul 10 11:00:40 UTC 2018
>
>Modified Files:
>   src/sys/net: if_llatbl.c
>
>Log Message:
>Avoid attempting to call arp related functions if there is no
>arp in the kernel.

Should be pulled up to -8, with the other patch.

christos



Re: CVS commit: src/sys/net/npf

2018-04-07 Thread Maxime Villard

Le 07/04/2018 à 23:28, Christos Zoulas a écrit :

In article <20180407090627.20058f...@cvs.netbsd.org>,
Maxime Villard  wrote:

-=-=-=-=-=-

Module Name:src
Committed By:   maxv
Date:   Sat Apr  7 09:06:27 UTC 2018

Modified Files:
src/sys/net/npf: npf_inet.c

Log Message:
Rewrite npf_fetch_tcpopts:

* Instead of doing several nbuf_advance/nbuf_ensure_contig and
   playing with gotos, fetch the TCP options only once, and iterate over
   the (safe) area. The code is similar to tcp_dooptions.

* When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is
   the one we're expecting. If it isn't, then skip the option. This
   wasn't done before, and not doing it allowed a packet to bypass the
   max-mss clamping procedure. Discussed on tech-net@.



This seems to break

cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning
ENETUNREACH.

christos


My bad (again).

Seems like the TCP code is getting me confused all the time.


Re: CVS commit: src/sys/net/npf

2018-04-07 Thread Christos Zoulas
In article <20180407090627.20058f...@cvs.netbsd.org>,
Maxime Villard  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  maxv
>Date:  Sat Apr  7 09:06:27 UTC 2018
>
>Modified Files:
>   src/sys/net/npf: npf_inet.c
>
>Log Message:
>Rewrite npf_fetch_tcpopts:
>
> * Instead of doing several nbuf_advance/nbuf_ensure_contig and
>   playing with gotos, fetch the TCP options only once, and iterate over
>   the (safe) area. The code is similar to tcp_dooptions.
>
> * When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is
>   the one we're expecting. If it isn't, then skip the option. This
>   wasn't done before, and not doing it allowed a packet to bypass the
>   max-mss clamping procedure. Discussed on tech-net@.
>

This seems to break

cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning
ENETUNREACH.

christos



Re: CVS commit: src/sys/net/npf

2018-01-30 Thread Maxime Villard

Le 31/01/2018 à 00:18, Mindaugas Rasiukevicius a écrit :

[...]

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.

[...]
If the npc_hlen value is beyond the packet length, NPF's nbuf interface
will catch that, since it performs the bounds check.


I meant to say that the IPv6 loop in npf_cache_ip seems suspicious to me.

while (nbuf_advance(nbuf, hlen, 0) != NULL) {
[...]
hlen = (ip6e->ip6e_len + 1) << 3;
[...]
npc->npc_hlen += hlen;
}
[continue execution...]

Here, if you have a 'hlen' that goes beyond the mbuf, nbuf_advance will fail,
and we're not handling this case. npc_hlen got incremented along the way, and
it now points past the end of the mbuf.

Perhaps that's handled properly later, but in all cases, we ought to handle
the error right here instead of processing the packet any further.

Note however that NPF is rather at the end of my TODO list, and I'll come back
to it later.

Maxime


Re: CVS commit: src/sys/net/npf

2018-01-30 Thread Mindaugas Rasiukevicius
"Maxime Villard"  wrote:
> Module Name:  src
> Committed By: maxv
> Date: Fri Dec 15 21:00:26 UTC 2017
> 
> Modified Files:
>   src/sys/net/npf: npf.h
> 
> Log Message:
> Fix a vulnerability in NPF, that allows whatever incoming IPv6 packet to
> bypass a certain number of filtering rules.
> 
> Basically there is an integer overflow in npf_cache_ip: npc_hlen is a
> 8bit unsigned int, and can wrap to zero if the IPv6 packet being processed
> has large extensions.

Thanks for discovering and fixing this.  I think this is the first
serious remote vulnerability in NPF, although limited to IPv6 only.

> Fix this by using uint32_t. While here, it seems to me there is also a
> memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
> a value that goes beyond the mbuf.

A minor aspect, but promoting npf_hlen to uint32_t results in wasteful
padding in the struct, so it is better to re-order the struct members
in this case.

If the npc_hlen value is beyond the packet length, NPF's nbuf interface
will catch that, since it performs the bounds check.  However, I think
we should implement some sanity check for the npc_hlen value.  RFC 7112
suggests that the IPv6 header chain should not exceed the MTU size (and
thus fit in the first fragment, in case of fragmentation).  Some value
along these lines could be the basis for a sanity check..

-- 
Mindaugas


Re: CVS commit: src/sys/net

2018-01-19 Thread Maxime Villard

Le 19/01/2018 à 13:31, Takeshi Nakayama a écrit :

Module Name:src
Committed By:   nakayama
Date:   Fri Jan 19 12:31:28 UTC 2018

Modified Files:
src/sys/net: if_ethersubr.c

Log Message:
Fix inverted logic.


To generate a diff of this commit:
cvs rdiff -u -r1.256 -r1.257 src/sys/net/if_ethersubr.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Ah yes, my bad, sorry about that. When I saw this ugly #ifdef DIAGNOSTIC and
panic I just machinally replaced it by KASSERTMSG.

Maxime


Re: CVS commit: src/sys/net

2018-01-08 Thread maya
On Tue, Jan 09, 2018 at 01:07:50AM +, m...@netbsd.org wrote:
> This is causing PR kern/52914 (on netbsd-8, not current)

Copied wrong number
kern/52895

it's in -current but also -8


Re: CVS commit: src/sys/net

2018-01-08 Thread maya
This is causing PR kern/52914 (on netbsd-8, not current)

On Fri, Dec 15, 2017 at 04:04:59AM +, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Fri Dec 15 04:04:59 UTC 2017
> 
> Modified Files:
>   src/sys/net: if.c
> 
> Log Message:
> Remove IFNET_GLOBAL_LOCK where it's unnecessary because IFNET_LOCK is held
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.415 -r1.416 src/sys/net/if.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 

> Modified files:
> 
> Index: src/sys/net/if.c
> diff -u src/sys/net/if.c:1.415 src/sys/net/if.c:1.416
> --- src/sys/net/if.c:1.415Fri Dec 15 04:03:46 2017
> +++ src/sys/net/if.c  Fri Dec 15 04:04:58 2017
> @@ -1,4 +1,4 @@
> -/*   $NetBSD: if.c,v 1.415 2017/12/15 04:03:46 ozaki-r Exp $ */
> +/*   $NetBSD: if.c,v 1.416 2017/12/15 04:04:58 ozaki-r Exp $ */
>  
>  /*-
>   * Copyright (c) 1999, 2000, 2001, 2008 The NetBSD Foundation, Inc.
> @@ -90,7 +90,7 @@
>   */
>  
>  #include 
> -__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.415 2017/12/15 04:03:46 ozaki-r Exp $");
> +__KERNEL_RCSID(0, "$NetBSD: if.c,v 1.416 2017/12/15 04:04:58 ozaki-r Exp $");
>  
>  #if defined(_KERNEL_OPT)
>  #include "opt_inet.h"
> @@ -1792,11 +1792,18 @@ ifa_insert(struct ifnet *ifp, struct ifa
>  
>   ifa->ifa_ifp = ifp;
>  
> - IFNET_GLOBAL_LOCK();
> + /*
> +  * Check !IFF_RUNNING for initialization routines that normally don't
> +  * take IFNET_LOCK but it's safe because there is no competitor.
> +  * XXX there are false positive cases because IFF_RUNNING can be off on
> +  * if_stop.
> +  */
> + KASSERT(!ISSET(ifp->if_flags, IFF_RUNNING) ||
> + IFNET_LOCKED(ifp));
> +
>   TAILQ_INSERT_TAIL(&ifp->if_addrlist, ifa, ifa_list);
>   IFADDR_ENTRY_INIT(ifa);
>   IFADDR_WRITER_INSERT_TAIL(ifp, ifa);
> - IFNET_GLOBAL_UNLOCK();
>  
>   ifaref(ifa);
>  }
> @@ -1806,14 +1813,19 @@ ifa_remove(struct ifnet *ifp, struct ifa
>  {
>  
>   KASSERT(ifa->ifa_ifp == ifp);
> + /*
> +  * if_is_deactivated indicates ifa_remove is called form if_detach
> +  * where is safe even if IFNET_LOCK isn't held.
> +  */
> + KASSERT(if_is_deactivated(ifp) || IFNET_LOCKED(ifp));
>  
> - IFNET_GLOBAL_LOCK();
>   TAILQ_REMOVE(&ifp->if_addrlist, ifa, ifa_list);
>   IFADDR_WRITER_REMOVE(ifa);
>  #ifdef NET_MPSAFE
> + IFNET_GLOBAL_LOCK();
>   pserialize_perform(ifnet_psz);
> -#endif
>   IFNET_GLOBAL_UNLOCK();
> +#endif
>  
>  #ifdef NET_MPSAFE
>   psref_target_destroy(&ifa->ifa_psref, ifa_psref_class);
> 



Re: CVS commit: src/sys/net

2017-12-20 Thread Ryota Ozaki
2017/12/18 22:09 "Christos Zoulas" :

On Dec 18,  2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Oh cool. I didn't know that technique.

Great, can you undo the change then?


I already did it :)

  ozaki-r


Thanks,

christos


Re: CVS commit: src/sys/net

2017-12-18 Thread Christos Zoulas
On Dec 18,  2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Oh cool. I didn't know that technique.

Great, can you undo the change then?

Thanks,

christos


Re: CVS commit: src/sys/net

2017-12-17 Thread Ryota Ozaki
On Thu, Dec 14, 2017 at 10:03 PM, Christos Zoulas  wrote:
> On Dec 14,  5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
> -- Subject: Re: CVS commit: src/sys/net
>
> | Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
> | Anyway no objection to revert the change that is not quite important
> | (but makes my life easy).
>
> just do:
>
> cat << _EOF >> /etc/mk.conf
> .if make(netbsd)
> COPTS.route.c += -O0
> .endif
> _EOF
>
> That (or something similar) should work.

Oh cool. I didn't know that technique.

Thanks,
  ozaki-r


Re: CVS commit: src/sys/net

2017-12-14 Thread Christos Zoulas
On Dec 14,  5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
| Anyway no objection to revert the change that is not quite important
| (but makes my life easy).

just do:

cat << _EOF >> /etc/mk.conf
.if make(netbsd)
COPTS.route.c += -O0
.endif
_EOF

That (or something similar) should work.

christos


Re: CVS commit: src/sys/net

2017-12-14 Thread Ryota Ozaki
On Thu, Dec 14, 2017 at 3:27 PM, Christos Zoulas  wrote:
> In article <20171214054314.e76edf...@cvs.netbsd.org>,
> Ryota Ozaki  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:   src
>>Committed By:  ozaki-r
>>Date:  Thu Dec 14 05:43:14 UTC 2017
>>
>>Modified Files:
>>   src/sys/net: rtsock.c
>>
>>Log Message:
>>Spinkle __noinline to some non-performance-sensitive functions for debugging
>
> Well, "for debugging" we can always compile with different flags rather
> than penalize everyone/everytime however that small might be.

Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
Anyway no objection to revert the change that is not quite important
(but makes my life easy).

  ozaki-r


Re: CVS commit: src/sys/net

2017-12-13 Thread Christos Zoulas
In article <20171214054314.e76edf...@cvs.netbsd.org>,
Ryota Ozaki  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  ozaki-r
>Date:  Thu Dec 14 05:43:14 UTC 2017
>
>Modified Files:
>   src/sys/net: rtsock.c
>
>Log Message:
>Spinkle __noinline to some non-performance-sensitive functions for debugging

Well, "for debugging" we can always compile with different flags rather
than penalize everyone/everytime however that small might be.

christos



Re: CVS commit: src/sys/net

2017-11-21 Thread Masanobu SAITOH

On 2017/11/22 14:40, Robert Elz wrote:

 Date:Wed, 22 Nov 2017 04:56:52 +
 From:"SAITOH Masanobu" 
 Message-ID:  <20171122045652.2ee75f...@cvs.netbsd.org>

   |  Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535.
   |  The spec states 0 and 65535 are reserved.

Tags are 12 bigs, so 0..4095 (65535 won't fit).   The code that was
added in this commit looks correct (0xfff - it would be better as a
named const defined somewhere) but the comment that accompanies it
(and this commit log entry) are not.


Yes, you're correct. I fixed the comment.

 Thanks.


kre




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2017-11-21 Thread Robert Elz
Date:Wed, 22 Nov 2017 04:56:52 +
From:"SAITOH Masanobu" 
Message-ID:  <20171122045652.2ee75f...@cvs.netbsd.org>

  |  Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535.
  |  The spec states 0 and 65535 are reserved.

Tags are 12 bigs, so 0..4095 (65535 won't fit).   The code that was
added in this commit looks correct (0xfff - it would be better as a
named const defined somewhere) but the comment that accompanies it
(and this commit log entry) are not.

kre



Re: CVS commit: src/sys/net

2017-09-24 Thread Ryota Ozaki
On Fri, Sep 22, 2017 at 5:53 PM, Joerg Sonnenberger  wrote:
> On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote:
>> Module Name:  src
>> Committed By: ozaki-r
>> Date: Fri Sep 22 05:05:32 UTC 2017
>>
>> Modified Files:
>>   src/sys/net: route.c
>>
>> Log Message:
>> Remove the global lock for rtcache
>>
>> Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only 
>> by
>> their users. And in existing usages a rtcache is guranteed to be not accessed
>> simultaneously. So the rtcache framework doesn't need any exclusion controls
>> in itself.
>
> I'm a bit suspicous about this since the generation counter is not
> atomic. So what guarantees that route manipulations of different
> subsystems don't race?

If !NET_MPSAFE, it's softnet_lock, however, if NET_MPSAFE, nothing guarantees
that as you suspect. I've fixed the issue. Thanks!

  ozaki-r


Re: CVS commit: src/sys/net

2017-09-22 Thread Joerg Sonnenberger
On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Fri Sep 22 05:05:32 UTC 2017
> 
> Modified Files:
>   src/sys/net: route.c
> 
> Log Message:
> Remove the global lock for rtcache
> 
> Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
> their users. And in existing usages a rtcache is guranteed to be not accessed
> simultaneously. So the rtcache framework doesn't need any exclusion controls
> in itself.

I'm a bit suspicous about this since the generation counter is not
atomic. So what guarantees that route manipulations of different
subsystems don't race?

Joerg


Re: CVS commit: src/sys/net

2017-04-05 Thread Ryota Ozaki
On Thu, Apr 6, 2017 at 3:34 AM, Taylor R Campbell
 wrote:
>> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08)
>> From: Paul Goyette 
>>
>> @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
>>   {
>>  int rc;
>>  struct ifreq ifr;
>> +   bool need_unlock = false;
>> +
>> +   /* XXX if_ioctl_lock may or may not be held here */
>> +   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
>> +   mutex_enter(ifp->if_ioctl_lock);
>> +   need_unlock = true;
>> +   }
>>
>> It is my understanding that using mutex_owned() to effect locking
>> decisions is forbidden.
>>
>> I understand the intent of doing it this way, but perhaps we should
>> revisit the callers and enforce that they always take the lock?
>
> Quite right.  Please revert this change and post a note to tech-net@
> about how to resolve the issue properly if you need help, including a
> statement of what code paths do or don't hold the ioctl lock and why
> it's hard to address them.

It tried to put the ioctl lock at upper callers and gave up doing so
because that's too complex (ioctl is easily called recursively) and
the ioctl lock for an interface doesn't fit to upper callers that
don't related to ioctl at all. So I don't think it's worthwhile to
investigate where we should lock correctly at upper callers. If we
correctly fix the issue, we should make ioctl callees MP-safe somehow.

Anyway I know the change isn't the best approach so I'll revert
the commit.

>
> From what I recall (haven't looked recently, though) the ifioctl
> locking may be broken, and may be better written with localcount(9).

That part now uses psref and if_ioctl_lock. I'm not sure localcount
suits more for that.

  ozaki-r


Re: CVS commit: src/sys/net

2017-04-05 Thread Taylor R Campbell
> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08)
> From: Paul Goyette 
> 
> @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
>   {
>  int rc;
>  struct ifreq ifr;
> +   bool need_unlock = false;
> +
> +   /* XXX if_ioctl_lock may or may not be held here */
> +   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
> +   mutex_enter(ifp->if_ioctl_lock);
> +   need_unlock = true;
> +   }
> 
> It is my understanding that using mutex_owned() to effect locking 
> decisions is forbidden.
> 
> I understand the intent of doing it this way, but perhaps we should 
> revisit the callers and enforce that they always take the lock?

Quite right.  Please revert this change and post a note to tech-net@
about how to resolve the issue properly if you need help, including a
statement of what code paths do or don't hold the ioctl lock and why
it's hard to address them.

>From what I recall (haven't looked recently, though) the ifioctl
locking may be broken, and may be better written with localcount(9).


Re: CVS commit: src/sys/net

2017-04-05 Thread Paul Goyette

@@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
 {
int rc;
struct ifreq ifr;
+   bool need_unlock = false;
+
+   /* XXX if_ioctl_lock may or may not be held here */
+   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
+   mutex_enter(ifp->if_ioctl_lock);
+   need_unlock = true;
+   }

It is my understanding that using mutex_owned() to effect locking 
decisions is forbidden.


I understand the intent of doing it this way, but perhaps we should 
revisit the callers and enforce that they always take the lock?




On Wed, 5 Apr 2017, Ryota Ozaki wrote:


Module Name:src
Committed By:   ozaki-r
Date:   Wed Apr  5 03:47:51 UTC 2017

Modified Files:
src/sys/net: if.c if.h if_ethersubr.c link_proto.c

Log Message:
Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.


To generate a diff of this commit:
cvs rdiff -u -r1.389 -r1.390 src/sys/net/if.c
cvs rdiff -u -r1.236 -r1.237 src/sys/net/if.h
cvs rdiff -u -r1.240 -r1.241 src/sys/net/if_ethersubr.c
cvs rdiff -u -r1.34 -r1.35 src/sys/net/link_proto.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


!DSPAM:58e49ead273461966420671!




+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--+--++


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
oops, I was wrong here, please disregard :-)


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote:
> @@ -803,32 +790,14 @@ tunread(dev_t dev, struct uio *uio, int 
>   goto out;
>   }
>   tp->tun_flags |= TUN_RWAIT;
> - if (mtsleep((void *)tp, PZERO|PCATCH|PNORELOCK,
> - "tunread", 0, &tp->tun_lock) != 0) {
> + if (cv_wait_sig(&tp->tun_cv, &tp->tun_lock)) {
>   error = EINTR;
> - goto out_nolock;
> - } else {
> - /*
> -  * Maybe the interface was destroyed while
> -  * we were sleeping, so let's ensure that
> -  * we're looking at the same (valid) tun
> -  * interface before looping.
> -  */
> - tp = tun_find_unit(dev);
> - if (tp == NULL) {
> - error = ENXIO;
> - goto out_nolock;
> - }
> - if (tp->tun_if.if_index != index) {
> - error = ENXIO;
> - goto out;
> - }
> + goto out;
>   }
>   }

If you goto out if tp is NULL, it will dereference it trying to
mutex_exit(&tp->tun_lock);


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
Hi,

On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote:
> @@ -534,14 +527,12 @@ tun_output(struct ifnet *ifp, struct mbu
>  const struct rtentry *rt)
>  {
>   struct tun_softc *tp = ifp->if_softc;
> - int s;
>   int error;
>  #if defined(INET) || defined(INET6)
>   int mlen;
>   uint32_t*af;
>  #endif
>  
> - s = splnet();
>   mutex_enter(&tp->tun_lock);
>   TUNDEBUG ("%s: tun_output\n", ifp->if_xname);
>  
> @@ -551,6 +542,8 @@ tun_output(struct ifnet *ifp, struct mbu
>   error = EHOSTDOWN;
>   goto out;
>   }
> + // XXXrmind
> + mutex_exit(&tp->tun_lock);
>  
>   /*
>* if the queueing discipline needs packet classification,
> @@ -576,7 +569,7 @@ tun_output(struct ifnet *ifp, struct mbu
>   error = ENOBUFS;
>   goto out;
>   }
> - bcopy(dst, mtod(m0, char *), dst->sa_len);
> + memcpy(mtod(m0, char *), dst, dst->sa_len);
>   }
>  
>   if (tp->tun_flags & TUN_IFHEAD) {
> @@ -617,9 +610,10 @@ tun_output(struct ifnet *ifp, struct mbu
>   goto out;
>   }
>  
> + mutex_enter(&tp->tun_lock);

you call goto out; with &tp->tun-lock not held, it then
mutex_exits it.


Re: CVS commit: src/sys/net

2016-06-27 Thread Kengo NAKAHARA
Hi,

On 2016/06/27 18:55, Roy Marples wrote:
> On 27/06/2016 09:58, Kengo NAKAHARA wrote:
>> Module Name: src
>> Committed By:knakahara
>> Date:Mon Jun 27 08:58:50 UTC 2016
>>
>> Modified Files:
>>  src/sys/net: if.c if.h
>>
>> Log Message:
>> reduce link state changing softint if it is not required
>>
>> ok by ozaki-r@n.o
> 
> There is a spelling mistake, an n is missing.
> 
> if_is_link_state_chageable(ifp)
> 
> should be
> 
> if_is_link_state_changeable(ifp)

Thank you for your point out. I fix it if.c:r1.347 and if.h:r1.217.


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2016-06-27 Thread Roy Marples
On 27/06/2016 09:58, Kengo NAKAHARA wrote:
> Module Name:  src
> Committed By: knakahara
> Date: Mon Jun 27 08:58:50 UTC 2016
> 
> Modified Files:
>   src/sys/net: if.c if.h
> 
> Log Message:
> reduce link state changing softint if it is not required
> 
> ok by ozaki-r@n.o

There is a spelling mistake, an n is missing.

if_is_link_state_chageable(ifp)

should be

if_is_link_state_changeable(ifp)

Roy


Re: CVS commit: src/sys/net

2016-04-11 Thread Taylor R Campbell
   Date: Mon, 11 Apr 2016 14:33:41 +0900
   From: Ryota Ozaki 

   On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell
wrote:
   > Nice, thanks for doing this!  Have you tried subjecting it to load,
   > with options DIAGNOSTIC?

   Yes. It passes ATF tests (that enable DIAGNOSTIC).

OK, great!  However, I wonder how much load they subject it to, if
they failed to catch the missing membar_producer in the list code
bfeore.

   >-   LIST_REMOVE(bif, bif_next);
   >+   PSLIST_WRITER_REMOVE(bif, bif_next);
   >+   PSLIST_ENTRY_DESTROY(bif, bif_next);
   >
   >BRIDGE_PSZ_PERFORM(sc);
   >
   > This order is incorrect.  You cannot destroy the entry until you have
   > confirmed all readers are done with it.  You must pserialize_perform
   > before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
   > page about this:
   >
   > [...]

   I see. (I should be claimed RTFM :-/)

I clarified some wording early on in the DESCRIPTION section of the
man page.  Please let me know if there's any verbiage in there that's
confusing or hard to follow!

   >@@ -972,7 +984,8 @@ retry:
   >}
   >
   >i = 0;
   >-   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
   >+   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   >+   bif_next) {
   >
   > PSLIST_WRITER_FOREACH

   Hm, so we need to choose a macro not what we do in the loop but how the loop
   is protected. Indeed in terms of use of membar.

Correct.  This is why PSLIST_READER_FOREACH is in the section `READER
OPERATIONS' of the man page, which is opened by `The following
operations may be performed on list heads and entries when the caller
is in a passively serialized read section.'.


Re: CVS commit: src/sys/net

2016-04-10 Thread Ryota Ozaki
On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell
 wrote:
>Date: Mon, 11 Apr 2016 02:04:14 +
>From: Ryota Ozaki 
>
>Module Name:src
>Committed By:   ozaki-r
>Date:   Mon Apr 11 02:04:14 UTC 2016
>
>Modified Files:
>src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h
>
>Log Message:
>Use pslist(9) in bridge(4)
>
>This adds missing memory barriers to list operations for pserialize.
>
> Nice, thanks for doing this!  Have you tried subjecting it to load,
> with options DIAGNOSTIC?

Yes. It passes ATF tests (that enable DIAGNOSTIC).

>
>Index: src/sys/net/bridgestp.c
>diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20
>--- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016
>+++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016
>@@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg
> {
>struct bridge_iflist *bif;
>
>-   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
>+   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
>+   bif_next) {
>
> I'm pretty sure everything in bridgestp.c runs under a writer lock, so
> you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH.
>
> (PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the
> statement of intent that this is in an exclusive write section, not a
> shared read section.)
>
>@@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc
>
>BRIDGE_LOCK(sc);
>
>-   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
>+   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
>+   bif_next) {
>
> For example, this one is definitely done under a writer lock!

Oh yes. I think I already forgot bridgestp.c at all!

>
>Index: src/sys/net/if_bridge.c
>diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112
>--- src/sys/net/if_bridge.c:1.111   Mon Mar 28 04:38:04 2016
>+++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016
>@@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp)
>bridge_stop(ifp, 1);
>
>BRIDGE_LOCK(sc);
>-   while ((bif = LIST_FIRST(&sc->sc_iflist)) != NULL)
>+   for (;;) {
>+   bif = PSLIST_WRITER_FIRST(&sc->sc_iflist, struct 
> bridge_iflist,
>+   bif_next);
>+   if (bif == NULL)
>+   break;
>bridge_delete_member(sc, bif);
>+   }
>+   PSLIST_DESTROY(&sc->sc_iflist);
>
> Any particular reason you switched from `while ((bif = ...) != NULL)'
> to `for (;;) { bif = ...; if (bif == NULL) break;'?   I find the while
> construct clearer, myself.

Just because it's too long (for me) to put it in the condition.

>
>@@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc
>ifs->if_bridge = NULL;
>ifs->if_bridgeif = NULL;
>
>-   LIST_REMOVE(bif, bif_next);
>+   PSLIST_WRITER_REMOVE(bif, bif_next);
>+   PSLIST_ENTRY_DESTROY(bif, bif_next);
>
>BRIDGE_PSZ_PERFORM(sc);
>
> This order is incorrect.  You cannot destroy the entry until you have
> confirmed all readers are done with it.  You must pserialize_perform
> before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
> page about this:
>
> `Either _element_ must never have been inserted into a list, or it
> must have been inserted and removed, and the caller must have waited
> for all parallel readers to finish reading it first.'
>
> The reason is that PSLIST_WRITER_REMOVE only changes the next pointer
> of the *previous* element so that no new readers can discover the
> current element, but leaves the next pointer of the *current* element
> intact so that readers that are still active can get to the next
> element.  Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the
> current element and kasserts that you removed the entry.

I see. (I should be claimed RTFM :-/)

>
> Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null
> poison pointer like QUEUEDEBUG does, so that anyone trying to use it
> afterward will actually crash instead of just thinking they hit the
> end of the list.
>
>@@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s
> retry:
>BRIDGE_LOCK(sc);
>count = 0;
>-   LIST_FOREACH(bif, &sc->sc_iflist, bif_next)
>+   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
>+   bif_next)
>
> Use PSLIST_WRITER_FOREACH here too.
>
>@@ -959,7 +970,8 @@ retry:
>BRIDGE_LOCK(sc);
>
>i = 0;
>-   LIST_FOREACH(bif, &sc->sc_iflist, bif_next)
>+   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
>+   bif_next)
>
> PSLIST_WRITER_FOREACH
>
>i++;
>if (i > count) {
>/*
>@@ -972,7 +984,8 

Re: CVS commit: src/sys/net

2016-04-10 Thread Taylor R Campbell
   Date: Mon, 11 Apr 2016 02:04:14 +
   From: Ryota Ozaki 

   Module Name:src
   Committed By:   ozaki-r
   Date:   Mon Apr 11 02:04:14 UTC 2016

   Modified Files:
   src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h

   Log Message:
   Use pslist(9) in bridge(4)

   This adds missing memory barriers to list operations for pserialize.

Nice, thanks for doing this!  Have you tried subjecting it to load,
with options DIAGNOSTIC?

   Index: src/sys/net/bridgestp.c
   diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20
   --- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016
   +++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016
   @@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg
{
   struct bridge_iflist *bif;

   -   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   +   bif_next) {

I'm pretty sure everything in bridgestp.c runs under a writer lock, so
you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH.

(PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the
statement of intent that this is in an exclusive write section, not a
shared read section.)

   @@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc 

   BRIDGE_LOCK(sc);

   -   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   +   bif_next) {

For example, this one is definitely done under a writer lock!

   Index: src/sys/net/if_bridge.c
   diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112
   --- src/sys/net/if_bridge.c:1.111   Mon Mar 28 04:38:04 2016
   +++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016
   @@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp)
   bridge_stop(ifp, 1);

   BRIDGE_LOCK(sc);
   -   while ((bif = LIST_FIRST(&sc->sc_iflist)) != NULL)
   +   for (;;) {
   +   bif = PSLIST_WRITER_FIRST(&sc->sc_iflist, struct 
bridge_iflist,
   +   bif_next);
   +   if (bif == NULL)
   +   break;
   bridge_delete_member(sc, bif);
   +   }
   +   PSLIST_DESTROY(&sc->sc_iflist);

Any particular reason you switched from `while ((bif = ...) != NULL)'
to `for (;;) { bif = ...; if (bif == NULL) break;'?   I find the while
construct clearer, myself.

   @@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc
   ifs->if_bridge = NULL;
   ifs->if_bridgeif = NULL;

   -   LIST_REMOVE(bif, bif_next);
   +   PSLIST_WRITER_REMOVE(bif, bif_next);
   +   PSLIST_ENTRY_DESTROY(bif, bif_next);

   BRIDGE_PSZ_PERFORM(sc);

This order is incorrect.  You cannot destroy the entry until you have
confirmed all readers are done with it.  You must pserialize_perform
before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
page about this:

`Either _element_ must never have been inserted into a list, or it
must have been inserted and removed, and the caller must have waited
for all parallel readers to finish reading it first.'

The reason is that PSLIST_WRITER_REMOVE only changes the next pointer
of the *previous* element so that no new readers can discover the
current element, but leaves the next pointer of the *current* element
intact so that readers that are still active can get to the next
element.  Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the
current element and kasserts that you removed the entry.

Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null
poison pointer like QUEUEDEBUG does, so that anyone trying to use it
afterward will actually crash instead of just thinking they hit the
end of the list.

   @@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s
retry:
   BRIDGE_LOCK(sc);
   count = 0;
   -   LIST_FOREACH(bif, &sc->sc_iflist, bif_next)
   +   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   +   bif_next)

Use PSLIST_WRITER_FOREACH here too.

   @@ -959,7 +970,8 @@ retry:
   BRIDGE_LOCK(sc);

   i = 0;
   -   LIST_FOREACH(bif, &sc->sc_iflist, bif_next)
   +   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   +   bif_next)

PSLIST_WRITER_FOREACH

   i++;
   if (i > count) {
   /*
   @@ -972,7 +984,8 @@ retry:
   }

   i = 0;
   -   LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, &sc->sc_iflist, struct bridge_iflist,
   +   bif_next) {

PSLIST_WRITER_FOREACH


re: CVS commit: src/sys/net

2016-04-06 Thread matthew green
> And add an assertion of if_addrlen and ll_addr.

thanks.  i almost did this myself :-)


Re: CVS commit: src/sys/net

2016-02-21 Thread Ryota Ozaki
On Fri, Feb 19, 2016 at 12:16 PM, Roy Marples  wrote:
> On 18/02/2016 11:29, Ryota Ozaki wrote:
>> Okay, I've changed my mind. Let's commit your patch anyway and
>> keep an eye on the lossage messages to know how often and how
>> many lossage happens. If there are issues, we can improve then.
>> It's not too late.
>
> The irony is that overnight I changed my mind also!
> The challenge was to achieve a longer queue in the same size or less.
> According to size(1), the end result of this patch is exactly the same
> size as my previous one :)
>
> This has been achieved with bitmask array macros to densely pack a 2 bit
> integer into a larger integer as a backing store.
> I decided to use a uint16_t which allows for 8 events in the queue.
> Simply changing the integral type will magically expand the queue, so it
> could be changed to a uint64_t to allow 32 events in the queue but this
> is probably overkill.
> For really size challenged systems, this could be changed to a uint8_t
> to allow 4 events in the queue ... which should still be big enough
> normally, but you wanted 10 so the closest match was 8.
>
> I've managed to keep the assignment of the link state to the interface
> within the initial if_link_state_change() call which my prior patch
> moved out.
>
> Also, only one queued event is processed at a time, subsequent ones have
> to wait for another softint(9) to occur.
> After all, the host may have >1 interface and we don't want to hog the
> network processing a larger queue.
>
>> I'm sorry for disturbing you.
>
> No, no. Thankyou for challenging me :)
>
> Maybe the macros here could be put to a more generic use (or there are
> already generic macros for this which I missed?) but unsure how right now.
>
> All being said, I learned a lot about bit shifting doing this and I
> might have made a mistake, but sample test cases for my macros appear to
> work fine. I would appreciate an OK from someone before I commit this.

Wow, I must be convinced by the new design :) I don't understand the macros
very much though, coverity would warn us if there are issues.

Thank you for your effort!
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-18 Thread Roy Marples
On 18/02/2016 11:29, Ryota Ozaki wrote:
> Okay, I've changed my mind. Let's commit your patch anyway and
> keep an eye on the lossage messages to know how often and how
> many lossage happens. If there are issues, we can improve then.
> It's not too late.

The irony is that overnight I changed my mind also!
The challenge was to achieve a longer queue in the same size or less.
According to size(1), the end result of this patch is exactly the same
size as my previous one :)

This has been achieved with bitmask array macros to densely pack a 2 bit
integer into a larger integer as a backing store.
I decided to use a uint16_t which allows for 8 events in the queue.
Simply changing the integral type will magically expand the queue, so it
could be changed to a uint64_t to allow 32 events in the queue but this
is probably overkill.
For really size challenged systems, this could be changed to a uint8_t
to allow 4 events in the queue ... which should still be big enough
normally, but you wanted 10 so the closest match was 8.

I've managed to keep the assignment of the link state to the interface
within the initial if_link_state_change() call which my prior patch
moved out.

Also, only one queued event is processed at a time, subsequent ones have
to wait for another softint(9) to occur.
After all, the host may have >1 interface and we don't want to hog the
network processing a larger queue.

> I'm sorry for disturbing you.

No, no. Thankyou for challenging me :)

Maybe the macros here could be put to a more generic use (or there are
already generic macros for this which I missed?) but unsure how right now.

All being said, I learned a lot about bit shifting doing this and I
might have made a mistake, but sample test cases for my macros appear to
work fine. I would appreciate an OK from someone before I commit this.

Roy
Index: sys/net/if.c
===
RCS file: /cvsroot/src/sys/net/if.c,v
retrieving revision 1.324
diff -u -r1.324 if.c
--- sys/net/if.c15 Feb 2016 08:08:04 -  1.324
+++ sys/net/if.c19 Feb 2016 03:13:45 -
@@ -603,7 +603,8 @@
ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */
 
ifp->if_link_state = LINK_STATE_UNKNOWN;
-   ifp->if_old_link_state = LINK_STATE_UNKNOWN;
+   ifp->if_link_queue = -1; /* all bits set, see link_state_change() */
+   ifp->if_link_queue_state = LINK_STATE_UNKNOWN;
 
ifp->if_capenable = 0;
ifp->if_csum_flags_tx = 0;
@@ -1563,51 +1564,141 @@
 }
 
 /*
- * Handle a change in the interface link state.
- * XXX: We should listen to the routing socket in-kernel rather
- * than calling in6_if_link_* functions directly from here.
+ * bitmask macros to manage a densely packed link_state change queue.
+ * Because we need to store LINK_STATE_UNKNOWN(0), LINK_STATE_DOWN(1) and
+ * LINK_STATE_UP(2) we need 2 bits for each state change.
+ * As a state change to store is 0, treat all bits set as an unset item.
+ */
+#define LQ_ITEM_BITS   2
+#define LQ_ITEM_MASK   ((1 << LQ_ITEM_BITS) - 1)
+#define LQ_MASK(i) (LQ_ITEM_MASK << (i) * LQ_ITEM_BITS)
+#define LINK_STATE_UNSET   LQ_ITEM_MASK
+#define LQ_ITEM(q, i)  (((q) & LQ_MASK((i))) >> (i) * LQ_ITEM_BITS)
+#define LQ_STORE(q, i, v)\
+   do {  \
+   (q) &= ~LQ_MASK((i)); \
+   (q) |= (v) << (i) * LQ_ITEM_BITS; \
+   } while (0 /* CONSTCOND */)
+#define LQ_MAX(q)  ((sizeof((q)) * NBBY) / LQ_ITEM_BITS)
+#define LQ_POP(q, v) \
+   do {  \
+   (v) = LQ_ITEM((q), 0);\
+   (q) >>= LQ_ITEM_BITS; \
+   (q) |= LINK_STATE_UNSET << (LQ_MAX((q)) - 1) * LQ_ITEM_BITS;  \
+   } while (0 /* CONSTCOND */)
+#define LQ_PUSH(q, v)\
+   do {  \
+   (q) >>= LQ_ITEM_BITS; \
+   (q) |= (v) << (LQ_MAX((q)) - 1) * LQ_ITEM_BITS;   \
+   } while (0 /* CONSTCOND */)
+#define LQ_FIND_UNSET(q, i)  \
+   for ((i) = 0; i < LQ_MAX((q)); (i)++) {   \
+   if (LQ_ITEM((q), (i)) == LINK_STATE_UNSET)\
+   break;\
+   }
+/*
+ * Handle a change in the interface link state and
+ * queue notifications.
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_

Re: CVS commit: src/sys/net

2016-02-18 Thread Ryota Ozaki
On Thu, Feb 18, 2016 at 12:08 PM, Ryota Ozaki  wrote:
> On Thu, Feb 18, 2016 at 12:18 AM, Roy Marples  wrote:
>> On 17/02/2016 01:46, Roy Marples wrote:
>>> Patch attached to solve change from a priority array into a bit mask
>>> approach where we swallow the whole queue in a softint.
>>
>> I did the same tests as noted in PR 50602 - put the system under load by
>> building clang from src and qt5 from pkgsrc at the same time, both with
>> -j3 on my Core2 @2.4Ghz dev machine.
>> The load was measured at about 9 by top, the machine itself was pretty
>> un-useable.
>>
>> It also has a wm(4) interface, which the reporter had.
>> Also an iwn(4) interface.
>> Toggling both as fast as possible - my daughter power cycled the AP
>> while I unplugged / replugged the ethernet cable.
>>
>> Doing this for about 10 minutes resulted in no lossage reported with my
>> patch and like state transitions happened perfectly.
>
> Good.
>
>>
>> I would imagine any lossage would occur as a result of a driver or
>> hardware defect, where more important information is likely found on the
>> console as well.
>
> IIUC amd64/i386 have the fast softint facility and it tries to schedule
> a softint just after hwintr finishes, and so more than one events are
> unlikely to come before the softint starts? Please someone correct
> me if my guess is wrong.

Okay, I've changed my mind. Let's commit your patch anyway and
keep an eye on the lossage messages to know how often and how
many lossage happens. If there are issues, we can improve then.
It's not too late.

I'm sorry for disturbing you.
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-17 Thread Masanobu SAITOH

On 2016/02/18 11:54, Ryota Ozaki wrote:

On Wed, Feb 17, 2016 at 9:12 PM, Roy Marples  wrote:

On 17/02/2016 09:02, Ryota Ozaki wrote:

So what events would you choose to skip, if not the scheme that Roy
described?


(I think I confused you, sorry...)

I rather want to not skip anything as much as possible
(except for repeating same events (e.g., up/up/up) because
keeping them all changes the original behavior).

I intend to skip/eliminate events only if there are too many
events happen in a short period (i.e., need queuing) to protect
the system from overloading. In that case (it's a very rare case
I think), we just drop an earliest event first.


How much is too many and what is a short period?


We can choose a number that applications unlikely to handle the events
(10 or so). A short period means a period between the first interrupt
for a link state event happens and a softint for link state changes
starts running.


Once you start skipping/eliminating events, how is your solution any
better? How do you measure some lossage vs some lossage?


Mine doesn't drop events if there are only a few events while yours
drops one event even if there are just two events.

I suppose that a few or several events can happen in "a short period"
easier than a dozen of events (or more) and the latter implies
some hardware troubles (or VMM defects?) and needs a special care
to protect the system, for example we give up delivering all
events. For the former, we shouldn't skip/eliminate events.



Also, we can't just drop the earliest event first - we have to ensure
that each state is left in the queue.
Consider starting in UP:
DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP

We cannot just discard the fact it went down because important events
attached to DOWN won't trigger.


We can preserve DOWN specially if we need.



Lastly, have we considered the system could be overloaded due to so many
link state change events? A longer queue or more complicated would only
make this worse.


Of course, I care and so accept dropping events, but do you really think
just two events cause overload?



 From an earlier post of yours:

Even if a UP state is transient, it's an event that may provide us a
hint of network conditions for diagnostic. We may be able to get it
from the console output, but it's not so convenient; we need to
track events via two different facilities.


If you're skipping/eliminating events as well then you would also need a
second facility to record this. Other than scribbling on the console,
what did you have in mind? Could this be used elsewhere in the system
where equvialent network assertations are recorded?


I don't plan to provide another facility to notify events (even if we
provide something, nobody wants to use it, I think). Yes, it's a
limitation that we cannot always provide full events, no objection on that.
But we can still tell that something bad is happening by sending a bunch
of events at once.

   ozaki-r


This discussion reminds me cardbus's Chattering suppression:

http://nxr.netbsd.org/xref/src/sys/dev/cardbus/cardslot.c#309
http://nxr.netbsd.org/xref/src/sys/dev/pci/pccbb.c#1053

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2016-02-17 Thread Ryota Ozaki
On Thu, Feb 18, 2016 at 12:18 AM, Roy Marples  wrote:
> On 17/02/2016 01:46, Roy Marples wrote:
>> Patch attached to solve change from a priority array into a bit mask
>> approach where we swallow the whole queue in a softint.
>
> I did the same tests as noted in PR 50602 - put the system under load by
> building clang from src and qt5 from pkgsrc at the same time, both with
> -j3 on my Core2 @2.4Ghz dev machine.
> The load was measured at about 9 by top, the machine itself was pretty
> un-useable.
>
> It also has a wm(4) interface, which the reporter had.
> Also an iwn(4) interface.
> Toggling both as fast as possible - my daughter power cycled the AP
> while I unplugged / replugged the ethernet cable.
>
> Doing this for about 10 minutes resulted in no lossage reported with my
> patch and like state transitions happened perfectly.

Good.

>
> I would imagine any lossage would occur as a result of a driver or
> hardware defect, where more important information is likely found on the
> console as well.

IIUC amd64/i386 have the fast softint facility and it tries to schedule
a softint just after hwintr finishes, and so more than one events are
unlikely to come before the softint starts? Please someone correct
me if my guess is wrong.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-17 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 9:12 PM, Roy Marples  wrote:
> On 17/02/2016 09:02, Ryota Ozaki wrote:
>>> So what events would you choose to skip, if not the scheme that Roy
>>> described?
>>
>> (I think I confused you, sorry...)
>>
>> I rather want to not skip anything as much as possible
>> (except for repeating same events (e.g., up/up/up) because
>> keeping them all changes the original behavior).
>>
>> I intend to skip/eliminate events only if there are too many
>> events happen in a short period (i.e., need queuing) to protect
>> the system from overloading. In that case (it's a very rare case
>> I think), we just drop an earliest event first.
>
> How much is too many and what is a short period?

We can choose a number that applications unlikely to handle the events
(10 or so). A short period means a period between the first interrupt
for a link state event happens and a softint for link state changes
starts running.

> Once you start skipping/eliminating events, how is your solution any
> better? How do you measure some lossage vs some lossage?

Mine doesn't drop events if there are only a few events while yours
drops one event even if there are just two events.

I suppose that a few or several events can happen in "a short period"
easier than a dozen of events (or more) and the latter implies
some hardware troubles (or VMM defects?) and needs a special care
to protect the system, for example we give up delivering all
events. For the former, we shouldn't skip/eliminate events.

>
> Also, we can't just drop the earliest event first - we have to ensure
> that each state is left in the queue.
> Consider starting in UP:
> DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP
>
> We cannot just discard the fact it went down because important events
> attached to DOWN won't trigger.

We can preserve DOWN specially if we need.

>
> Lastly, have we considered the system could be overloaded due to so many
> link state change events? A longer queue or more complicated would only
> make this worse.

Of course, I care and so accept dropping events, but do you really think
just two events cause overload?

>
> From an earlier post of yours:
>> Even if a UP state is transient, it's an event that may provide us a
>> hint of network conditions for diagnostic. We may be able to get it
>> from the console output, but it's not so convenient; we need to
>> track events via two different facilities.
>
> If you're skipping/eliminating events as well then you would also need a
> second facility to record this. Other than scribbling on the console,
> what did you have in mind? Could this be used elsewhere in the system
> where equvialent network assertations are recorded?

I don't plan to provide another facility to notify events (even if we
provide something, nobody wants to use it, I think). Yes, it's a
limitation that we cannot always provide full events, no objection on that.
But we can still tell that something bad is happening by sending a bunch
of events at once.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-17 Thread Roy Marples
On 17/02/2016 01:46, Roy Marples wrote:
> Patch attached to solve change from a priority array into a bit mask 
> approach where we swallow the whole queue in a softint.

I did the same tests as noted in PR 50602 - put the system under load by
building clang from src and qt5 from pkgsrc at the same time, both with
-j3 on my Core2 @2.4Ghz dev machine.
The load was measured at about 9 by top, the machine itself was pretty
un-useable.

It also has a wm(4) interface, which the reporter had.
Also an iwn(4) interface.
Toggling both as fast as possible - my daughter power cycled the AP
while I unplugged / replugged the ethernet cable.

Doing this for about 10 minutes resulted in no lossage reported with my
patch and like state transitions happened perfectly.

I would imagine any lossage would occur as a result of a driver or
hardware defect, where more important information is likely found on the
console as well.

Is there any more testing I can do here?

Roy


Re: CVS commit: src/sys/net

2016-02-17 Thread Roy Marples
On 17/02/2016 09:02, Ryota Ozaki wrote:
>> So what events would you choose to skip, if not the scheme that Roy
>> described?
> 
> (I think I confused you, sorry...)
> 
> I rather want to not skip anything as much as possible
> (except for repeating same events (e.g., up/up/up) because
> keeping them all changes the original behavior).
> 
> I intend to skip/eliminate events only if there are too many
> events happen in a short period (i.e., need queuing) to protect
> the system from overloading. In that case (it's a very rare case
> I think), we just drop an earliest event first.

How much is too many and what is a short period?
Once you start skipping/eliminating events, how is your solution any
better? How do you measure some lossage vs some lossage?

Also, we can't just drop the earliest event first - we have to ensure
that each state is left in the queue.
Consider starting in UP:
DOWN/UNKNOWN/UP/UNKNOWN/UP/UNKNOWN/UP

We cannot just discard the fact it went down because important events
attached to DOWN won't trigger.

Lastly, have we considered the system could be overloaded due to so many
link state change events? A longer queue or more complicated would only
make this worse.

>From an earlier post of yours:
> Even if a UP state is transient, it's an event that may provide us a 
> hint of network conditions for diagnostic. We may be able to get it 
> from the console output, but it's not so convenient; we need to
> track events via two different facilities.

If you're skipping/eliminating events as well then you would also need a
second facility to record this. Other than scribbling on the console,
what did you have in mind? Could this be used elsewhere in the system
where equvialent network assertations are recorded?

Roy


Re: CVS commit: src/sys/net

2016-02-17 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 4:52 PM, Taylor R Campbell
 wrote:
>Date: Wed, 17 Feb 2016 15:12:31 +0900
>From: Ryota Ozaki 
>
>On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell
> wrote:
>> Note that this queueing takes effect only if the link state changes
>> multiple times within maybe a few microseconds, before the softint can
>> run.  If your link state is changing that many times so quickly,
>> losing an event or two is probably the least of your worries
>
>Yeah, it's nitpicking, but for that reason, I think it's better to pass
>events as they are to userland.
>
>> -- but
>> you're probably more interested in seeing something like ...down...up
>> than ...up/up/up.
>
>Yes. (up/up/up events are eliminated in the first place though.)
>
> So what events would you choose to skip, if not the scheme that Roy
> described?

(I think I confused you, sorry...)

I rather want to not skip anything as much as possible
(except for repeating same events (e.g., up/up/up) because
keeping them all changes the original behavior).

I intend to skip/eliminate events only if there are too many
events happen in a short period (i.e., need queuing) to protect
the system from overloading. In that case (it's a very rare case
I think), we just drop an earliest event first.

>
> I bet that, whatever events you would choose to skip, we can still
> prove that the resulting queues need be no longer than, say, three
> elements, and we'd still usefully report link flapping to userland --
> as long as we can make enough progress to run softints and userland
> processes at all.
>
> Here are some example reductions that intuitively sound reasonable to
> me:
>
> down/down = down
> up/up = up
> down/unknown/up = down/up
> down/up/down = down
>
> What other sequences of events would you simplify?

As I said, I want to keep the sequences (except for repeating same events)
so the last two examples will be sent as they are.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 15:12:31 +0900
   From: Ryota Ozaki 

   On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell
wrote:
   > Note that this queueing takes effect only if the link state changes
   > multiple times within maybe a few microseconds, before the softint can
   > run.  If your link state is changing that many times so quickly,
   > losing an event or two is probably the least of your worries

   Yeah, it's nitpicking, but for that reason, I think it's better to pass
   events as they are to userland.

   > -- but
   > you're probably more interested in seeing something like ...down...up
   > than ...up/up/up.

   Yes. (up/up/up events are eliminated in the first place though.)

So what events would you choose to skip, if not the scheme that Roy
described?

I bet that, whatever events you would choose to skip, we can still
prove that the resulting queues need be no longer than, say, three
elements, and we'd still usefully report link flapping to userland --
as long as we can make enough progress to run softints and userland
processes at all.

Here are some example reductions that intuitively sound reasonable to
me:

down/down = down
up/up = up
down/unknown/up = down/up
down/up/down = down

What other sequences of events would you simplify?


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell
 wrote:
>Date: Wed, 17 Feb 2016 11:49:48 +0900
>From: Ryota Ozaki 
>
>And the priority provides asymmetric event deliveries; when the state
>repeats up and down, a down event is delivered if the final state is down
>while a down event and a up event are delivered if the final state is up.
>It's confusable to me.
>
> This is the part I'm still not sure about: Roy's patch reduces up/down
> to just down, but leaves down/up alone.  I'm guessing that Roy expects
> applications like dhcpcd to want to clear some state if the link ever
> goes down, but to be uninterested in knowing that the link went up for
> a microsecond and then back down again.
>
>Can we pass events as they are as much as possible? I don't complain that
>event reductions in the kernel, but I think it should be down based on time
>series manner (e.g., pick latest three events), not based on some priority
>things. If we accept event reductions, we can do it with bit-encoding
>(w/o a linked list (memory allocations)), for example represent the state
>as 2 bits and encode event series into a variable (say 16 events in int).
>
> Note that this queueing takes effect only if the link state changes
> multiple times within maybe a few microseconds, before the softint can
> run.  If your link state is changing that many times so quickly,
> losing an event or two is probably the least of your worries

Yeah, it's nitpicking, but for that reason, I think it's better to pass
events as they are to userland.

> -- but
> you're probably more interested in seeing something like ...down...up
> than ...up/up/up.

Yes. (up/up/up events are eliminated in the first place though.)

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 1:40 PM, Roy Marples  wrote:
> On 2016-02-17 02:49, Ryota Ozaki wrote:
>>>
>>> If you don't read the patch, here is the comment I added to the
>>> if_link_state_change() function:
>>>
>>> * Queue a change in the interface link state.
>>> *
>>> * The queue itself is very limited:
>>> *   no state can be queued more than once
>>> *   a higher priority state will remove lower priority states
>>> *
>>> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
>>> * Rationale is that if the kernel can't process the link state change
>>> queue
>>> * fast enough then userland has no chance of catching up.
>>> * Any lossage is logged to the console.
>>> * The queue state priority is ordered so that link state aware programs
>>> * will still have the correct end result regardless of any lossage.
>>>
>>> Any comments or objections?
>>
>>
>> I worry about the priority order, where it comes from? I feel that
>> the kernel does too much decision; which event is important. I think
>> it depends on applications.
>
>
> Fair enough.
> Can you state a use case for an application needing the full ordering?

Event tracking of link states for example.

>
> Let us be more specific
> UP: DOWN - UP
> Is passed fully
> DOWN: UP - DOWN
> UP is not passed to userland.
> What do you want to achieve in userland during the UP when the kernel
> already considers the link DOWN?

Even if a UP state is transient, it's an event that may provide us a
hint of network conditions for diagnostic. We may be able to get it
from the console output, but it's not so convenient; we need to track
events via two different facilities.

>
>> And the priority provides asymmetric event deliveries; when the state
>> repeats up and down, a down event is delivered if the final state is down
>> while a down event and a up event are delivered if the final state is up.
>> It's confusable to me.
>
>
> It's not that confusing :)
> If you're an application, what benefit do you have of processing an UP state
> on the link when a DOWN state is about to follow?
> Aside from logging it (which we still do, just not via a route message just
> a console diagnostic) what would you do you with it?
> Because a DOWN state is coming, it cannot possibly do anything over the
> interface, so why bother?

Applications can ignore the UP event. IMO we should delegate such a decision
to applications because we don't know all requirements of all applications.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 11:49:48 +0900
   From: Ryota Ozaki 

   And the priority provides asymmetric event deliveries; when the state
   repeats up and down, a down event is delivered if the final state is down
   while a down event and a up event are delivered if the final state is up.
   It's confusable to me.

This is the part I'm still not sure about: Roy's patch reduces up/down
to just down, but leaves down/up alone.  I'm guessing that Roy expects
applications like dhcpcd to want to clear some state if the link ever
goes down, but to be uninterested in knowing that the link went up for
a microsecond and then back down again.

   Can we pass events as they are as much as possible? I don't complain that
   event reductions in the kernel, but I think it should be down based on time
   series manner (e.g., pick latest three events), not based on some priority
   things. If we accept event reductions, we can do it with bit-encoding
   (w/o a linked list (memory allocations)), for example represent the state
   as 2 bits and encode event series into a variable (say 16 events in int).

Note that this queueing takes effect only if the link state changes
multiple times within maybe a few microseconds, before the softint can
run.  If your link state is changing that many times so quickly,
losing an event or two is probably the least of your worries -- but
you're probably more interested in seeing something like ...down...up
than ...up/up/up.


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples

On 2016-02-17 02:49, Ryota Ozaki wrote:

If you don't read the patch, here is the comment I added to the
if_link_state_change() function:

* Queue a change in the interface link state.
*
* The queue itself is very limited:
*   no state can be queued more than once
*   a higher priority state will remove lower priority states
*
* The queue state priority in descending order is DOWN, UNKNOWN, UP.
* Rationale is that if the kernel can't process the link state change 
queue

* fast enough then userland has no chance of catching up.
* Any lossage is logged to the console.
* The queue state priority is ordered so that link state aware 
programs

* will still have the correct end result regardless of any lossage.

Any comments or objections?


I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.


Fair enough.
Can you state a use case for an application needing the full ordering?

Let us be more specific
UP: DOWN - UP
Is passed fully
DOWN: UP - DOWN
UP is not passed to userland.
What do you want to achieve in userland during the UP when the kernel 
already considers the link DOWN?



And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is 
down
while a down event and a up event are delivered if the final state is 
up.

It's confusable to me.


It's not that confusing :)
If you're an application, what benefit do you have of processing an UP 
state on the link when a DOWN state is about to follow?
Aside from logging it (which we still do, just not via a route message 
just a console diagnostic) what would you do you with it?
Because a DOWN state is coming, it cannot possibly do anything over the 
interface, so why bother?


Can we pass events as they are as much as possible? I don't complain 
that
event reductions in the kernel, but I think it should be down based on 
time
series manner (e.g., pick latest three events), not based on some 
priority

things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the 
state
as 2 bits and encode event series into a variable (say 16 events in 
int).


Of course we could.
But until we understand a reason why this is needed why should we?

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 10:46 AM, Roy Marples  wrote:
> On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote:
>> Except for an issue with queueing discussed privately (scheduling a
>> softint that is already scheduled won't cause it to run again, so
>> if_link_state_change_si needs to process the whole queue in one go),
>> that approach looks fine to me, although as we also discussed
>> privately we can easily compact it into a three-bit mask with a
>> trivial update instead of a whole array of states.
>
> Patch attached to solve change from a priority array into a bit mask approach
> where we swallow the whole queue in a softint.
> Thanks to Taylor for some reviews and suggestions.
>
>> This has the consequence that if the link goes up/down in quick
>> succession, and then up/down in quick succession, &c., with the queue
>> processed after each up/down transition, userland will never be
>> notified.  However, if the link goes down/up, then down/up, &c., the
>> userland will be notified of all the transitions.  Roy claims that
>> that's OK, and I'm inclined to believe the author of dhcpcd about
>> this.
>
> If you don't read the patch, here is the comment I added to the
> if_link_state_change() function:
>
> * Queue a change in the interface link state.
> *
> * The queue itself is very limited:
> *   no state can be queued more than once
> *   a higher priority state will remove lower priority states
> *
> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
> * Rationale is that if the kernel can't process the link state change queue
> * fast enough then userland has no chance of catching up.
> * Any lossage is logged to the console.
> * The queue state priority is ordered so that link state aware programs
> * will still have the correct end result regardless of any lossage.
>
> Any comments or objections?

I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.

And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is down
while a down event and a up event are delivered if the final state is up.
It's confusable to me.

Can we pass events as they are as much as possible? I don't complain that
event reductions in the kernel, but I think it should be down based on time
series manner (e.g., pick latest three events), not based on some priority
things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the state
as 2 bits and encode event series into a variable (say 16 events in int).

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote:
> Except for an issue with queueing discussed privately (scheduling a
> softint that is already scheduled won't cause it to run again, so
> if_link_state_change_si needs to process the whole queue in one go),
> that approach looks fine to me, although as we also discussed
> privately we can easily compact it into a three-bit mask with a
> trivial update instead of a whole array of states.

Patch attached to solve change from a priority array into a bit mask approach 
where we swallow the whole queue in a softint.
Thanks to Taylor for some reviews and suggestions.

> This has the consequence that if the link goes up/down in quick
> succession, and then up/down in quick succession, &c., with the queue
> processed after each up/down transition, userland will never be
> notified.  However, if the link goes down/up, then down/up, &c., the
> userland will be notified of all the transitions.  Roy claims that
> that's OK, and I'm inclined to believe the author of dhcpcd about
> this.

If you don't read the patch, here is the comment I added to the
if_link_state_change() function:

* Queue a change in the interface link state.
*
* The queue itself is very limited:
*   no state can be queued more than once
*   a higher priority state will remove lower priority states
*
* The queue state priority in descending order is DOWN, UNKNOWN, UP.
* Rationale is that if the kernel can't process the link state change queue
* fast enough then userland has no chance of catching up.
* Any lossage is logged to the console.
* The queue state priority is ordered so that link state aware programs
* will still have the correct end result regardless of any lossage.

Any comments or objections?

RoyIndex: sys/net/if.c
===
RCS file: /cvsroot/src/sys/net/if.c,v
retrieving revision 1.324
diff -u -r1.324 if.c
--- sys/net/if.c	15 Feb 2016 08:08:04 -	1.324
+++ sys/net/if.c	17 Feb 2016 01:36:07 -
@@ -603,7 +603,7 @@
 	ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */
 
 	ifp->if_link_state = LINK_STATE_UNKNOWN;
-	ifp->if_old_link_state = LINK_STATE_UNKNOWN;
+	ifp->if_link_queue = 0;
 
 	ifp->if_capenable = 0;
 	ifp->if_csum_flags_tx = 0;
@@ -1563,9 +1563,18 @@
 }
 
 /*
- * Handle a change in the interface link state.
- * XXX: We should listen to the routing socket in-kernel rather
- * than calling in6_if_link_* functions directly from here.
+ * Queue a change in the interface link state.
+ *
+ * The queue itself is very limited:
+ *   no state can be queued more than once
+ *   a higher priority state will remove lower priority states
+ *
+ * The queue state priority in descending order is DOWN, UNKNOWN, UP.
+ * Rationale is that if the kernel can't process the link state change queue
+ * fast enough then userland has no chance of catching up.
+ * The queue state priority is ordered so that link state aware programs
+ * will still have the correct end result regardless of any lossage.
+ * Any lossage is logged to the console.
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_state)
@@ -1573,30 +1582,68 @@
 	int s;
 
 	s = splnet();
-	if (ifp->if_link_state == link_state) {
-		splx(s);
-		return;
+
+	/* If state is the same and there is no current queue, do nothing. */
+	if (ifp->if_link_state == link_state && ifp->if_link_queue == 0x00)
+		goto out;
+
+#define LINK_STATE_LOST "%s: lost %s notification\n"
+	switch (link_state) {
+	case LINK_STATE_DOWN:
+		if (isset(&ifp->if_link_queue, LINK_STATE_UP)) {
+			printf(LINK_STATE_LOST,
+			ifp->if_xname, "LINK_STATE_UP");
+			clrbit(&ifp->if_link_queue, LINK_STATE_UP);
+		}
+		if (isset(&ifp->if_link_queue, LINK_STATE_UNKNOWN)) {
+			printf(LINK_STATE_LOST,
+			ifp->if_xname, "LINK_STATE_UNKNOWN");
+			clrbit(&ifp->if_link_queue, LINK_STATE_UNKNOWN);
+		}
+		setbit(&ifp->if_link_queue, LINK_STATE_DOWN);
+		break;
+	case LINK_STATE_UNKNOWN:
+		if (isset(&ifp->if_link_queue, LINK_STATE_UP)) {
+			printf(LINK_STATE_LOST,
+			ifp->if_xname, "LINK_STATE_UP");
+			clrbit(&ifp->if_link_queue, LINK_STATE_UP);
+		}
+		setbit(&ifp->if_link_queue, LINK_STATE_UNKNOWN);
+		break;
+	case LINK_STATE_UP:
+		setbit(&ifp->if_link_queue, LINK_STATE_UP);
+		break;
+	default:
+#ifdef DEBUG
+		printf("%s: invalid link state %d\n",
+		ifp->if_xname, link_state);
+#endif
+		goto out;
 	}
+#undef LINK_STATE_LOST
 
-	ifp->if_link_state = link_state;
 	softint_schedule(ifp->if_link_si);
 
+out:
 	splx(s);
 }
 
-
+/*
+ * Handle a change in the interface link state.
+ * Must be called at splnet().
+ */
 static void
-if_link_state_change_si(void *arg)
+if_link_state_change0(struct ifnet *ifp, int link_state)
 {
-	struct ifnet *ifp = arg;
-	int s;
-	int link_state, old_link_state;
+	int old_link_state;
 	struct domain *dp;
 
-	s = splnet();
-	link_state = ifp->if_link_state;
-	old_link_state = ifp->if_old_link_state;
-	ifp->if_old_link_state = ifp->if_l

Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote:
> This has the consequence that if the link goes up/down in quick
> succession, and then up/down in quick succession, &c., with the queue
> processed after each up/down transition, userland will never be
> notified.  However, if the link goes down/up, then down/up, &c., the
> userland will be notified of all the transitions.  Roy claims that
> that's OK, and I'm inclined to believe the author of dhcpcd about
> this.

For the up/down - process - up/down - process in quick succession we can print 
a diagnostic to the console if the up is trimmed.

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 13:24:15 +
   From: Roy Marples 

   I found the time to work on this, here is the patch I've been running
   for an hour or so upping and downing interfaces.

The rule in Roy's patch is that each new state changes kicks out all
higher-priority transitions, and is queued after all lower-priority
transitions, with priorities DOWN = -1, UNKNOWN = 0, UP = 1.

This has the effect that each queue has the regular expression

down? unknown? up?

with the following equivalences:

down/down = down
down/unknown = down/unknown
down/up = down/up
unknown/down = down
unknown/unknown = unknown
unknown/up = unknown/up
up/down = down
up/unknown = unknown
up/up = up

Except for an issue with queueing discussed privately (scheduling a
softint that is already scheduled won't cause it to run again, so
if_link_state_change_si needs to process the whole queue in one go),
that approach looks fine to me, although as we also discussed
privately we can easily compact it into a three-bit mask with a
trivial update instead of a whole array of states.

This has the consequence that if the link goes up/down in quick
succession, and then up/down in quick succession, &c., with the queue
processed after each up/down transition, userland will never be
notified.  However, if the link goes down/up, then down/up, &c., the
userland will be notified of all the transitions.  Roy claims that
that's OK, and I'm inclined to believe the author of dhcpcd about
this.


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 08:38:55 +
   From: Roy Marples 

   The queue can be kept quite short because we don't care about any prior 
   entries each time state changes to LINK_STATE_DOWN.
   Also, we don't care about any prior entries (other than LINK_STATE_DOWN) 
when 
   setting LINK_STATE_UNKNOWN.
   Finally we set LINK_STATE_UP.

   So the queue should only ever hold a maximum of 3 items. Could even work it 
   like so
   [...]
   Thoughts?

Sounds better than my suggestion, from someone who knows better how
the link state events are used!  Maybe I should have looked at the
list of valid states, too -- all three of them.

We don't even need an array, really.  If I understand correctly, we
obviously have the following equivalences of queues:

(1) unknown/unknown = unknown
(2) unknown/down= down
(3) unknown/up  = down/up
(4)down/unknown = down/unknown
(5)down/down= down
(6)down/up  = down/up
(7)  up/unknown = up/unknown
(8)  up/down= up/down
(9)  up/up  = up

I can reasonably believe (10) up/down/up = down/up and (11)
down/up/down = down, too.  From these, we can reduce almost all
3-queues to 2-queues:

unknown/unknown/unknown =(1) unknown/unknown =(1) unknown
unknown/unknown/down =(1) unknown/down =(2) down
unknown/unknown/up =(3) unknown/up =(2) down/up
unknown/down/unknown =(2) down/unknown
unknown/down/down =(2) down/down =(5) down
unknown/down/up =(2) down/down/up =(5) down/up
unknown/up/unknown =(3) down/up/unknown ?
unknown/up/down =(3) down/up/down =(11) down
unknown/up/up =(3) down/up/up =(9) down/up
down/unknown/unknown =(1) down/unknown
down/unknown/down =(2) down/down =(5) down
down/unknown/up =(3) down/down/up =(5) down/up
down/down/unknown =(5) down/unknown
down/down/down =(5) down/down =(5) down
down/down/up =(5) down/up
down/up/unknown ?
down/up/down =(11) down
down/up/up =(9) down/up
up/unknown/unknown =(1) up/unknown
up/unknown/down =(2) up/down
up/unknown/up = up/down/up =(10) down/up
up/down/unknown ?
up/down/down =(5) up/down
up/down/up =(10) down/up
up/up/unknown = up/unknown
up/up/down = up/down
up/up/up = up/up = up

That leaves us with only two cases where a 3-queue stays a 3-queue and
doesn't reduce to a 2-queue.  For those cases, the possible 4-queues
are:

down/up/unknown/unknown =(1) down/up/unknown
down/up/unknown/down =(2) down/up/down =(11) down
down/up/unknown/up =(3) down/up/down/up =(10) down/up
up/down/unknown/unknown =(1) up/down/unknown
up/down/unknown/down =(2) up/down/down =(5) up/down
up/down/unknown/up =(3) up/down/down/up =(5) up/down/up =(10) down/up

Collecting all the possible queues, we get only nine possibilities:

down
down/unknown
down/up
down/up/unknown
unknown
up
up/down
up/down/unknown
up/unknown

We could store it as a single int (numbered densely, or by 1 <<
LINK_STATE_*) and write the queue-append operation all as one switch.

Perhaps there are more equivalences I didn't capture, or cases that
are never going to occur -- for example, do we ever actually
transition from up/down to unknown, or does unknown only happen when
the interface is first attached?  (I'm not sure.)


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 16:30:54 +0900
   From: Ryota Ozaki 

   On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell
wrote:
   > void
   > if_link_state_change(struct ifnet *ifp, int link_state)
   > {
   > int s;
   >
   > s = splnet();
   > if (ifp->if_link_state_pending == link_state)
   > goto out;
   >
   > if (ifp->if_link_state_pending != ifp->if_link_state) {

   What does the conditional intend?

I don't remember!  It doesn't make sense now that I look at it again.
This was just a sketch -- obviously it was missing some parts, such as
pool_cache_put, and some parts seem to make no sense.  I would just
take it out.

   Anyway I made a patch based on the above code:
 http://www.netbsd.org/~ozaki-r/link_state_change.diff

   What I did are to make your code compilable and workable,
   and limit the number of pending events.

Looks reasonable to me.  We should have some automatic tests for all
this, though!


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
> On Tuesday 16 February 2016 13:37:28 you wrote:
>>> Oh, I see. We shouldn't drop any events of link state changes.
>>
>> i don't see any point in keeping a huge series of link changes
>> if they can be collapsed into an equivalent sequence.  with VMs
>> and other user-controlled systems it's possible to flood such a
>> link state checking engine.
> 
> The queue can be kept quite short because we don't care about any prior 
> entries each time state changes to LINK_STATE_DOWN.
> Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
> setting LINK_STATE_UNKNOWN.
> Finally we set LINK_STATE_UP.
> 
> So the queue should only ever hold a maximum of 3 items. Could even work it 
> like so

I found the time to work on this, here is the patch I've been running
for an hour or so upping and downing interfaces.

OK to commit?

Roy
Index: sys/net/if.c
===
RCS file: /cvsroot/src/sys/net/if.c,v
retrieving revision 1.324
diff -u -r1.324 if.c
--- sys/net/if.c15 Feb 2016 08:08:04 -  1.324
+++ sys/net/if.c16 Feb 2016 13:22:43 -
@@ -603,7 +603,6 @@
ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */
 
ifp->if_link_state = LINK_STATE_UNKNOWN;
-   ifp->if_old_link_state = LINK_STATE_UNKNOWN;
 
ifp->if_capenable = 0;
ifp->if_csum_flags_tx = 0;
@@ -1562,41 +1561,73 @@
}
 }
 
+/* Priority list for link state changes */
+static const int
+if_link_state_prio[LINK_STATE_MAX] = {
+   0,  /* LINK_STATE_UNKNOWN */
+   -1, /* LINK_STATE_DOWN */
+   1,  /* LINK_STATE_UP */
+};
+
 /*
  * Handle a change in the interface link state.
- * XXX: We should listen to the routing socket in-kernel rather
- * than calling in6_if_link_* functions directly from here.
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_state)
 {
-   int s;
+   int s, i;
+
+   KASSERT(link_state >= 0 && link_state < LINK_STATE_MAX);
 
s = splnet();
-   if (ifp->if_link_state == link_state) {
-   splx(s);
-   return;
+   if (ifp->if_link_state == link_state && ifp->if_link_queue_len == 0)
+   goto out;
+
+   for (i = 0; i < ifp->if_link_queue_len; i++) {
+   if (if_link_state_prio[link_state] <=
+   if_link_state_prio[ifp->if_link_queue[i]])
+   {
+   ifp->if_link_queue[i] = link_state;
+   ifp->if_link_queue_len = i + 1;
+   /* Because we are trimming the queue,
+* there is no need to call softint_schedule again. */
+   goto out;
+   }
}
 
-   ifp->if_link_state = link_state;
+   KASSERT(ifp->if_link_queue_len + 1 < __arraycount(ifp->if_link_queue));
+   ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_si);
 
+out:
splx(s);
 }
 
-
 static void
 if_link_state_change_si(void *arg)
 {
struct ifnet *ifp = arg;
-   int s;
+   int s, i;
int link_state, old_link_state;
struct domain *dp;
 
s = splnet();
-   link_state = ifp->if_link_state;
-   old_link_state = ifp->if_old_link_state;
-   ifp->if_old_link_state = ifp->if_link_state;
+   KASSERT(ifp->if_link_queue_len >= 0); /* unlikely */
+   if (ifp->if_link_queue_len == 0)
+   goto out;
+
+   /* Pop the link state change from the queue */
+   link_state = ifp->if_link_queue[0];
+   ifp->if_link_queue_len--;
+   for (i = 0; i < ifp->if_link_queue_len; i++)
+   ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];
+
+   /* Ensure link state is actually changing */
+   if (ifp->if_link_state == link_state)
+   goto out;
+
+   old_link_state = ifp->if_link_state;
+   ifp->if_link_state = link_state;
 
 #ifdef DEBUG
log(LOG_DEBUG, "%s: link state %s (was %s)\n", ifp->if_xname,
@@ -1640,6 +1671,7 @@
dp->dom_if_link_state_change(ifp, link_state);
}
 
+out:
splx(s);
 }
 
Index: sys/net/if.h
===
RCS file: /cvsroot/src/sys/net/if.h,v
retrieving revision 1.197
diff -u -r1.197 if.h
--- sys/net/if.h16 Feb 2016 01:31:26 -  1.197
+++ sys/net/if.h16 Feb 2016 13:22:43 -
@@ -191,10 +191,12 @@
 
 /*
  * Values for if_link_state.
+ * When changing these values, consider if_link_state_prio in if.c.
  */
 #defineLINK_STATE_UNKNOWN  0   /* link invalid/unknown */
 #defineLINK_STATE_DOWN 1   /* link is down */
 #defineLINK_STATE_UP   2   /* link is up */
+#defineLINK_STATE_MAX  3   /* size of link states */
 
 /*
  * Structure defining a queue for a network interface.
@@ 

Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
>   for (i = 0; i < ifp->if_link_queue_len; i++) {
>   if (link_state <= ifp->if_link_queue[i]) {
>   ifp->if_link_queue[i] = link_state;
>   ifp->if_link_queue_len = i;

This should be ifp->if_link_queue_len = i + 1;

>   /* Because we are trimming the queue,
>* there is no need to call softint_schedule again. */
>   goto out;
>   }
>   }

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On Tuesday 16 February 2016 13:37:28 you wrote:
> > Oh, I see. We shouldn't drop any events of link state changes.
> 
> i don't see any point in keeping a huge series of link changes
> if they can be collapsed into an equivalent sequence.  with VMs
> and other user-controlled systems it's possible to flood such a
> link state checking engine.

The queue can be kept quite short because we don't care about any prior 
entries each time state changes to LINK_STATE_DOWN.
Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
setting LINK_STATE_UNKNOWN.
Finally we set LINK_STATE_UP.

So the queue should only ever hold a maximum of 3 items. Could even work it 
like so

void
if_link_state_change(struct ifnet *ifp, int link_state)
{
int s, i;

s = splnet();

if (ifp->if_link_queue_len == 0 && ifp->link_state == link_state)
goto out;

for (i = 0; i < ifp->if_link_queue_len; i++) {
if (link_state <= ifp->if_link_queue[i]) {
ifp->if_link_queue[i] = link_state;
ifp->if_link_queue_len = i;
/* Because we are trimming the queue,
 * there is no need to call softint_schedule again. */
goto out;
}
}

/* Add bounds check here for sanity */
ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_state_softint);

out:
splx(s);
}


static void
if_link_state_change_si(void *arg)
{
int s, link_state, i;

s = splnet();
if (ifp->if_link_queue_len == 0)
goto out;

link_state = ifp->if_link_queue[0];
ifp->if_link_queue_len--;
for (i = 0; i < ifp->link_queue_len; i++)
ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];

/* process state change as normal */

out:
splx(s);
}

Thoughts?

Roy


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell
 wrote:
>Date: Mon, 15 Feb 2016 11:06:52 +
>From: Roy Marples 
>
>On 15/02/2016 10:32, Ryota Ozaki wrote:
>> I think we can fix by returning from if_link_state_change_si
>> if ifp->if_link_state == ifp->if_old_link_state. Is it correct?
>
>Then we're not doing what we should because the protocol actions may not
>fire.
>
>Can we not extend softint_schedule to take some user data? That way,
>each call (which I'm assuming would be sequential) can carry the link
>state change in the user data which can then be applied to the interface.
>
> Changing softint is not really a good idea -- it is a significant
> semantic change that most current users don't require, because they
> already have existing queue mechanisms, e.g. pktq.
>
> If it is necessary to report every link state transition, then we need
> to create a queueing mechanism -- and possibly drop some transitions
> in the case of memory shortage, as we already do in route_enqueue.
>
> Maybe something like this, with a queue and a link_state_pending field
> to avoid creating extra queue entries unnecessarily but guaranteeing
> that the most recent link state change will take effect, and be
> preceded by LINK_STATE_DOWN if anything was lost:

Oops. Our (IIJ) local changes have a similar code (softint + queuing)
and it works for long time. So I concur the design.

>
> struct link_state_change {
> SIMPLEQ_ENTRY(link_state_change)lsc_entry;
> int lsc_state;
> boollsc_lost;
> };
>
> static pool_cache_t link_state_change_pc __read_mostly;
>
> void
> if_link_state_change(struct ifnet *ifp, int link_state)
> {
> int s;
>
> s = splnet();
> if (ifp->if_link_state_pending == link_state)
> goto out;
>
> if (ifp->if_link_state_pending != ifp->if_link_state) {

What does the conditional intend?

> struct link_state_change *lsc;
>
> lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT);
> if (lsc == NULL) {
> ifp->if_link_state_lost = true;
> goto change;
> }
>
> lsc->lsc_state = link_state;
> lsc->lsc_lost = ifp->if_link_state_lost;
> SIMPLEQ_INSERT_TAIL(&ifp->if_link_state_changes, lsc,
> lsc_entry);
> }
>
> ifp->if_link_state_lost = false;
> change: ifp->if_link_state_pending = link_state;
> softint_schedule(ifp->if_link_state_softint);
> out:splx(s);
> }
>
> static void
> if_link_state_change_softintr(void *cookie)
> {
> struct ifnet *ifp = cookie;
> struct link_state_change *lsc;
> int s;
>
> s = splnet();
> while (!SIMPLEQ_EMPTY(&ifp->if_link_state_changes)) {
> lsc = SIMPLEQ_HEAD(&ifp->if_link_state_changes);
> SIMPLEQ_REMOVE_HEAD(&ifp->if_link_state_changes, lsc_entry);
> if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost);
> }
>
> if (ifp->if_link_state_pending != ifp->if_link_state)
> if_link_state_change_1(ifp, ifp->if_link_state_pending,
> ifp->if_link_state_lost);
> ifp->if_link_state_lost = false;
> splx(s);
> }
>
> static void
> if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost)
> {
>
> if (lost ||
> (ifp->if_link_state == LINK_STATE_UP &&
> link_state == LINK_STATE_UNKNOWN)) {
> /* pretend LINK_STATE_DOWN first */
> }
>
> ifp->if_link_state = link_state;
> /* call domain dom_if_link_state_change callbacks */
> /* call rt_ifmsg */
> }

Anyway I made a patch based on the above code:
  http://www.netbsd.org/~ozaki-r/link_state_change.diff

What I did are to make your code compilable and workable,
and limit the number of pending events.

Thanks,
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Tue, Feb 16, 2016 at 11:37 AM, matthew green  wrote:
>> Oh, I see. We shouldn't drop any events of link state changes.
>
> i don't see any point in keeping a huge series of link changes
> if they can be collapsed into an equivalent sequence.  with VMs
> and other user-controlled systems it's possible to flood such a
> link state checking engine.

Sorry for my bad wording. I meant we shouldn't drop events _in normal
cases_; the number of events should be several. If a huge number of
events happen in a short period (e.g., due to hardware troubles,
abnormal VMMs, etc.), we should eliminate events to protect the system.

  ozaki-r


re: CVS commit: src/sys/net

2016-02-15 Thread matthew green
> Oh, I see. We shouldn't drop any events of link state changes.

i don't see any point in keeping a huge series of link changes
if they can be collapsed into an equivalent sequence.  with VMs
and other user-controlled systems it's possible to flood such a
link state checking engine.


.mrg.


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Mon, Feb 15, 2016 at 8:06 PM, Roy Marples  wrote:
> On 15/02/2016 10:32, Ryota Ozaki wrote:
>> On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples  wrote:
>>> On 15/02/2016 08:08, Ryota Ozaki wrote:
 Module Name:  src
 Committed By: ozaki-r
 Date: Mon Feb 15 08:08:04 UTC 2016

 Modified Files:
   src/sys/net: if.c if.h

 Log Message:
 Run if_link_state_change in softint

 if_link_state_change can execute the network stack that is expected to
 not run in hardware interrupt (at least now), however network drivers
 may call it in hardware interrupt. Avoid that by introducing a new
 softint for if_link_state_change.
>>>
>>> I don't know anything about softints, so this could be a silly question,
>>> but this change looks racey to me. Is there a guarantee anywhere that
>>> if_link_state_change() won't be called again before
>>> if_link_state_change_si() has been called?
>>
>> Not guaranteed. Hmm, you're right. There is a race condition;
>> ifp->if_link_state can be overwritten before calling
>> if_link_state_change_si and it does matter if ifp->if_link_state
>> has been back to ifp->if_old_link_state. (If ifp->if_link_state
>> is different from ifp->if_old_link_state, it doesn't matter
>> because it's just that the later state wins.)
>>
>> I think we can fix by returning from if_link_state_change_si
>> if ifp->if_link_state == ifp->if_old_link_state. Is it correct?
>
> Then we're not doing what we should because the protocol actions may not
> fire.

Oh, I see. We shouldn't drop any events of link state changes.

>
> Can we not extend softint_schedule to take some user data? That way,
> each call (which I'm assuming would be sequential) can carry the link
> state change in the user data which can then be applied to the interface.

I sometimes want that feature but I don't think changing the API
is a good idea.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-15 Thread Taylor R Campbell
   Date: Mon, 15 Feb 2016 11:06:52 +
   From: Roy Marples 

   On 15/02/2016 10:32, Ryota Ozaki wrote:
   > I think we can fix by returning from if_link_state_change_si
   > if ifp->if_link_state == ifp->if_old_link_state. Is it correct?

   Then we're not doing what we should because the protocol actions may not
   fire.

   Can we not extend softint_schedule to take some user data? That way,
   each call (which I'm assuming would be sequential) can carry the link
   state change in the user data which can then be applied to the interface.

Changing softint is not really a good idea -- it is a significant
semantic change that most current users don't require, because they
already have existing queue mechanisms, e.g. pktq.

If it is necessary to report every link state transition, then we need
to create a queueing mechanism -- and possibly drop some transitions
in the case of memory shortage, as we already do in route_enqueue.

Maybe something like this, with a queue and a link_state_pending field
to avoid creating extra queue entries unnecessarily but guaranteeing
that the most recent link state change will take effect, and be
preceded by LINK_STATE_DOWN if anything was lost:

struct link_state_change {
SIMPLEQ_ENTRY(link_state_change)lsc_entry;
int lsc_state;
boollsc_lost;
};

static pool_cache_t link_state_change_pc __read_mostly;

void
if_link_state_change(struct ifnet *ifp, int link_state)
{
int s;

s = splnet();
if (ifp->if_link_state_pending == link_state)
goto out;

if (ifp->if_link_state_pending != ifp->if_link_state) {
struct link_state_change *lsc;

lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT);
if (lsc == NULL) {
ifp->if_link_state_lost = true;
goto change;
}

lsc->lsc_state = link_state;
lsc->lsc_lost = ifp->if_link_state_lost;
SIMPLEQ_INSERT_TAIL(&ifp->if_link_state_changes, lsc,
lsc_entry);
}

ifp->if_link_state_lost = false;
change: ifp->if_link_state_pending = link_state;
softint_schedule(ifp->if_link_state_softint);
out:splx(s);
}

static void
if_link_state_change_softintr(void *cookie)
{
struct ifnet *ifp = cookie;
struct link_state_change *lsc;
int s;

s = splnet();
while (!SIMPLEQ_EMPTY(&ifp->if_link_state_changes)) {
lsc = SIMPLEQ_HEAD(&ifp->if_link_state_changes);
SIMPLEQ_REMOVE_HEAD(&ifp->if_link_state_changes, lsc_entry);
if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost);
}

if (ifp->if_link_state_pending != ifp->if_link_state)
if_link_state_change_1(ifp, ifp->if_link_state_pending,
ifp->if_link_state_lost);
ifp->if_link_state_lost = false;
splx(s);
}

static void
if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost)
{

if (lost ||
(ifp->if_link_state == LINK_STATE_UP &&
link_state == LINK_STATE_UNKNOWN)) {
/* pretend LINK_STATE_DOWN first */
}

ifp->if_link_state = link_state;
/* call domain dom_if_link_state_change callbacks */
/* call rt_ifmsg */
}


Re: CVS commit: src/sys/net

2016-02-15 Thread Martin Husemann
On Mon, Feb 15, 2016 at 11:06:52AM +, Roy Marples wrote:
> Can we not extend softint_schedule to take some user data? That way,
> each call (which I'm assuming would be sequential) can carry the link
> state change in the user data which can then be applied to the interface.

Sounds like a workqueue(9) then...

Martin


Re: CVS commit: src/sys/net

2016-02-15 Thread Roy Marples
On 15/02/2016 10:32, Ryota Ozaki wrote:
> On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples  wrote:
>> On 15/02/2016 08:08, Ryota Ozaki wrote:
>>> Module Name:  src
>>> Committed By: ozaki-r
>>> Date: Mon Feb 15 08:08:04 UTC 2016
>>>
>>> Modified Files:
>>>   src/sys/net: if.c if.h
>>>
>>> Log Message:
>>> Run if_link_state_change in softint
>>>
>>> if_link_state_change can execute the network stack that is expected to
>>> not run in hardware interrupt (at least now), however network drivers
>>> may call it in hardware interrupt. Avoid that by introducing a new
>>> softint for if_link_state_change.
>>
>> I don't know anything about softints, so this could be a silly question,
>> but this change looks racey to me. Is there a guarantee anywhere that
>> if_link_state_change() won't be called again before
>> if_link_state_change_si() has been called?
> 
> Not guaranteed. Hmm, you're right. There is a race condition;
> ifp->if_link_state can be overwritten before calling
> if_link_state_change_si and it does matter if ifp->if_link_state
> has been back to ifp->if_old_link_state. (If ifp->if_link_state
> is different from ifp->if_old_link_state, it doesn't matter
> because it's just that the later state wins.)
> 
> I think we can fix by returning from if_link_state_change_si
> if ifp->if_link_state == ifp->if_old_link_state. Is it correct?

Then we're not doing what we should because the protocol actions may not
fire.

Can we not extend softint_schedule to take some user data? That way,
each call (which I'm assuming would be sequential) can carry the link
state change in the user data which can then be applied to the interface.

Roy


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples  wrote:
> On 15/02/2016 08:08, Ryota Ozaki wrote:
>> Module Name:  src
>> Committed By: ozaki-r
>> Date: Mon Feb 15 08:08:04 UTC 2016
>>
>> Modified Files:
>>   src/sys/net: if.c if.h
>>
>> Log Message:
>> Run if_link_state_change in softint
>>
>> if_link_state_change can execute the network stack that is expected to
>> not run in hardware interrupt (at least now), however network drivers
>> may call it in hardware interrupt. Avoid that by introducing a new
>> softint for if_link_state_change.
>
> I don't know anything about softints, so this could be a silly question,
> but this change looks racey to me. Is there a guarantee anywhere that
> if_link_state_change() won't be called again before
> if_link_state_change_si() has been called?

Not guaranteed. Hmm, you're right. There is a race condition;
ifp->if_link_state can be overwritten before calling
if_link_state_change_si and it does matter if ifp->if_link_state
has been back to ifp->if_old_link_state. (If ifp->if_link_state
is different from ifp->if_old_link_state, it doesn't matter
because it's just that the later state wins.)

I think we can fix by returning from if_link_state_change_si
if ifp->if_link_state == ifp->if_old_link_state. Is it correct?

Thanks,
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-15 Thread Roy Marples
On 15/02/2016 08:08, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Mon Feb 15 08:08:04 UTC 2016
> 
> Modified Files:
>   src/sys/net: if.c if.h
> 
> Log Message:
> Run if_link_state_change in softint
> 
> if_link_state_change can execute the network stack that is expected to
> not run in hardware interrupt (at least now), however network drivers
> may call it in hardware interrupt. Avoid that by introducing a new
> softint for if_link_state_change.

I don't know anything about softints, so this could be a silly question,
but this change looks racey to me. Is there a guarantee anywhere that
if_link_state_change() won't be called again before
if_link_state_change_si() has been called?

Roy


Re: CVS commit: src/sys/net

2016-01-13 Thread Kengo NAKAHARA
Hi,

On 2016/01/13 7:10, David Laight wrote:
> On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote:
>> # Sorry, I forgot to subscribe source-changes-d ml, I reply as
>> # a new mail.
>>
>> Hi,
>>
>>> In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
>>> Kengo NAKAHARA  wrote:
 -=-=-=-=-=-

 Module Name:   src
 Committed By:  knakahara
 Date:  Thu Dec 10 08:11:03 UTC 2015

 Modified Files:
src/sys/net: if_gif.c

 Log Message:
 kmem_zalloc(, KM_SLEEP) must not return NULL.
>>>
>>> I would like to solicit opinions about this change and form a general
>>> policy.
>>>
>>> 1. I would like to reduce the use of KASSERT in the kernel, specially
>>> in situations like thee above where the test can be centralized (inside
>>> kmem_alloc) and avoided without being fatal.
>>
>> OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
> 
> There is no real point using KASSERT() when the immediate
> dereference of NULL will have the same effect and is about as
> easy to debug.
> 
> Whether kmem_alloc(KM_SLEEP) can return NULL is another matter.
> Seems wrong to me - maybe even for impossible allications.
> ISTR a problem waiting for KVA and phys-mem being 'difficult'.
> I know that the Linux equivalent will return NULL, not sure when.
> It would be useful to define that allocation for non-huge
> (maybe even several MB) amounts will always sleep.
> 
>   David

Have you read this thread?
http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019762.html

I think Mr. Chuck Silvers points out a similar issue.
http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019772.html


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2016-01-12 Thread David Laight
On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote:
> # Sorry, I forgot to subscribe source-changes-d ml, I reply as
> # a new mail.
> 
> Hi,
> 
> > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> > Kengo NAKAHARA  wrote:
> >>-=-=-=-=-=-
> >>
> >>Module Name:src
> >>Committed By:   knakahara
> >>Date:   Thu Dec 10 08:11:03 UTC 2015
> >>
> >>Modified Files:
> >>src/sys/net: if_gif.c
> >>
> >>Log Message:
> >>kmem_zalloc(, KM_SLEEP) must not return NULL.
> > 
> > I would like to solicit opinions about this change and form a general
> > policy.
> > 
> > 1. I would like to reduce the use of KASSERT in the kernel, specially
> > in situations like thee above where the test can be centralized (inside
> > kmem_alloc) and avoided without being fatal.
> 
> OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.

There is no real point using KASSERT() when the immediate
dereference of NULL will have the same effect and is about as
easy to debug.

Whether kmem_alloc(KM_SLEEP) can return NULL is another matter.
Seems wrong to me - maybe even for impossible allications.
ISTR a problem waiting for KVA and phys-mem being 'difficult'.
I know that the Linux equivalent will return NULL, not sure when.
It would be useful to define that allocation for non-huge
(maybe even several MB) amounts will always sleep.

David

-- 
David Laight: da...@l8s.co.uk


Re: CVS commit: src/sys/net

2015-12-10 Thread Kengo NAKAHARA
Hi,

On 2015/12/11 13:00, Christos Zoulas wrote:
> On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
> -- Subject: Re: CVS commit: src/sys/net
> 
> | Hi,
> | 
> | > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> | > Kengo NAKAHARA  wrote:
> | >>-=-=-=-=-=-
> | >>
> | >>Module Name:  src
> | >>Committed By: knakahara
> | >>Date: Thu Dec 10 08:11:03 UTC 2015
> | >>
> | >>Modified Files:
> | >>  src/sys/net: if_gif.c
> | >>
> | >>Log Message:
> | >>kmem_zalloc(, KM_SLEEP) must not return NULL.
> | > 
> | > I would like to solicit opinions about this change and form a general
> | > policy.
> | > 
> | > 1. I would like to reduce the use of KASSERT in the kernel, specially
> | > in situations like thee above where the test can be centralized (inside
> | > kmem_alloc) and avoided without being fatal.
> | 
> | OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
> | 
> | 
> | > 2. Static analyzer models understand allocators, but they are not
> | > smart enough to determine under which situations they can fail. I
> | > believe even kmem_alloc with KM_SLEEP can fail when the size is
> | > large enough.
> | 
> | I have a question. The man of kmem(9) says:
> | 
> |  kmflags  Either of the following:
> | 
> |   KM_SLEEPIf the allocation cannot be satisfied immediately,
> |   sleep until enough memory is available.
> 
>   
> int ret = uvm_km_kmem_alloc(kmem_va_arena,
> (vsize_t)round_page(size),
> ((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) 
>  | VM_INSTANTFIT, (vmem_addr_t *)&p);
> if (ret) {
> return NULL; 
> } 
>   
> 
> | 
> | Is this manual incorrect?
> | I'm confused... Could you tell me easily comprehensible manual?
> 
> It is documented that it does not fail if you sleep because this
> is the usual scenario. Follow the path to the rebbit hole and you
> go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you
> see that under certain conditions it will fail even when you sleep.

I see. Thank you for your explanation.

> Note that bt_refill return values are not even always checked, so
> this can fail in other mysterious ways.

Oh...

> | > So I propose to always check the return value of allocators with
> | > an 'if' and not a KASSERT.
> | 
> | There are some codes like "foo = kmem_alloc(size, KM_SLEEP);
> | KASSERT(foo != NULL)".
> | Should the codes be unified to use not KASSERT' but if'?
> 
> Yes (when it is possible), and the man page for kmem_alloc should be
> changed to reflect that.

I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now.

> Log Message:
> Spell out that KM_SLEEP allocations can fail.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.17 -r1.18 src/share/man/man9/kmem.9

Thank you for your quickly update!


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2015-12-10 Thread Christos Zoulas
On Dec 11,  1:25pm, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
-- Subject: Re: CVS commit: src/sys/net


| I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now.

Thanks.

| Thank you for your quickly update!

Thanks for bringing this up. I've been meaning to do something about it
for a long time ;-)

christos


Re: CVS commit: src/sys/net

2015-12-10 Thread Christos Zoulas
On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Hi,
| 
| > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
| > Kengo NAKAHARA  wrote:
| >>-=-=-=-=-=-
| >>
| >>Module Name:src
| >>Committed By:   knakahara
| >>Date:   Thu Dec 10 08:11:03 UTC 2015
| >>
| >>Modified Files:
| >>src/sys/net: if_gif.c
| >>
| >>Log Message:
| >>kmem_zalloc(, KM_SLEEP) must not return NULL.
| > 
| > I would like to solicit opinions about this change and form a general
| > policy.
| > 
| > 1. I would like to reduce the use of KASSERT in the kernel, specially
| > in situations like thee above where the test can be centralized (inside
| > kmem_alloc) and avoided without being fatal.
| 
| OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
| 
| 
| > 2. Static analyzer models understand allocators, but they are not
| > smart enough to determine under which situations they can fail. I
| > believe even kmem_alloc with KM_SLEEP can fail when the size is
| > large enough.
| 
| I have a question. The man of kmem(9) says:
| 
|  kmflags  Either of the following:
| 
|   KM_SLEEPIf the allocation cannot be satisfied immediately,
|   sleep until enough memory is available.


int ret = uvm_km_kmem_alloc(kmem_va_arena,
(vsize_t)round_page(size),
((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) 
 | VM_INSTANTFIT, (vmem_addr_t *)&p);
if (ret) {
return NULL; 
} 


| 
| Is this manual incorrect?
| I'm confused... Could you tell me easily comprehensible manual?

It is documented that it does not fail if you sleep because this
is the usual scenario. Follow the path to the rebbit hole and you
go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you
see that under certain conditions it will fail even when you sleep.
Note that bt_refill return values are not even always checked, so
this can fail in other mysterious ways.

| > So I propose to always check the return value of allocators with
| > an 'if' and not a KASSERT.
| 
| There are some codes like "foo = kmem_alloc(size, KM_SLEEP);
| KASSERT(foo != NULL)".
| Should the codes be unified to use not KASSERT' but if'?

Yes (when it is possible), and the man page for kmem_alloc should be
changed to reflect that.

christos


Re: CVS commit: src/sys/net

2015-12-10 Thread Kengo NAKAHARA
# Sorry, I forgot to subscribe source-changes-d ml, I reply as
# a new mail.

Hi,

> In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> Kengo NAKAHARA  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:  src
>>Committed By: knakahara
>>Date: Thu Dec 10 08:11:03 UTC 2015
>>
>>Modified Files:
>>  src/sys/net: if_gif.c
>>
>>Log Message:
>>kmem_zalloc(, KM_SLEEP) must not return NULL.
> 
> I would like to solicit opinions about this change and form a general
> policy.
> 
> 1. I would like to reduce the use of KASSERT in the kernel, specially
> in situations like thee above where the test can be centralized (inside
> kmem_alloc) and avoided without being fatal.

OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.


> 2. Static analyzer models understand allocators, but they are not
> smart enough to determine under which situations they can fail. I
> believe even kmem_alloc with KM_SLEEP can fail when the size is
> large enough.

I have a question. The man of kmem(9) says:

 kmflags  Either of the following:

  KM_SLEEPIf the allocation cannot be satisfied immediately,
  sleep until enough memory is available.

Is this manual incorrect?
I'm confused... Could you tell me easily comprehensible manual?


> So I propose to always check the return value of allocators with
> an 'if' and not a KASSERT.

There are some codes like "foo = kmem_alloc(size, KM_SLEEP); KASSERT(foo != 
NULL)".
Should the codes be unified to use not KASSERT' but if'?


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2015-12-10 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

> In article <20151210081103.e0fbbf...@cvs.netbsd.org>,
> Kengo NAKAHARA  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:  src
>>Committed By: knakahara
>>Date: Thu Dec 10 08:11:03 UTC 2015
>>
>>Modified Files:
>>  src/sys/net: if_gif.c
>>
>>Log Message:
>>kmem_zalloc(, KM_SLEEP) must not return NULL.
>
> I would like to solicit opinions about this change and form a general
> policy.
>
> 1. I would like to reduce the use of KASSERT in the kernel, specially
> in situations like thee above where the test can be centralized (inside
> kmem_alloc) and avoided without being fatal.
>
> 2. Static analyzer models understand allocators, but they are not
> smart enough to determine under which situations they can fail. I
> believe even kmem_alloc with KM_SLEEP can fail when the size is
> large enough.
>
> So I propose to always check the return value of allocators with
> an 'if' and not a KASSERT.

I think that a function needs to have a contract specified in a contract
(and perhaps static analyzer markup).  Code should never KASSERT for
anything that can not be argued (statically shown) to be always true
given contracts.  So I agree with you.


signature.asc
Description: PGP signature


  1   2   3   >