Re: CVS commit: src/sys/net

2020-01-28 Thread Kamil Rytarowski
On 29.01.2020 07:40, Jason Thorpe wrote:
> 
>> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski  wrote:
>>
>> The distribution build breaks for me with:
> 
> Should be fixed with:
> 
>  /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v  <--  Makefile
>  new revision: 1.33; previous revision: 1.32
> 
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_init'
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_fini'
>> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
>> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
>> reference to `rumpns_if_stats_to_if_data'
>>
> 
> -- thorpej
> 

It works now!



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2020-01-28 Thread Jason Thorpe


> On Jan 28, 2020, at 10:25 PM, Kamil Rytarowski  wrote:
> 
> The distribution build breaks for me with:

Should be fixed with:

 /cvsroot/src/sys/rump/net/lib/libnet/Makefile,v  <--  Makefile
 new revision: 1.33; previous revision: 1.32

> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_init'
> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_fini'
> /public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
> /public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
> reference to `rumpns_if_stats_to_if_data'
> 

-- thorpej



Re: CVS commit: src/sys/net

2020-01-28 Thread Kamil Rytarowski
On 29.01.2020 04:16, Jason R Thorpe wrote:
> Module Name:  src
> Committed By: thorpej
> Date: Wed Jan 29 03:16:28 UTC 2020
> 
> Modified Files:
>   src/sys/net: Makefile files.net if.c if.h
> Added Files:
>   src/sys/net: if_stats.c if_stats.h
> 
> Log Message:
> Add support for MP-safe network interface statistics by maintaining them
> in per-cpu storage, and collecting them for export in an if_data structure
> when user-space wants them.
> 
> The new if_stat API is structured to make a gradual transition to the
> new way in network drivers possible, and per-cpu stats are currently
> disabled (thus there is no kernel ABI change).  Once all drivers have
> been converted, the old ABI will be removed, and per-cpu stats will be
> enabled universally.
> 

The distribution build breaks for me with:

/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_init'
/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_fini'
/public/netbsd-root/tooldir.NetBSD-9.99.38-amd64/lib/gcc/x86_64--netbsd/8.3.0/../../../../x86_64--netbsd/bin/ld:
/public/netbsd-root/destdir.amd64/usr/lib/librumpnet_net.so: undefined
reference to `rumpns_if_stats_to_if_data'



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2019-12-04 Thread Masanobu SAITOH
On 2019/12/05 14:29, SAITOH Masanobu wrote:
> Module Name:  src
> Committed By: msaitoh
> Date: Thu Dec  5 05:29:28 UTC 2019
> 
> Modified Files:
>   src/sys/net: if_media.h
> 
> Log Message:
> Fix previous comment change for ifm_media. It was correct.
> 
>  The real problem is that some driver misuse ifm_media as the current active
> media. struct mii_data has the current active media(mii_media_active). If a
> driver use mii(4), it can be use mii->mii_media_active for this purpose.
> struct ifmedia has no entry for this purpose. Some drivers have an entry

s/Some drivers/Some drivers not using mii(4)/

> in their own softc to keep the value, but some other's don't have it and
> they mistakenly use ifm_media.
> 
>  We might add a new entry to struct ifmedia in future to avoid this confusion
> and for simplify.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.67 -r1.68 src/sys/net/if_media.h
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2019-09-23 Thread Ryota Ozaki
On Mon, Sep 23, 2019 at 4:14 PM Kamil Rytarowski  wrote:
>
> On 23.09.2019 06:53, Rin Okuyama wrote:
> > Hi,
> >
> > On 2019/09/22 18:30, Kamil Rytarowski wrote:
> >> On 12.04.2018 06:38, Ryota Ozaki wrote:
> >>> Module Name:src
> >>> Committed By:ozaki-r
> >>> Date:Thu Apr 12 04:38:13 UTC 2018
> >>>
> >>> Modified Files:
> >>> src/sys/net: if.h route.c route.h rtsock.c
> >>>
> >>> Log Message:
> >>> Resolve tangled lock dependencies in route.c
> >>>
> >>> This change sweeps remaining lock decisions based on if locked or not
> >>> by moving
> >>> utility functions of rtentry updates from rtsock.c and ensuring
> >>> holding the
> >>> rt_lock.  It also improves the atomicity of a update of a rtentry.
> >>>
> >>
> >>> +static struct ifaddr *
> >>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct
> >>> rtentry *rt,
> >>> +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
> >>> +{
> >>
> >>
> >> Do we need to pass info as a value? It is pretty large here (1024 bytes).
> >
> > Yeah, we were just discussing on this alert of LGTM bot.
> >
> > We can use const pointer here. I will commit the fix soon.
> >
> > Thanks,
> > rin

Thank you for the commit!

>
> Thanks for addressing it! I wonder whether there is performance impact
> here (is this hot-path code?).

The function is not used in any packet processing (hot paths)
and used only for route updates that are uncommon for most users.

  ozaki-r


Re: CVS commit: src/sys/net

2019-09-23 Thread Kamil Rytarowski
On 23.09.2019 06:53, Rin Okuyama wrote:
> Hi,
> 
> On 2019/09/22 18:30, Kamil Rytarowski wrote:
>> On 12.04.2018 06:38, Ryota Ozaki wrote:
>>> Module Name:    src
>>> Committed By:    ozaki-r
>>> Date:    Thu Apr 12 04:38:13 UTC 2018
>>>
>>> Modified Files:
>>> src/sys/net: if.h route.c route.h rtsock.c
>>>
>>> Log Message:
>>> Resolve tangled lock dependencies in route.c
>>>
>>> This change sweeps remaining lock decisions based on if locked or not
>>> by moving
>>> utility functions of rtentry updates from rtsock.c and ensuring
>>> holding the
>>> rt_lock.  It also improves the atomicity of a update of a rtentry.
>>>
>>
>>> +static struct ifaddr *
>>> +rt_update_get_ifa(const struct rt_addrinfo info, const struct
>>> rtentry *rt,
>>> +    struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
>>> +{
>>
>>
>> Do we need to pass info as a value? It is pretty large here (1024 bytes).
> 
> Yeah, we were just discussing on this alert of LGTM bot.
> 
> We can use const pointer here. I will commit the fix soon.
> 
> Thanks,
> rin

Thanks for addressing it! I wonder whether there is performance impact
here (is this hot-path code?).



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net

2019-09-22 Thread Rin Okuyama

Hi,

On 2019/09/22 18:30, Kamil Rytarowski wrote:

On 12.04.2018 06:38, Ryota Ozaki wrote:

Module Name:src
Committed By:   ozaki-r
Date:   Thu Apr 12 04:38:13 UTC 2018

Modified Files:
src/sys/net: if.h route.c route.h rtsock.c

Log Message:
Resolve tangled lock dependencies in route.c

This change sweeps remaining lock decisions based on if locked or not by moving
utility functions of rtentry updates from rtsock.c and ensuring holding the
rt_lock.  It also improves the atomicity of a update of a rtentry.




+static struct ifaddr *
+rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt,
+struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
+{



Do we need to pass info as a value? It is pretty large here (1024 bytes).


Yeah, we were just discussing on this alert of LGTM bot.

We can use const pointer here. I will commit the fix soon.

Thanks,
rin


Re: CVS commit: src/sys/net

2019-09-22 Thread Kamil Rytarowski
On 12.04.2018 06:38, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Thu Apr 12 04:38:13 UTC 2018
> 
> Modified Files:
>   src/sys/net: if.h route.c route.h rtsock.c
> 
> Log Message:
> Resolve tangled lock dependencies in route.c
> 
> This change sweeps remaining lock decisions based on if locked or not by 
> moving
> utility functions of rtentry updates from rtsock.c and ensuring holding the
> rt_lock.  It also improves the atomicity of a update of a rtentry.
> 

> +static struct ifaddr *
> +rt_update_get_ifa(const struct rt_addrinfo info, const struct rtentry *rt,
> +struct ifnet **ifp, struct psref *psref_ifp, struct psref *psref)
> +{


Do we need to pass info as a value? It is pretty large here (1024 bytes).



signature.asc
Description: OpenPGP digital signature


Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Maxime Villard

Le 06/08/2019 à 12:31, Christos Zoulas a écrit :

I did not see any messages about it, and the fix is fine until rmind comes up 
with something better.


Yes turns out it was an off-list email


It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).


Given your second commit, I should understand that rmind is ok with this
change right?


christos


On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:

Le 06/08/2019 à 12:25, Christos Zoulas a écrit :

Module Name:src
Committed By:   christos
Date:   Tue Aug  6 10:25:13 UTC 2019
Modified Files:
src/sys/net/npf: npf_conn.c
Log Message:
Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.
To generate a diff of this commit:
cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Rmind said he had a fix and was testing it. Please revert this.




Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Christos Zoulas
Yes, rmind asked me to commit the change in private email.

christos

> On Aug 6, 2019, at 3:59 PM, Maxime Villard  wrote:
> 
> Le 06/08/2019 à 12:31, Christos Zoulas a écrit :
>> I did not see any messages about it, and the fix is fine until rmind comes 
>> up with something better.
> 
> Yes turns out it was an off-list email
> 
>> It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).
> 
> Given your second commit, I should understand that rmind is ok with this
> change right?
> 
>> christos
>>> On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:
>>> 
>>> Le 06/08/2019 à 12:25, Christos Zoulas a écrit :
 Module Name:   src
 Committed By:  christos
 Date:  Tue Aug  6 10:25:13 UTC 2019
 Modified Files:
src/sys/net/npf: npf_conn.c
 Log Message:
 Introduce an npf_conn_destroy_idx() that can handle partially constructed
 conn structures.
 To generate a diff of this commit:
 cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
>>> 
>>> Rmind said he had a fix and was testing it. Please revert this.



Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Christos Zoulas
I did not see any messages about it, and the fix is fine until rmind comes up 
with something better.
It is not nice to have HEAD unusable for 2 weeks now (since July 22nd).

christos

> On Aug 6, 2019, at 1:26 PM, Maxime Villard  wrote:
> 
> Le 06/08/2019 à 12:25, Christos Zoulas a écrit :
>> Module Name: src
>> Committed By:christos
>> Date:Tue Aug  6 10:25:13 UTC 2019
>> Modified Files:
>>  src/sys/net/npf: npf_conn.c
>> Log Message:
>> Introduce an npf_conn_destroy_idx() that can handle partially constructed
>> conn structures.
>> To generate a diff of this commit:
>> cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c
>> Please note that diffs are not public domain; they are subject to the
>> copyright notices on the relevant files.
> 
> Rmind said he had a fix and was testing it. Please revert this.



Re: CVS commit: src/sys/net/npf

2019-08-06 Thread Maxime Villard

Le 06/08/2019 à 12:25, Christos Zoulas a écrit :

Module Name:src
Committed By:   christos
Date:   Tue Aug  6 10:25:13 UTC 2019

Modified Files:
src/sys/net/npf: npf_conn.c

Log Message:
Introduce an npf_conn_destroy_idx() that can handle partially constructed
conn structures.


To generate a diff of this commit:
cvs rdiff -u -r1.27 -r1.28 src/sys/net/npf/npf_conn.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Rmind said he had a fix and was testing it. Please revert this.


Re: CVS commit: src/sys/net

2019-04-15 Thread Christos Zoulas
In article ,
Ryota Ozaki   wrote:
>On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas  wrote:
>>
>> Module Name:src
>> Committed By:   christos
>> Date:   Mon Apr 15 20:51:46 UTC 2019
>>
>> Modified Files:
>> src/sys/net: if.c
>>
>> Log Message:
>> Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
>> disclosure. From Andy Nguyen, many thanks!
>
>Should we apply the same fix to all compat variants of ifconf?

Yup, good catch! Also pullups :-)

christos



Re: CVS commit: src/sys/net

2019-04-15 Thread Ryota Ozaki
On Tue, Apr 16, 2019 at 5:51 AM Christos Zoulas  wrote:
>
> Module Name:src
> Committed By:   christos
> Date:   Mon Apr 15 20:51:46 UTC 2019
>
> Modified Files:
> src/sys/net: if.c
>
> Log Message:
> Zero out the ifreq struct for SIOCGIFCONF to avoid up to 127 bytes of stack
> disclosure. From Andy Nguyen, many thanks!

Should we apply the same fix to all compat variants of ifconf?

  ozaki-r


Re: CVS commit: src/sys/net

2019-03-26 Thread Robert Elz
Date:Tue, 26 Mar 2019 00:23:32 +
From:"Paul Goyette" 
Message-ID:  <20190326002332.c0241f...@cvs.netbsd.org>

  | XXX Someone(tm) needs to update MAKEDEV to create the /dev/srtN device
  | nodes (with device-major 179)!

Yes, I know ...

That's easy - but as I mentioned in off list mail, my private
pre-requisite for doing it is making srt(4) exist first.

kre

ps: thanks for all the modularisation fixes.   I wonder how much
more exists that appears as if it could be made a module, but
actually cannot?



re: CVS commit: src/sys/net

2019-02-18 Thread matthew green
"Christos Zoulas" writes:
> Module Name:  src
> Committed By: christos
> Date: Mon Feb 18 23:13:14 UTC 2019
> 
> Modified Files:
>   src/sys/net: zlib.c
> 
> Log Message:
> add fallthrough's

i specific avoided changing upstream code where possible by adding
the relevant cflags.

where was this failing?

same with most of the drm changes (commited) and (posted) patch, too.

can you either revert these as unnecessary or go remove the flags
for them?  they're duplicated in multiple places (module/kernel).

thanks.


.mrg.


Re: CVS commit: src/sys/net

2018-12-14 Thread Rin Okuyama

Thank you for finding it out. I removed it in accident.

rin

On 2018/12/14 21:27, Martin Husemann wrote:

Module Name:src
Committed By:   martin
Date:   Fri Dec 14 12:27:22 UTC 2018

Modified Files:
src/sys/net: if_bridge.c

Log Message:
Need  for ip6_statinc() prototype.


To generate a diff of this commit:
cvs rdiff -u -r1.161 -r1.162 src/sys/net/if_bridge.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Re: CVS commit: src/sys/net

2018-12-13 Thread Jason Thorpe
Panic seems not optimal.  A rate limited log message at LOG_ERROR seems better.

-- thorpej
Sent from my iPhone.

> On Dec 13, 2018, at 12:54 PM, Rin Okuyama  wrote:
> 
> Module Name:src
> Committed By:rin
> Date:Thu Dec 13 20:54:50 UTC 2018
> 
> Modified Files:
>src/sys/net: ether_sw_offload.c
> 
> Log Message:
> Panic rather than silently dropping packets when TX offload options are
> enabled for unsupported frame types.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.3 -r1.4 src/sys/net/ether_sw_offload.c
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 



Re: CVS commit: src/sys/net

2018-07-10 Thread Martin Husemann
On Tue, Jul 10, 2018 at 05:02:27PM +, Christos Zoulas wrote:
> Should be pulled up to -8, with the other patch.

Already done ;-)


Re: CVS commit: src/sys/net

2018-07-10 Thread Christos Zoulas
In article <20180710110040.e80fcf...@cvs.netbsd.org>,
Robert Elz  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  kre
>Date:  Tue Jul 10 11:00:40 UTC 2018
>
>Modified Files:
>   src/sys/net: if_llatbl.c
>
>Log Message:
>Avoid attempting to call arp related functions if there is no
>arp in the kernel.

Should be pulled up to -8, with the other patch.

christos



Re: CVS commit: src/sys/net/npf

2018-04-08 Thread Maxime Villard

Le 07/04/2018 à 23:28, Christos Zoulas a écrit :

In article <20180407090627.20058f...@cvs.netbsd.org>,
Maxime Villard  wrote:

-=-=-=-=-=-

Module Name:src
Committed By:   maxv
Date:   Sat Apr  7 09:06:27 UTC 2018

Modified Files:
src/sys/net/npf: npf_inet.c

Log Message:
Rewrite npf_fetch_tcpopts:

* Instead of doing several nbuf_advance/nbuf_ensure_contig and
   playing with gotos, fetch the TCP options only once, and iterate over
   the (safe) area. The code is similar to tcp_dooptions.

* When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is
   the one we're expecting. If it isn't, then skip the option. This
   wasn't done before, and not doing it allowed a packet to bypass the
   max-mss clamping procedure. Discussed on tech-net@.



This seems to break

cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning
ENETUNREACH.

christos


My bad (again).

Seems like the TCP code is getting me confused all the time.


Re: CVS commit: src/sys/net/npf

2018-04-07 Thread Christos Zoulas
In article <20180407090627.20058f...@cvs.netbsd.org>,
Maxime Villard  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  maxv
>Date:  Sat Apr  7 09:06:27 UTC 2018
>
>Modified Files:
>   src/sys/net/npf: npf_inet.c
>
>Log Message:
>Rewrite npf_fetch_tcpopts:
>
> * Instead of doing several nbuf_advance/nbuf_ensure_contig and
>   playing with gotos, fetch the TCP options only once, and iterate over
>   the (safe) area. The code is similar to tcp_dooptions.
>
> * When handling TCPOPT_MAXSEG and TCPOPT_WINDOW, ensure the length is
>   the one we're expecting. If it isn't, then skip the option. This
>   wasn't done before, and not doing it allowed a packet to bypass the
>   max-mss clamping procedure. Discussed on tech-net@.
>

This seems to break

cvs -d cvs.netbsd.org:/cvsroot diff, with write via ssh returning
ENETUNREACH.

christos



Re: CVS commit: src/sys/net/npf

2018-01-30 Thread Maxime Villard

Le 31/01/2018 à 00:18, Mindaugas Rasiukevicius a écrit :

[...]

Fix this by using uint32_t. While here, it seems to me there is also a
memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
a value that goes beyond the mbuf.

[...]
If the npc_hlen value is beyond the packet length, NPF's nbuf interface
will catch that, since it performs the bounds check.


I meant to say that the IPv6 loop in npf_cache_ip seems suspicious to me.

while (nbuf_advance(nbuf, hlen, 0) != NULL) {
[...]
hlen = (ip6e->ip6e_len + 1) << 3;
[...]
npc->npc_hlen += hlen;
}
[continue execution...]

Here, if you have a 'hlen' that goes beyond the mbuf, nbuf_advance will fail,
and we're not handling this case. npc_hlen got incremented along the way, and
it now points past the end of the mbuf.

Perhaps that's handled properly later, but in all cases, we ought to handle
the error right here instead of processing the packet any further.

Note however that NPF is rather at the end of my TODO list, and I'll come back
to it later.

Maxime


Re: CVS commit: src/sys/net/npf

2018-01-30 Thread Mindaugas Rasiukevicius
"Maxime Villard"  wrote:
> Module Name:  src
> Committed By: maxv
> Date: Fri Dec 15 21:00:26 UTC 2017
> 
> Modified Files:
>   src/sys/net/npf: npf.h
> 
> Log Message:
> Fix a vulnerability in NPF, that allows whatever incoming IPv6 packet to
> bypass a certain number of filtering rules.
> 
> Basically there is an integer overflow in npf_cache_ip: npc_hlen is a
> 8bit unsigned int, and can wrap to zero if the IPv6 packet being processed
> has large extensions.

Thanks for discovering and fixing this.  I think this is the first
serious remote vulnerability in NPF, although limited to IPv6 only.

> Fix this by using uint32_t. While here, it seems to me there is also a
> memory overflow: still in npf_cache_ip, npc_hlen may be incremented with
> a value that goes beyond the mbuf.

A minor aspect, but promoting npf_hlen to uint32_t results in wasteful
padding in the struct, so it is better to re-order the struct members
in this case.

If the npc_hlen value is beyond the packet length, NPF's nbuf interface
will catch that, since it performs the bounds check.  However, I think
we should implement some sanity check for the npc_hlen value.  RFC 7112
suggests that the IPv6 header chain should not exceed the MTU size (and
thus fit in the first fragment, in case of fragmentation).  Some value
along these lines could be the basis for a sanity check..

-- 
Mindaugas


Re: CVS commit: src/sys/net

2018-01-19 Thread Maxime Villard

Le 19/01/2018 à 13:31, Takeshi Nakayama a écrit :

Module Name:src
Committed By:   nakayama
Date:   Fri Jan 19 12:31:28 UTC 2018

Modified Files:
src/sys/net: if_ethersubr.c

Log Message:
Fix inverted logic.


To generate a diff of this commit:
cvs rdiff -u -r1.256 -r1.257 src/sys/net/if_ethersubr.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


Ah yes, my bad, sorry about that. When I saw this ugly #ifdef DIAGNOSTIC and
panic I just machinally replaced it by KASSERTMSG.

Maxime


Re: CVS commit: src/sys/net

2018-01-08 Thread maya
On Tue, Jan 09, 2018 at 01:07:50AM +, m...@netbsd.org wrote:
> This is causing PR kern/52914 (on netbsd-8, not current)

Copied wrong number
kern/52895

it's in -current but also -8


Re: CVS commit: src/sys/net

2017-12-20 Thread Ryota Ozaki
2017/12/18 22:09 "Christos Zoulas" <chris...@zoulas.com>:

On Dec 18,  2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Oh cool. I didn't know that technique.

Great, can you undo the change then?


I already did it :)

  ozaki-r


Thanks,

christos


Re: CVS commit: src/sys/net

2017-12-18 Thread Christos Zoulas
On Dec 18,  2:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Oh cool. I didn't know that technique.

Great, can you undo the change then?

Thanks,

christos


Re: CVS commit: src/sys/net

2017-12-17 Thread Ryota Ozaki
On Thu, Dec 14, 2017 at 10:03 PM, Christos Zoulas <chris...@zoulas.com> wrote:
> On Dec 14,  5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
> -- Subject: Re: CVS commit: src/sys/net
>
> | Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
> | Anyway no objection to revert the change that is not quite important
> | (but makes my life easy).
>
> just do:
>
> cat << _EOF >> /etc/mk.conf
> .if make(netbsd)
> COPTS.route.c += -O0
> .endif
> _EOF
>
> That (or something similar) should work.

Oh cool. I didn't know that technique.

Thanks,
  ozaki-r


Re: CVS commit: src/sys/net

2017-12-14 Thread Christos Zoulas
On Dec 14,  5:42pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
| Anyway no objection to revert the change that is not quite important
| (but makes my life easy).

just do:

cat << _EOF >> /etc/mk.conf
.if make(netbsd)
COPTS.route.c += -O0
.endif
_EOF

That (or something similar) should work.

christos


Re: CVS commit: src/sys/net

2017-12-14 Thread Ryota Ozaki
On Thu, Dec 14, 2017 at 3:27 PM, Christos Zoulas  wrote:
> In article <20171214054314.e76edf...@cvs.netbsd.org>,
> Ryota Ozaki  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:   src
>>Committed By:  ozaki-r
>>Date:  Thu Dec 14 05:43:14 UTC 2017
>>
>>Modified Files:
>>   src/sys/net: rtsock.c
>>
>>Log Message:
>>Spinkle __noinline to some non-performance-sensitive functions for debugging
>
> Well, "for debugging" we can always compile with different flags rather
> than penalize everyone/everytime however that small might be.

Is there a flag to suppress auto inlining? COPTS=-fno-inline build.sh?
Anyway no objection to revert the change that is not quite important
(but makes my life easy).

  ozaki-r


Re: CVS commit: src/sys/net

2017-12-13 Thread Christos Zoulas
In article <20171214054314.e76edf...@cvs.netbsd.org>,
Ryota Ozaki  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  ozaki-r
>Date:  Thu Dec 14 05:43:14 UTC 2017
>
>Modified Files:
>   src/sys/net: rtsock.c
>
>Log Message:
>Spinkle __noinline to some non-performance-sensitive functions for debugging

Well, "for debugging" we can always compile with different flags rather
than penalize everyone/everytime however that small might be.

christos



Re: CVS commit: src/sys/net

2017-11-21 Thread Masanobu SAITOH

On 2017/11/22 14:40, Robert Elz wrote:

 Date:Wed, 22 Nov 2017 04:56:52 +
 From:"SAITOH Masanobu" 
 Message-ID:  <20171122045652.2ee75f...@cvs.netbsd.org>

   |  Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535.
   |  The spec states 0 and 65535 are reserved.

Tags are 12 bigs, so 0..4095 (65535 won't fit).   The code that was
added in this commit looks correct (0xfff - it would be better as a
named const defined somewhere) but the comment that accompanies it
(and this commit log entry) are not.


Yes, you're correct. I fixed the comment.

 Thanks.


kre




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: CVS commit: src/sys/net

2017-11-21 Thread Robert Elz
Date:Wed, 22 Nov 2017 04:56:52 +
From:"SAITOH Masanobu" 
Message-ID:  <20171122045652.2ee75f...@cvs.netbsd.org>

  |  Return EINVAL in vlan_config() when a VLAN ID is 0 or 65535.
  |  The spec states 0 and 65535 are reserved.

Tags are 12 bigs, so 0..4095 (65535 won't fit).   The code that was
added in this commit looks correct (0xfff - it would be better as a
named const defined somewhere) but the comment that accompanies it
(and this commit log entry) are not.

kre



Re: CVS commit: src/sys/net

2017-09-24 Thread Ryota Ozaki
On Fri, Sep 22, 2017 at 5:53 PM, Joerg Sonnenberger  wrote:
> On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote:
>> Module Name:  src
>> Committed By: ozaki-r
>> Date: Fri Sep 22 05:05:32 UTC 2017
>>
>> Modified Files:
>>   src/sys/net: route.c
>>
>> Log Message:
>> Remove the global lock for rtcache
>>
>> Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only 
>> by
>> their users. And in existing usages a rtcache is guranteed to be not accessed
>> simultaneously. So the rtcache framework doesn't need any exclusion controls
>> in itself.
>
> I'm a bit suspicous about this since the generation counter is not
> atomic. So what guarantees that route manipulations of different
> subsystems don't race?

If !NET_MPSAFE, it's softnet_lock, however, if NET_MPSAFE, nothing guarantees
that as you suspect. I've fixed the issue. Thanks!

  ozaki-r


Re: CVS commit: src/sys/net

2017-09-22 Thread Joerg Sonnenberger
On Fri, Sep 22, 2017 at 05:05:32AM +, Ryota Ozaki wrote:
> Module Name:  src
> Committed By: ozaki-r
> Date: Fri Sep 22 05:05:32 UTC 2017
> 
> Modified Files:
>   src/sys/net: route.c
> 
> Log Message:
> Remove the global lock for rtcache
> 
> Thanks to removal of LIST_ENTRY of struct route, rtcaches are accessed only by
> their users. And in existing usages a rtcache is guranteed to be not accessed
> simultaneously. So the rtcache framework doesn't need any exclusion controls
> in itself.

I'm a bit suspicous about this since the generation counter is not
atomic. So what guarantees that route manipulations of different
subsystems don't race?

Joerg


Re: CVS commit: src/sys/net

2017-04-05 Thread Ryota Ozaki
On Thu, Apr 6, 2017 at 3:34 AM, Taylor R Campbell
 wrote:
>> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08)
>> From: Paul Goyette 
>>
>> @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
>>   {
>>  int rc;
>>  struct ifreq ifr;
>> +   bool need_unlock = false;
>> +
>> +   /* XXX if_ioctl_lock may or may not be held here */
>> +   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
>> +   mutex_enter(ifp->if_ioctl_lock);
>> +   need_unlock = true;
>> +   }
>>
>> It is my understanding that using mutex_owned() to effect locking
>> decisions is forbidden.
>>
>> I understand the intent of doing it this way, but perhaps we should
>> revisit the callers and enforce that they always take the lock?
>
> Quite right.  Please revert this change and post a note to tech-net@
> about how to resolve the issue properly if you need help, including a
> statement of what code paths do or don't hold the ioctl lock and why
> it's hard to address them.

It tried to put the ioctl lock at upper callers and gave up doing so
because that's too complex (ioctl is easily called recursively) and
the ioctl lock for an interface doesn't fit to upper callers that
don't related to ioctl at all. So I don't think it's worthwhile to
investigate where we should lock correctly at upper callers. If we
correctly fix the issue, we should make ioctl callees MP-safe somehow.

Anyway I know the change isn't the best approach so I'll revert
the commit.

>
> From what I recall (haven't looked recently, though) the ifioctl
> locking may be broken, and may be better written with localcount(9).

That part now uses psref and if_ioctl_lock. I'm not sure localcount
suits more for that.

  ozaki-r


Re: CVS commit: src/sys/net

2017-04-05 Thread Taylor R Campbell
> Date: Wed, 5 Apr 2017 15:46:38 +0800 (+08)
> From: Paul Goyette 
> 
> @@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
>   {
>  int rc;
>  struct ifreq ifr;
> +   bool need_unlock = false;
> +
> +   /* XXX if_ioctl_lock may or may not be held here */
> +   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
> +   mutex_enter(ifp->if_ioctl_lock);
> +   need_unlock = true;
> +   }
> 
> It is my understanding that using mutex_owned() to effect locking 
> decisions is forbidden.
> 
> I understand the intent of doing it this way, but perhaps we should 
> revisit the callers and enforce that they always take the lock?

Quite right.  Please revert this change and post a note to tech-net@
about how to resolve the issue properly if you need help, including a
statement of what code paths do or don't hold the ioctl lock and why
it's hard to address them.

>From what I recall (haven't looked recently, though) the ifioctl
locking may be broken, and may be better written with localcount(9).


Re: CVS commit: src/sys/net

2017-04-05 Thread Paul Goyette

@@ -3385,6 +3428,13 @@ if_mcast_op(ifnet_t *ifp, const unsigned
 {
int rc;
struct ifreq ifr;
+   bool need_unlock = false;
+
+   /* XXX if_ioctl_lock may or may not be held here */
+   if (ifp->if_ioctl_lock != NULL && !mutex_owned(ifp->if_ioctl_lock)) {
+   mutex_enter(ifp->if_ioctl_lock);
+   need_unlock = true;
+   }

It is my understanding that using mutex_owned() to effect locking 
decisions is forbidden.


I understand the intent of doing it this way, but perhaps we should 
revisit the callers and enforce that they always take the lock?




On Wed, 5 Apr 2017, Ryota Ozaki wrote:


Module Name:src
Committed By:   ozaki-r
Date:   Wed Apr  5 03:47:51 UTC 2017

Modified Files:
src/sys/net: if.c if.h if_ethersubr.c link_proto.c

Log Message:
Make sure to hold if_ioctl_lock when calling ifp->if_ioctl

Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.


To generate a diff of this commit:
cvs rdiff -u -r1.389 -r1.390 src/sys/net/if.c
cvs rdiff -u -r1.236 -r1.237 src/sys/net/if.h
cvs rdiff -u -r1.240 -r1.241 src/sys/net/if_ethersubr.c
cvs rdiff -u -r1.34 -r1.35 src/sys/net/link_proto.c

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.


!DSPAM:58e49ead273461966420671!




+--+--++
| Paul Goyette | PGP Key fingerprint: | E-mail addresses:  |
| (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--+--++


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
oops, I was wrong here, please disregard :-)


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote:
> @@ -803,32 +790,14 @@ tunread(dev_t dev, struct uio *uio, int 
>   goto out;
>   }
>   tp->tun_flags |= TUN_RWAIT;
> - if (mtsleep((void *)tp, PZERO|PCATCH|PNORELOCK,
> - "tunread", 0, >tun_lock) != 0) {
> + if (cv_wait_sig(>tun_cv, >tun_lock)) {
>   error = EINTR;
> - goto out_nolock;
> - } else {
> - /*
> -  * Maybe the interface was destroyed while
> -  * we were sleeping, so let's ensure that
> -  * we're looking at the same (valid) tun
> -  * interface before looping.
> -  */
> - tp = tun_find_unit(dev);
> - if (tp == NULL) {
> - error = ENXIO;
> - goto out_nolock;
> - }
> - if (tp->tun_if.if_index != index) {
> - error = ENXIO;
> - goto out;
> - }
> + goto out;
>   }
>   }

If you goto out if tp is NULL, it will dereference it trying to
mutex_exit(>tun_lock);


Re: CVS commit: src/sys/net

2017-01-26 Thread coypu
Hi,

On Thu, Jan 26, 2017 at 09:13:19PM +, Nick Hudson wrote:
> @@ -534,14 +527,12 @@ tun_output(struct ifnet *ifp, struct mbu
>  const struct rtentry *rt)
>  {
>   struct tun_softc *tp = ifp->if_softc;
> - int s;
>   int error;
>  #if defined(INET) || defined(INET6)
>   int mlen;
>   uint32_t*af;
>  #endif
>  
> - s = splnet();
>   mutex_enter(>tun_lock);
>   TUNDEBUG ("%s: tun_output\n", ifp->if_xname);
>  
> @@ -551,6 +542,8 @@ tun_output(struct ifnet *ifp, struct mbu
>   error = EHOSTDOWN;
>   goto out;
>   }
> + // XXXrmind
> + mutex_exit(>tun_lock);
>  
>   /*
>* if the queueing discipline needs packet classification,
> @@ -576,7 +569,7 @@ tun_output(struct ifnet *ifp, struct mbu
>   error = ENOBUFS;
>   goto out;
>   }
> - bcopy(dst, mtod(m0, char *), dst->sa_len);
> + memcpy(mtod(m0, char *), dst, dst->sa_len);
>   }
>  
>   if (tp->tun_flags & TUN_IFHEAD) {
> @@ -617,9 +610,10 @@ tun_output(struct ifnet *ifp, struct mbu
>   goto out;
>   }
>  
> + mutex_enter(>tun_lock);

you call goto out; with >tun-lock not held, it then
mutex_exits it.


Re: CVS commit: src/sys/net

2016-06-27 Thread Kengo NAKAHARA
Hi,

On 2016/06/27 18:55, Roy Marples wrote:
> On 27/06/2016 09:58, Kengo NAKAHARA wrote:
>> Module Name: src
>> Committed By:knakahara
>> Date:Mon Jun 27 08:58:50 UTC 2016
>>
>> Modified Files:
>>  src/sys/net: if.c if.h
>>
>> Log Message:
>> reduce link state changing softint if it is not required
>>
>> ok by ozaki-r@n.o
> 
> There is a spelling mistake, an n is missing.
> 
> if_is_link_state_chageable(ifp)
> 
> should be
> 
> if_is_link_state_changeable(ifp)

Thank you for your point out. I fix it if.c:r1.347 and if.h:r1.217.


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2016-06-27 Thread Roy Marples
On 27/06/2016 09:58, Kengo NAKAHARA wrote:
> Module Name:  src
> Committed By: knakahara
> Date: Mon Jun 27 08:58:50 UTC 2016
> 
> Modified Files:
>   src/sys/net: if.c if.h
> 
> Log Message:
> reduce link state changing softint if it is not required
> 
> ok by ozaki-r@n.o

There is a spelling mistake, an n is missing.

if_is_link_state_chageable(ifp)

should be

if_is_link_state_changeable(ifp)

Roy


Re: CVS commit: src/sys/net

2016-04-11 Thread Taylor R Campbell
   Date: Mon, 11 Apr 2016 14:33:41 +0900
   From: Ryota Ozaki 

   On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell
    wrote:
   > Nice, thanks for doing this!  Have you tried subjecting it to load,
   > with options DIAGNOSTIC?

   Yes. It passes ATF tests (that enable DIAGNOSTIC).

OK, great!  However, I wonder how much load they subject it to, if
they failed to catch the missing membar_producer in the list code
bfeore.

   >-   LIST_REMOVE(bif, bif_next);
   >+   PSLIST_WRITER_REMOVE(bif, bif_next);
   >+   PSLIST_ENTRY_DESTROY(bif, bif_next);
   >
   >BRIDGE_PSZ_PERFORM(sc);
   >
   > This order is incorrect.  You cannot destroy the entry until you have
   > confirmed all readers are done with it.  You must pserialize_perform
   > before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
   > page about this:
   >
   > [...]

   I see. (I should be claimed RTFM :-/)

I clarified some wording early on in the DESCRIPTION section of the
man page.  Please let me know if there's any verbiage in there that's
confusing or hard to follow!

   >@@ -972,7 +984,8 @@ retry:
   >}
   >
   >i = 0;
   >-   LIST_FOREACH(bif, >sc_iflist, bif_next) {
   >+   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   >+   bif_next) {
   >
   > PSLIST_WRITER_FOREACH

   Hm, so we need to choose a macro not what we do in the loop but how the loop
   is protected. Indeed in terms of use of membar.

Correct.  This is why PSLIST_READER_FOREACH is in the section `READER
OPERATIONS' of the man page, which is opened by `The following
operations may be performed on list heads and entries when the caller
is in a passively serialized read section.'.


Re: CVS commit: src/sys/net

2016-04-10 Thread Ryota Ozaki
On Mon, Apr 11, 2016 at 12:31 PM, Taylor R Campbell
 wrote:
>Date: Mon, 11 Apr 2016 02:04:14 +
>From: Ryota Ozaki 
>
>Module Name:src
>Committed By:   ozaki-r
>Date:   Mon Apr 11 02:04:14 UTC 2016
>
>Modified Files:
>src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h
>
>Log Message:
>Use pslist(9) in bridge(4)
>
>This adds missing memory barriers to list operations for pserialize.
>
> Nice, thanks for doing this!  Have you tried subjecting it to load,
> with options DIAGNOSTIC?

Yes. It passes ATF tests (that enable DIAGNOSTIC).

>
>Index: src/sys/net/bridgestp.c
>diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20
>--- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016
>+++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016
>@@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg
> {
>struct bridge_iflist *bif;
>
>-   LIST_FOREACH(bif, >sc_iflist, bif_next) {
>+   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
>+   bif_next) {
>
> I'm pretty sure everything in bridgestp.c runs under a writer lock, so
> you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH.
>
> (PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the
> statement of intent that this is in an exclusive write section, not a
> shared read section.)
>
>@@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc
>
>BRIDGE_LOCK(sc);
>
>-   LIST_FOREACH(bif, >sc_iflist, bif_next) {
>+   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
>+   bif_next) {
>
> For example, this one is definitely done under a writer lock!

Oh yes. I think I already forgot bridgestp.c at all!

>
>Index: src/sys/net/if_bridge.c
>diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112
>--- src/sys/net/if_bridge.c:1.111   Mon Mar 28 04:38:04 2016
>+++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016
>@@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp)
>bridge_stop(ifp, 1);
>
>BRIDGE_LOCK(sc);
>-   while ((bif = LIST_FIRST(>sc_iflist)) != NULL)
>+   for (;;) {
>+   bif = PSLIST_WRITER_FIRST(>sc_iflist, struct 
> bridge_iflist,
>+   bif_next);
>+   if (bif == NULL)
>+   break;
>bridge_delete_member(sc, bif);
>+   }
>+   PSLIST_DESTROY(>sc_iflist);
>
> Any particular reason you switched from `while ((bif = ...) != NULL)'
> to `for (;;) { bif = ...; if (bif == NULL) break;'?   I find the while
> construct clearer, myself.

Just because it's too long (for me) to put it in the condition.

>
>@@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc
>ifs->if_bridge = NULL;
>ifs->if_bridgeif = NULL;
>
>-   LIST_REMOVE(bif, bif_next);
>+   PSLIST_WRITER_REMOVE(bif, bif_next);
>+   PSLIST_ENTRY_DESTROY(bif, bif_next);
>
>BRIDGE_PSZ_PERFORM(sc);
>
> This order is incorrect.  You cannot destroy the entry until you have
> confirmed all readers are done with it.  You must pserialize_perform
> before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
> page about this:
>
> `Either _element_ must never have been inserted into a list, or it
> must have been inserted and removed, and the caller must have waited
> for all parallel readers to finish reading it first.'
>
> The reason is that PSLIST_WRITER_REMOVE only changes the next pointer
> of the *previous* element so that no new readers can discover the
> current element, but leaves the next pointer of the *current* element
> intact so that readers that are still active can get to the next
> element.  Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the
> current element and kasserts that you removed the entry.

I see. (I should be claimed RTFM :-/)

>
> Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null
> poison pointer like QUEUEDEBUG does, so that anyone trying to use it
> afterward will actually crash instead of just thinking they hit the
> end of the list.
>
>@@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s
> retry:
>BRIDGE_LOCK(sc);
>count = 0;
>-   LIST_FOREACH(bif, >sc_iflist, bif_next)
>+   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
>+   bif_next)
>
> Use PSLIST_WRITER_FOREACH here too.
>
>@@ -959,7 +970,8 @@ retry:
>BRIDGE_LOCK(sc);
>
>i = 0;
>-   LIST_FOREACH(bif, >sc_iflist, bif_next)
>+   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
>+   bif_next)
>
> PSLIST_WRITER_FOREACH
>
>i++;
>if (i > count) {
>/*

Re: CVS commit: src/sys/net

2016-04-10 Thread Taylor R Campbell
   Date: Mon, 11 Apr 2016 02:04:14 +
   From: Ryota Ozaki 

   Module Name:src
   Committed By:   ozaki-r
   Date:   Mon Apr 11 02:04:14 UTC 2016

   Modified Files:
   src/sys/net: bridgestp.c if_bridge.c if_bridgevar.h

   Log Message:
   Use pslist(9) in bridge(4)

   This adds missing memory barriers to list operations for pserialize.

Nice, thanks for doing this!  Have you tried subjecting it to load,
with options DIAGNOSTIC?

   Index: src/sys/net/bridgestp.c
   diff -u src/sys/net/bridgestp.c:1.19 src/sys/net/bridgestp.c:1.20
   --- src/sys/net/bridgestp.c:1.19Mon Feb 15 01:11:41 2016
   +++ src/sys/net/bridgestp.c Mon Apr 11 02:04:14 2016
   @@ -341,7 +341,8 @@ bstp_config_bpdu_generation(struct bridg
{
   struct bridge_iflist *bif;

   -   LIST_FOREACH(bif, >sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   +   bif_next) {

I'm pretty sure everything in bridgestp.c runs under a writer lock, so
you should use PSLIST_WRITER_FOREACH here, not PSLIST_READER_FOREACH.

(PSLIST_WRITER_FOREACH is faster on alpha -- but more important is the
statement of intent that this is in an exclusive write section, not a
shared read section.)

   @@ -828,7 +833,8 @@ bstp_initialization(struct bridge_softc 

   BRIDGE_LOCK(sc);

   -   LIST_FOREACH(bif, >sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   +   bif_next) {

For example, this one is definitely done under a writer lock!

   Index: src/sys/net/if_bridge.c
   diff -u src/sys/net/if_bridge.c:1.111 src/sys/net/if_bridge.c:1.112
   --- src/sys/net/if_bridge.c:1.111   Mon Mar 28 04:38:04 2016
   +++ src/sys/net/if_bridge.c Mon Apr 11 02:04:14 2016
   @@ -447,8 +447,14 @@ bridge_clone_destroy(struct ifnet *ifp)
   bridge_stop(ifp, 1);

   BRIDGE_LOCK(sc);
   -   while ((bif = LIST_FIRST(>sc_iflist)) != NULL)
   +   for (;;) {
   +   bif = PSLIST_WRITER_FIRST(>sc_iflist, struct 
bridge_iflist,
   +   bif_next);
   +   if (bif == NULL)
   +   break;
   bridge_delete_member(sc, bif);
   +   }
   +   PSLIST_DESTROY(>sc_iflist);

Any particular reason you switched from `while ((bif = ...) != NULL)'
to `for (;;) { bif = ...; if (bif == NULL) break;'?   I find the while
construct clearer, myself.

   @@ -705,7 +712,8 @@ bridge_delete_member(struct bridge_softc
   ifs->if_bridge = NULL;
   ifs->if_bridgeif = NULL;

   -   LIST_REMOVE(bif, bif_next);
   +   PSLIST_WRITER_REMOVE(bif, bif_next);
   +   PSLIST_ENTRY_DESTROY(bif, bif_next);

   BRIDGE_PSZ_PERFORM(sc);

This order is incorrect.  You cannot destroy the entry until you have
confirmed all readers are done with it.  You must pserialize_perform
before doing PSLIST_ENTRY_DESTROY.  See the note in the pslist(9) man
page about this:

`Either _element_ must never have been inserted into a list, or it
must have been inserted and removed, and the caller must have waited
for all parallel readers to finish reading it first.'

The reason is that PSLIST_WRITER_REMOVE only changes the next pointer
of the *previous* element so that no new readers can discover the
current element, but leaves the next pointer of the *current* element
intact so that readers that are still active can get to the next
element.  Then PSLIST_ENTRY_DESTROY nullifies the next pointer of the
current element and kasserts that you removed the entry.

Maybe PSLIST_ENTRY_DESTROY should really set it to some non-null
poison pointer like QUEUEDEBUG does, so that anyone trying to use it
afterward will actually crash instead of just thinking they hit the
end of the list.

   @@ -939,7 +949,8 @@ bridge_ioctl_gifs(struct bridge_softc *s
retry:
   BRIDGE_LOCK(sc);
   count = 0;
   -   LIST_FOREACH(bif, >sc_iflist, bif_next)
   +   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   +   bif_next)

Use PSLIST_WRITER_FOREACH here too.

   @@ -959,7 +970,8 @@ retry:
   BRIDGE_LOCK(sc);

   i = 0;
   -   LIST_FOREACH(bif, >sc_iflist, bif_next)
   +   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   +   bif_next)

PSLIST_WRITER_FOREACH

   i++;
   if (i > count) {
   /*
   @@ -972,7 +984,8 @@ retry:
   }

   i = 0;
   -   LIST_FOREACH(bif, >sc_iflist, bif_next) {
   +   PSLIST_READER_FOREACH(bif, >sc_iflist, struct bridge_iflist,
   +   bif_next) {

PSLIST_WRITER_FOREACH


re: CVS commit: src/sys/net

2016-04-06 Thread matthew green
> And add an assertion of if_addrlen and ll_addr.

thanks.  i almost did this myself :-)


Re: CVS commit: src/sys/net

2016-02-21 Thread Ryota Ozaki
On Fri, Feb 19, 2016 at 12:16 PM, Roy Marples  wrote:
> On 18/02/2016 11:29, Ryota Ozaki wrote:
>> Okay, I've changed my mind. Let's commit your patch anyway and
>> keep an eye on the lossage messages to know how often and how
>> many lossage happens. If there are issues, we can improve then.
>> It's not too late.
>
> The irony is that overnight I changed my mind also!
> The challenge was to achieve a longer queue in the same size or less.
> According to size(1), the end result of this patch is exactly the same
> size as my previous one :)
>
> This has been achieved with bitmask array macros to densely pack a 2 bit
> integer into a larger integer as a backing store.
> I decided to use a uint16_t which allows for 8 events in the queue.
> Simply changing the integral type will magically expand the queue, so it
> could be changed to a uint64_t to allow 32 events in the queue but this
> is probably overkill.
> For really size challenged systems, this could be changed to a uint8_t
> to allow 4 events in the queue ... which should still be big enough
> normally, but you wanted 10 so the closest match was 8.
>
> I've managed to keep the assignment of the link state to the interface
> within the initial if_link_state_change() call which my prior patch
> moved out.
>
> Also, only one queued event is processed at a time, subsequent ones have
> to wait for another softint(9) to occur.
> After all, the host may have >1 interface and we don't want to hog the
> network processing a larger queue.
>
>> I'm sorry for disturbing you.
>
> No, no. Thankyou for challenging me :)
>
> Maybe the macros here could be put to a more generic use (or there are
> already generic macros for this which I missed?) but unsure how right now.
>
> All being said, I learned a lot about bit shifting doing this and I
> might have made a mistake, but sample test cases for my macros appear to
> work fine. I would appreciate an OK from someone before I commit this.

Wow, I must be convinced by the new design :) I don't understand the macros
very much though, coverity would warn us if there are issues.

Thank you for your effort!
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 15:12:31 +0900
   From: Ryota Ozaki 

   On Wed, Feb 17, 2016 at 2:13 PM, Taylor R Campbell
    wrote:
   > Note that this queueing takes effect only if the link state changes
   > multiple times within maybe a few microseconds, before the softint can
   > run.  If your link state is changing that many times so quickly,
   > losing an event or two is probably the least of your worries

   Yeah, it's nitpicking, but for that reason, I think it's better to pass
   events as they are to userland.

   > -- but
   > you're probably more interested in seeing something like ...down...up
   > than ...up/up/up.

   Yes. (up/up/up events are eliminated in the first place though.)

So what events would you choose to skip, if not the scheme that Roy
described?

I bet that, whatever events you would choose to skip, we can still
prove that the resulting queues need be no longer than, say, three
elements, and we'd still usefully report link flapping to userland --
as long as we can make enough progress to run softints and userland
processes at all.

Here are some example reductions that intuitively sound reasonable to
me:

down/down = down
up/up = up
down/unknown/up = down/up
down/up/down = down

What other sequences of events would you simplify?


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 1:40 PM, Roy Marples  wrote:
> On 2016-02-17 02:49, Ryota Ozaki wrote:
>>>
>>> If you don't read the patch, here is the comment I added to the
>>> if_link_state_change() function:
>>>
>>> * Queue a change in the interface link state.
>>> *
>>> * The queue itself is very limited:
>>> *   no state can be queued more than once
>>> *   a higher priority state will remove lower priority states
>>> *
>>> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
>>> * Rationale is that if the kernel can't process the link state change
>>> queue
>>> * fast enough then userland has no chance of catching up.
>>> * Any lossage is logged to the console.
>>> * The queue state priority is ordered so that link state aware programs
>>> * will still have the correct end result regardless of any lossage.
>>>
>>> Any comments or objections?
>>
>>
>> I worry about the priority order, where it comes from? I feel that
>> the kernel does too much decision; which event is important. I think
>> it depends on applications.
>
>
> Fair enough.
> Can you state a use case for an application needing the full ordering?

Event tracking of link states for example.

>
> Let us be more specific
> UP: DOWN - UP
> Is passed fully
> DOWN: UP - DOWN
> UP is not passed to userland.
> What do you want to achieve in userland during the UP when the kernel
> already considers the link DOWN?

Even if a UP state is transient, it's an event that may provide us a
hint of network conditions for diagnostic. We may be able to get it
from the console output, but it's not so convenient; we need to track
events via two different facilities.

>
>> And the priority provides asymmetric event deliveries; when the state
>> repeats up and down, a down event is delivered if the final state is down
>> while a down event and a up event are delivered if the final state is up.
>> It's confusable to me.
>
>
> It's not that confusing :)
> If you're an application, what benefit do you have of processing an UP state
> on the link when a DOWN state is about to follow?
> Aside from logging it (which we still do, just not via a route message just
> a console diagnostic) what would you do you with it?
> Because a DOWN state is coming, it cannot possibly do anything over the
> interface, so why bother?

Applications can ignore the UP event. IMO we should delegate such a decision
to applications because we don't know all requirements of all applications.

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Wed, 17 Feb 2016 11:49:48 +0900
   From: Ryota Ozaki 

   And the priority provides asymmetric event deliveries; when the state
   repeats up and down, a down event is delivered if the final state is down
   while a down event and a up event are delivered if the final state is up.
   It's confusable to me.

This is the part I'm still not sure about: Roy's patch reduces up/down
to just down, but leaves down/up alone.  I'm guessing that Roy expects
applications like dhcpcd to want to clear some state if the link ever
goes down, but to be uninterested in knowing that the link went up for
a microsecond and then back down again.

   Can we pass events as they are as much as possible? I don't complain that
   event reductions in the kernel, but I think it should be down based on time
   series manner (e.g., pick latest three events), not based on some priority
   things. If we accept event reductions, we can do it with bit-encoding
   (w/o a linked list (memory allocations)), for example represent the state
   as 2 bits and encode event series into a variable (say 16 events in int).

Note that this queueing takes effect only if the link state changes
multiple times within maybe a few microseconds, before the softint can
run.  If your link state is changing that many times so quickly,
losing an event or two is probably the least of your worries -- but
you're probably more interested in seeing something like ...down...up
than ...up/up/up.


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples

On 2016-02-17 02:49, Ryota Ozaki wrote:

If you don't read the patch, here is the comment I added to the
if_link_state_change() function:

* Queue a change in the interface link state.
*
* The queue itself is very limited:
*   no state can be queued more than once
*   a higher priority state will remove lower priority states
*
* The queue state priority in descending order is DOWN, UNKNOWN, UP.
* Rationale is that if the kernel can't process the link state change 
queue

* fast enough then userland has no chance of catching up.
* Any lossage is logged to the console.
* The queue state priority is ordered so that link state aware 
programs

* will still have the correct end result regardless of any lossage.

Any comments or objections?


I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.


Fair enough.
Can you state a use case for an application needing the full ordering?

Let us be more specific
UP: DOWN - UP
Is passed fully
DOWN: UP - DOWN
UP is not passed to userland.
What do you want to achieve in userland during the UP when the kernel 
already considers the link DOWN?



And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is 
down
while a down event and a up event are delivered if the final state is 
up.

It's confusable to me.


It's not that confusing :)
If you're an application, what benefit do you have of processing an UP 
state on the link when a DOWN state is about to follow?
Aside from logging it (which we still do, just not via a route message 
just a console diagnostic) what would you do you with it?
Because a DOWN state is coming, it cannot possibly do anything over the 
interface, so why bother?


Can we pass events as they are as much as possible? I don't complain 
that
event reductions in the kernel, but I think it should be down based on 
time
series manner (e.g., pick latest three events), not based on some 
priority

things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the 
state
as 2 bits and encode event series into a variable (say 16 events in 
int).


Of course we could.
But until we understand a reason why this is needed why should we?

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Ryota Ozaki
On Wed, Feb 17, 2016 at 10:46 AM, Roy Marples  wrote:
> On Tuesday 16 February 2016 22:20:37 Taylor R Campbell wrote:
>> Except for an issue with queueing discussed privately (scheduling a
>> softint that is already scheduled won't cause it to run again, so
>> if_link_state_change_si needs to process the whole queue in one go),
>> that approach looks fine to me, although as we also discussed
>> privately we can easily compact it into a three-bit mask with a
>> trivial update instead of a whole array of states.
>
> Patch attached to solve change from a priority array into a bit mask approach
> where we swallow the whole queue in a softint.
> Thanks to Taylor for some reviews and suggestions.
>
>> This has the consequence that if the link goes up/down in quick
>> succession, and then up/down in quick succession, , with the queue
>> processed after each up/down transition, userland will never be
>> notified.  However, if the link goes down/up, then down/up, , the
>> userland will be notified of all the transitions.  Roy claims that
>> that's OK, and I'm inclined to believe the author of dhcpcd about
>> this.
>
> If you don't read the patch, here is the comment I added to the
> if_link_state_change() function:
>
> * Queue a change in the interface link state.
> *
> * The queue itself is very limited:
> *   no state can be queued more than once
> *   a higher priority state will remove lower priority states
> *
> * The queue state priority in descending order is DOWN, UNKNOWN, UP.
> * Rationale is that if the kernel can't process the link state change queue
> * fast enough then userland has no chance of catching up.
> * Any lossage is logged to the console.
> * The queue state priority is ordered so that link state aware programs
> * will still have the correct end result regardless of any lossage.
>
> Any comments or objections?

I worry about the priority order, where it comes from? I feel that
the kernel does too much decision; which event is important. I think
it depends on applications.

And the priority provides asymmetric event deliveries; when the state
repeats up and down, a down event is delivered if the final state is down
while a down event and a up event are delivered if the final state is up.
It's confusable to me.

Can we pass events as they are as much as possible? I don't complain that
event reductions in the kernel, but I think it should be down based on time
series manner (e.g., pick latest three events), not based on some priority
things. If we accept event reductions, we can do it with bit-encoding
(w/o a linked list (memory allocations)), for example represent the state
as 2 bits and encode event series into a variable (say 16 events in int).

  ozaki-r


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 13:24:15 +
   From: Roy Marples 

   I found the time to work on this, here is the patch I've been running
   for an hour or so upping and downing interfaces.

The rule in Roy's patch is that each new state changes kicks out all
higher-priority transitions, and is queued after all lower-priority
transitions, with priorities DOWN = -1, UNKNOWN = 0, UP = 1.

This has the effect that each queue has the regular expression

down? unknown? up?

with the following equivalences:

down/down = down
down/unknown = down/unknown
down/up = down/up
unknown/down = down
unknown/unknown = unknown
unknown/up = unknown/up
up/down = down
up/unknown = unknown
up/up = up

Except for an issue with queueing discussed privately (scheduling a
softint that is already scheduled won't cause it to run again, so
if_link_state_change_si needs to process the whole queue in one go),
that approach looks fine to me, although as we also discussed
privately we can easily compact it into a three-bit mask with a
trivial update instead of a whole array of states.

This has the consequence that if the link goes up/down in quick
succession, and then up/down in quick succession, , with the queue
processed after each up/down transition, userland will never be
notified.  However, if the link goes down/up, then down/up, , the
userland will be notified of all the transitions.  Roy claims that
that's OK, and I'm inclined to believe the author of dhcpcd about
this.


Re: CVS commit: src/sys/net

2016-02-16 Thread Taylor R Campbell
   Date: Tue, 16 Feb 2016 16:30:54 +0900
   From: Ryota Ozaki 

   On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell
    wrote:
   > void
   > if_link_state_change(struct ifnet *ifp, int link_state)
   > {
   > int s;
   >
   > s = splnet();
   > if (ifp->if_link_state_pending == link_state)
   > goto out;
   >
   > if (ifp->if_link_state_pending != ifp->if_link_state) {

   What does the conditional intend?

I don't remember!  It doesn't make sense now that I look at it again.
This was just a sketch -- obviously it was missing some parts, such as
pool_cache_put, and some parts seem to make no sense.  I would just
take it out.

   Anyway I made a patch based on the above code:
 http://www.netbsd.org/~ozaki-r/link_state_change.diff

   What I did are to make your code compilable and workable,
   and limit the number of pending events.

Looks reasonable to me.  We should have some automatic tests for all
this, though!


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
> On Tuesday 16 February 2016 13:37:28 you wrote:
>>> Oh, I see. We shouldn't drop any events of link state changes.
>>
>> i don't see any point in keeping a huge series of link changes
>> if they can be collapsed into an equivalent sequence.  with VMs
>> and other user-controlled systems it's possible to flood such a
>> link state checking engine.
> 
> The queue can be kept quite short because we don't care about any prior 
> entries each time state changes to LINK_STATE_DOWN.
> Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
> setting LINK_STATE_UNKNOWN.
> Finally we set LINK_STATE_UP.
> 
> So the queue should only ever hold a maximum of 3 items. Could even work it 
> like so

I found the time to work on this, here is the patch I've been running
for an hour or so upping and downing interfaces.

OK to commit?

Roy
Index: sys/net/if.c
===
RCS file: /cvsroot/src/sys/net/if.c,v
retrieving revision 1.324
diff -u -r1.324 if.c
--- sys/net/if.c15 Feb 2016 08:08:04 -  1.324
+++ sys/net/if.c16 Feb 2016 13:22:43 -
@@ -603,7 +603,6 @@
ifp->if_broadcastaddr = 0; /* reliably crash if used uninitialized */
 
ifp->if_link_state = LINK_STATE_UNKNOWN;
-   ifp->if_old_link_state = LINK_STATE_UNKNOWN;
 
ifp->if_capenable = 0;
ifp->if_csum_flags_tx = 0;
@@ -1562,41 +1561,73 @@
}
 }
 
+/* Priority list for link state changes */
+static const int
+if_link_state_prio[LINK_STATE_MAX] = {
+   0,  /* LINK_STATE_UNKNOWN */
+   -1, /* LINK_STATE_DOWN */
+   1,  /* LINK_STATE_UP */
+};
+
 /*
  * Handle a change in the interface link state.
- * XXX: We should listen to the routing socket in-kernel rather
- * than calling in6_if_link_* functions directly from here.
  */
 void
 if_link_state_change(struct ifnet *ifp, int link_state)
 {
-   int s;
+   int s, i;
+
+   KASSERT(link_state >= 0 && link_state < LINK_STATE_MAX);
 
s = splnet();
-   if (ifp->if_link_state == link_state) {
-   splx(s);
-   return;
+   if (ifp->if_link_state == link_state && ifp->if_link_queue_len == 0)
+   goto out;
+
+   for (i = 0; i < ifp->if_link_queue_len; i++) {
+   if (if_link_state_prio[link_state] <=
+   if_link_state_prio[ifp->if_link_queue[i]])
+   {
+   ifp->if_link_queue[i] = link_state;
+   ifp->if_link_queue_len = i + 1;
+   /* Because we are trimming the queue,
+* there is no need to call softint_schedule again. */
+   goto out;
+   }
}
 
-   ifp->if_link_state = link_state;
+   KASSERT(ifp->if_link_queue_len + 1 < __arraycount(ifp->if_link_queue));
+   ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_si);
 
+out:
splx(s);
 }
 
-
 static void
 if_link_state_change_si(void *arg)
 {
struct ifnet *ifp = arg;
-   int s;
+   int s, i;
int link_state, old_link_state;
struct domain *dp;
 
s = splnet();
-   link_state = ifp->if_link_state;
-   old_link_state = ifp->if_old_link_state;
-   ifp->if_old_link_state = ifp->if_link_state;
+   KASSERT(ifp->if_link_queue_len >= 0); /* unlikely */
+   if (ifp->if_link_queue_len == 0)
+   goto out;
+
+   /* Pop the link state change from the queue */
+   link_state = ifp->if_link_queue[0];
+   ifp->if_link_queue_len--;
+   for (i = 0; i < ifp->if_link_queue_len; i++)
+   ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];
+
+   /* Ensure link state is actually changing */
+   if (ifp->if_link_state == link_state)
+   goto out;
+
+   old_link_state = ifp->if_link_state;
+   ifp->if_link_state = link_state;
 
 #ifdef DEBUG
log(LOG_DEBUG, "%s: link state %s (was %s)\n", ifp->if_xname,
@@ -1640,6 +1671,7 @@
dp->dom_if_link_state_change(ifp, link_state);
}
 
+out:
splx(s);
 }
 
Index: sys/net/if.h
===
RCS file: /cvsroot/src/sys/net/if.h,v
retrieving revision 1.197
diff -u -r1.197 if.h
--- sys/net/if.h16 Feb 2016 01:31:26 -  1.197
+++ sys/net/if.h16 Feb 2016 13:22:43 -
@@ -191,10 +191,12 @@
 
 /*
  * Values for if_link_state.
+ * When changing these values, consider if_link_state_prio in if.c.
  */
 #defineLINK_STATE_UNKNOWN  0   /* link invalid/unknown */
 #defineLINK_STATE_DOWN 1   /* link is down */
 #defineLINK_STATE_UP   2   /* link is up */
+#defineLINK_STATE_MAX  3   /* size of link states */
 
 /*
  * Structure defining a queue for a network interface.
@@ 

Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On 16/02/2016 08:38, Roy Marples wrote:
>   for (i = 0; i < ifp->if_link_queue_len; i++) {
>   if (link_state <= ifp->if_link_queue[i]) {
>   ifp->if_link_queue[i] = link_state;
>   ifp->if_link_queue_len = i;

This should be ifp->if_link_queue_len = i + 1;

>   /* Because we are trimming the queue,
>* there is no need to call softint_schedule again. */
>   goto out;
>   }
>   }

Roy


Re: CVS commit: src/sys/net

2016-02-16 Thread Roy Marples
On Tuesday 16 February 2016 13:37:28 you wrote:
> > Oh, I see. We shouldn't drop any events of link state changes.
> 
> i don't see any point in keeping a huge series of link changes
> if they can be collapsed into an equivalent sequence.  with VMs
> and other user-controlled systems it's possible to flood such a
> link state checking engine.

The queue can be kept quite short because we don't care about any prior 
entries each time state changes to LINK_STATE_DOWN.
Also, we don't care about any prior entries (other than LINK_STATE_DOWN) when 
setting LINK_STATE_UNKNOWN.
Finally we set LINK_STATE_UP.

So the queue should only ever hold a maximum of 3 items. Could even work it 
like so

void
if_link_state_change(struct ifnet *ifp, int link_state)
{
int s, i;

s = splnet();

if (ifp->if_link_queue_len == 0 && ifp->link_state == link_state)
goto out;

for (i = 0; i < ifp->if_link_queue_len; i++) {
if (link_state <= ifp->if_link_queue[i]) {
ifp->if_link_queue[i] = link_state;
ifp->if_link_queue_len = i;
/* Because we are trimming the queue,
 * there is no need to call softint_schedule again. */
goto out;
}
}

/* Add bounds check here for sanity */
ifp->if_link_queue[ifp->if_link_queue_len++] = link_state;
softint_schedule(ifp->if_link_state_softint);

out:
splx(s);
}


static void
if_link_state_change_si(void *arg)
{
int s, link_state, i;

s = splnet();
if (ifp->if_link_queue_len == 0)
goto out;

link_state = ifp->if_link_queue[0];
ifp->if_link_queue_len--;
for (i = 0; i < ifp->link_queue_len; i++)
ifp->if_link_queue[i] = ifp->if_link_queue[i + 1];

/* process state change as normal */

out:
splx(s);
}

Thoughts?

Roy


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Tue, Feb 16, 2016 at 6:16 AM, Taylor R Campbell
 wrote:
>Date: Mon, 15 Feb 2016 11:06:52 +
>From: Roy Marples 
>
>On 15/02/2016 10:32, Ryota Ozaki wrote:
>> I think we can fix by returning from if_link_state_change_si
>> if ifp->if_link_state == ifp->if_old_link_state. Is it correct?
>
>Then we're not doing what we should because the protocol actions may not
>fire.
>
>Can we not extend softint_schedule to take some user data? That way,
>each call (which I'm assuming would be sequential) can carry the link
>state change in the user data which can then be applied to the interface.
>
> Changing softint is not really a good idea -- it is a significant
> semantic change that most current users don't require, because they
> already have existing queue mechanisms, e.g. pktq.
>
> If it is necessary to report every link state transition, then we need
> to create a queueing mechanism -- and possibly drop some transitions
> in the case of memory shortage, as we already do in route_enqueue.
>
> Maybe something like this, with a queue and a link_state_pending field
> to avoid creating extra queue entries unnecessarily but guaranteeing
> that the most recent link state change will take effect, and be
> preceded by LINK_STATE_DOWN if anything was lost:

Oops. Our (IIJ) local changes have a similar code (softint + queuing)
and it works for long time. So I concur the design.

>
> struct link_state_change {
> SIMPLEQ_ENTRY(link_state_change)lsc_entry;
> int lsc_state;
> boollsc_lost;
> };
>
> static pool_cache_t link_state_change_pc __read_mostly;
>
> void
> if_link_state_change(struct ifnet *ifp, int link_state)
> {
> int s;
>
> s = splnet();
> if (ifp->if_link_state_pending == link_state)
> goto out;
>
> if (ifp->if_link_state_pending != ifp->if_link_state) {

What does the conditional intend?

> struct link_state_change *lsc;
>
> lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT);
> if (lsc == NULL) {
> ifp->if_link_state_lost = true;
> goto change;
> }
>
> lsc->lsc_state = link_state;
> lsc->lsc_lost = ifp->if_link_state_lost;
> SIMPLEQ_INSERT_TAIL(>if_link_state_changes, lsc,
> lsc_entry);
> }
>
> ifp->if_link_state_lost = false;
> change: ifp->if_link_state_pending = link_state;
> softint_schedule(ifp->if_link_state_softint);
> out:splx(s);
> }
>
> static void
> if_link_state_change_softintr(void *cookie)
> {
> struct ifnet *ifp = cookie;
> struct link_state_change *lsc;
> int s;
>
> s = splnet();
> while (!SIMPLEQ_EMPTY(>if_link_state_changes)) {
> lsc = SIMPLEQ_HEAD(>if_link_state_changes);
> SIMPLEQ_REMOVE_HEAD(>if_link_state_changes, lsc_entry);
> if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost);
> }
>
> if (ifp->if_link_state_pending != ifp->if_link_state)
> if_link_state_change_1(ifp, ifp->if_link_state_pending,
> ifp->if_link_state_lost);
> ifp->if_link_state_lost = false;
> splx(s);
> }
>
> static void
> if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost)
> {
>
> if (lost ||
> (ifp->if_link_state == LINK_STATE_UP &&
> link_state == LINK_STATE_UNKNOWN)) {
> /* pretend LINK_STATE_DOWN first */
> }
>
> ifp->if_link_state = link_state;
> /* call domain dom_if_link_state_change callbacks */
> /* call rt_ifmsg */
> }

Anyway I made a patch based on the above code:
  http://www.netbsd.org/~ozaki-r/link_state_change.diff

What I did are to make your code compilable and workable,
and limit the number of pending events.

Thanks,
  ozaki-r


Re: CVS commit: src/sys/net

2016-02-15 Thread Ryota Ozaki
On Tue, Feb 16, 2016 at 11:37 AM, matthew green  wrote:
>> Oh, I see. We shouldn't drop any events of link state changes.
>
> i don't see any point in keeping a huge series of link changes
> if they can be collapsed into an equivalent sequence.  with VMs
> and other user-controlled systems it's possible to flood such a
> link state checking engine.

Sorry for my bad wording. I meant we shouldn't drop events _in normal
cases_; the number of events should be several. If a huge number of
events happen in a short period (e.g., due to hardware troubles,
abnormal VMMs, etc.), we should eliminate events to protect the system.

  ozaki-r


re: CVS commit: src/sys/net

2016-02-15 Thread matthew green
> Oh, I see. We shouldn't drop any events of link state changes.

i don't see any point in keeping a huge series of link changes
if they can be collapsed into an equivalent sequence.  with VMs
and other user-controlled systems it's possible to flood such a
link state checking engine.


.mrg.


Re: CVS commit: src/sys/net

2016-02-15 Thread Taylor R Campbell
   Date: Mon, 15 Feb 2016 11:06:52 +
   From: Roy Marples 

   On 15/02/2016 10:32, Ryota Ozaki wrote:
   > I think we can fix by returning from if_link_state_change_si
   > if ifp->if_link_state == ifp->if_old_link_state. Is it correct?

   Then we're not doing what we should because the protocol actions may not
   fire.

   Can we not extend softint_schedule to take some user data? That way,
   each call (which I'm assuming would be sequential) can carry the link
   state change in the user data which can then be applied to the interface.

Changing softint is not really a good idea -- it is a significant
semantic change that most current users don't require, because they
already have existing queue mechanisms, e.g. pktq.

If it is necessary to report every link state transition, then we need
to create a queueing mechanism -- and possibly drop some transitions
in the case of memory shortage, as we already do in route_enqueue.

Maybe something like this, with a queue and a link_state_pending field
to avoid creating extra queue entries unnecessarily but guaranteeing
that the most recent link state change will take effect, and be
preceded by LINK_STATE_DOWN if anything was lost:

struct link_state_change {
SIMPLEQ_ENTRY(link_state_change)lsc_entry;
int lsc_state;
boollsc_lost;
};

static pool_cache_t link_state_change_pc __read_mostly;

void
if_link_state_change(struct ifnet *ifp, int link_state)
{
int s;

s = splnet();
if (ifp->if_link_state_pending == link_state)
goto out;

if (ifp->if_link_state_pending != ifp->if_link_state) {
struct link_state_change *lsc;

lsc = pool_cache_get(link_state_change_pc, PR_NOWAIT);
if (lsc == NULL) {
ifp->if_link_state_lost = true;
goto change;
}

lsc->lsc_state = link_state;
lsc->lsc_lost = ifp->if_link_state_lost;
SIMPLEQ_INSERT_TAIL(>if_link_state_changes, lsc,
lsc_entry);
}

ifp->if_link_state_lost = false;
change: ifp->if_link_state_pending = link_state;
softint_schedule(ifp->if_link_state_softint);
out:splx(s);
}

static void
if_link_state_change_softintr(void *cookie)
{
struct ifnet *ifp = cookie;
struct link_state_change *lsc;
int s;

s = splnet();
while (!SIMPLEQ_EMPTY(>if_link_state_changes)) {
lsc = SIMPLEQ_HEAD(>if_link_state_changes);
SIMPLEQ_REMOVE_HEAD(>if_link_state_changes, lsc_entry);
if_link_state_change_1(ifp, lsc->lsc_state, lsc->lsc_lost);
}

if (ifp->if_link_state_pending != ifp->if_link_state)
if_link_state_change_1(ifp, ifp->if_link_state_pending,
ifp->if_link_state_lost);
ifp->if_link_state_lost = false;
splx(s);
}

static void
if_link_state_change_1(struct ifnet *ifp, int link_state, bool lost)
{

if (lost ||
(ifp->if_link_state == LINK_STATE_UP &&
link_state == LINK_STATE_UNKNOWN)) {
/* pretend LINK_STATE_DOWN first */
}

ifp->if_link_state = link_state;
/* call domain dom_if_link_state_change callbacks */
/* call rt_ifmsg */
}


Re: CVS commit: src/sys/net

2016-02-15 Thread Martin Husemann
On Mon, Feb 15, 2016 at 11:06:52AM +, Roy Marples wrote:
> Can we not extend softint_schedule to take some user data? That way,
> each call (which I'm assuming would be sequential) can carry the link
> state change in the user data which can then be applied to the interface.

Sounds like a workqueue(9) then...

Martin


Re: CVS commit: src/sys/net

2016-02-15 Thread Roy Marples
On 15/02/2016 10:32, Ryota Ozaki wrote:
> On Mon, Feb 15, 2016 at 6:35 PM, Roy Marples  wrote:
>> On 15/02/2016 08:08, Ryota Ozaki wrote:
>>> Module Name:  src
>>> Committed By: ozaki-r
>>> Date: Mon Feb 15 08:08:04 UTC 2016
>>>
>>> Modified Files:
>>>   src/sys/net: if.c if.h
>>>
>>> Log Message:
>>> Run if_link_state_change in softint
>>>
>>> if_link_state_change can execute the network stack that is expected to
>>> not run in hardware interrupt (at least now), however network drivers
>>> may call it in hardware interrupt. Avoid that by introducing a new
>>> softint for if_link_state_change.
>>
>> I don't know anything about softints, so this could be a silly question,
>> but this change looks racey to me. Is there a guarantee anywhere that
>> if_link_state_change() won't be called again before
>> if_link_state_change_si() has been called?
> 
> Not guaranteed. Hmm, you're right. There is a race condition;
> ifp->if_link_state can be overwritten before calling
> if_link_state_change_si and it does matter if ifp->if_link_state
> has been back to ifp->if_old_link_state. (If ifp->if_link_state
> is different from ifp->if_old_link_state, it doesn't matter
> because it's just that the later state wins.)
> 
> I think we can fix by returning from if_link_state_change_si
> if ifp->if_link_state == ifp->if_old_link_state. Is it correct?

Then we're not doing what we should because the protocol actions may not
fire.

Can we not extend softint_schedule to take some user data? That way,
each call (which I'm assuming would be sequential) can carry the link
state change in the user data which can then be applied to the interface.

Roy


Re: CVS commit: src/sys/net

2016-01-13 Thread Kengo NAKAHARA
Hi,

On 2016/01/13 7:10, David Laight wrote:
> On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote:
>> # Sorry, I forgot to subscribe source-changes-d ml, I reply as
>> # a new mail.
>>
>> Hi,
>>
>>> In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
>>> Kengo NAKAHARA  wrote:
 -=-=-=-=-=-

 Module Name:   src
 Committed By:  knakahara
 Date:  Thu Dec 10 08:11:03 UTC 2015

 Modified Files:
src/sys/net: if_gif.c

 Log Message:
 kmem_zalloc(, KM_SLEEP) must not return NULL.
>>>
>>> I would like to solicit opinions about this change and form a general
>>> policy.
>>>
>>> 1. I would like to reduce the use of KASSERT in the kernel, specially
>>> in situations like thee above where the test can be centralized (inside
>>> kmem_alloc) and avoided without being fatal.
>>
>> OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
> 
> There is no real point using KASSERT() when the immediate
> dereference of NULL will have the same effect and is about as
> easy to debug.
> 
> Whether kmem_alloc(KM_SLEEP) can return NULL is another matter.
> Seems wrong to me - maybe even for impossible allications.
> ISTR a problem waiting for KVA and phys-mem being 'difficult'.
> I know that the Linux equivalent will return NULL, not sure when.
> It would be useful to define that allocation for non-huge
> (maybe even several MB) amounts will always sleep.
> 
>   David

Have you read this thread?
http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019762.html

I think Mr. Chuck Silvers points out a similar issue.
http://mail-index.netbsd.org/tech-kern/2015/12/11/msg019772.html


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2016-01-12 Thread David Laight
On Fri, Dec 11, 2015 at 11:37:29AM +0900, Kengo NAKAHARA wrote:
> # Sorry, I forgot to subscribe source-changes-d ml, I reply as
> # a new mail.
> 
> Hi,
> 
> > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> > Kengo NAKAHARA  wrote:
> >>-=-=-=-=-=-
> >>
> >>Module Name:src
> >>Committed By:   knakahara
> >>Date:   Thu Dec 10 08:11:03 UTC 2015
> >>
> >>Modified Files:
> >>src/sys/net: if_gif.c
> >>
> >>Log Message:
> >>kmem_zalloc(, KM_SLEEP) must not return NULL.
> > 
> > I would like to solicit opinions about this change and form a general
> > policy.
> > 
> > 1. I would like to reduce the use of KASSERT in the kernel, specially
> > in situations like thee above where the test can be centralized (inside
> > kmem_alloc) and avoided without being fatal.
> 
> OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.

There is no real point using KASSERT() when the immediate
dereference of NULL will have the same effect and is about as
easy to debug.

Whether kmem_alloc(KM_SLEEP) can return NULL is another matter.
Seems wrong to me - maybe even for impossible allications.
ISTR a problem waiting for KVA and phys-mem being 'difficult'.
I know that the Linux equivalent will return NULL, not sure when.
It would be useful to define that allocation for non-huge
(maybe even several MB) amounts will always sleep.

David

-- 
David Laight: da...@l8s.co.uk


Re: CVS commit: src/sys/net

2015-12-10 Thread Kengo NAKAHARA
# Sorry, I forgot to subscribe source-changes-d ml, I reply as
# a new mail.

Hi,

> In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> Kengo NAKAHARA  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:  src
>>Committed By: knakahara
>>Date: Thu Dec 10 08:11:03 UTC 2015
>>
>>Modified Files:
>>  src/sys/net: if_gif.c
>>
>>Log Message:
>>kmem_zalloc(, KM_SLEEP) must not return NULL.
> 
> I would like to solicit opinions about this change and form a general
> policy.
> 
> 1. I would like to reduce the use of KASSERT in the kernel, specially
> in situations like thee above where the test can be centralized (inside
> kmem_alloc) and avoided without being fatal.

OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.


> 2. Static analyzer models understand allocators, but they are not
> smart enough to determine under which situations they can fail. I
> believe even kmem_alloc with KM_SLEEP can fail when the size is
> large enough.

I have a question. The man of kmem(9) says:

 kmflags  Either of the following:

  KM_SLEEPIf the allocation cannot be satisfied immediately,
  sleep until enough memory is available.

Is this manual incorrect?
I'm confused... Could you tell me easily comprehensible manual?


> So I propose to always check the return value of allocators with
> an 'if' and not a KASSERT.

There are some codes like "foo = kmem_alloc(size, KM_SLEEP); KASSERT(foo != 
NULL)".
Should the codes be unified to use not KASSERT' but if'?


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA 


Re: CVS commit: src/sys/net

2015-12-10 Thread Christos Zoulas
On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
-- Subject: Re: CVS commit: src/sys/net

| Hi,
| 
| > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
| > Kengo NAKAHARA <source-changes-d%NetBSD.org@localhost> wrote:
| >>-=-=-=-=-=-
| >>
| >>Module Name:src
| >>Committed By:   knakahara
| >>Date:   Thu Dec 10 08:11:03 UTC 2015
| >>
| >>Modified Files:
| >>src/sys/net: if_gif.c
| >>
| >>Log Message:
| >>kmem_zalloc(, KM_SLEEP) must not return NULL.
| > 
| > I would like to solicit opinions about this change and form a general
| > policy.
| > 
| > 1. I would like to reduce the use of KASSERT in the kernel, specially
| > in situations like thee above where the test can be centralized (inside
| > kmem_alloc) and avoided without being fatal.
| 
| OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
| 
| 
| > 2. Static analyzer models understand allocators, but they are not
| > smart enough to determine under which situations they can fail. I
| > believe even kmem_alloc with KM_SLEEP can fail when the size is
| > large enough.
| 
| I have a question. The man of kmem(9) says:
| 
|  kmflags  Either of the following:
| 
|   KM_SLEEPIf the allocation cannot be satisfied immediately,
|   sleep until enough memory is available.


int ret = uvm_km_kmem_alloc(kmem_va_arena,
(vsize_t)round_page(size),
((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) 
 | VM_INSTANTFIT, (vmem_addr_t *));
if (ret) {
return NULL; 
} 


| 
| Is this manual incorrect?
| I'm confused... Could you tell me easily comprehensible manual?

It is documented that it does not fail if you sleep because this
is the usual scenario. Follow the path to the rebbit hole and you
go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you
see that under certain conditions it will fail even when you sleep.
Note that bt_refill return values are not even always checked, so
this can fail in other mysterious ways.

| > So I propose to always check the return value of allocators with
| > an 'if' and not a KASSERT.
| 
| There are some codes like "foo = kmem_alloc(size, KM_SLEEP);
| KASSERT(foo != NULL)".
| Should the codes be unified to use not KASSERT' but if'?

Yes (when it is possible), and the man page for kmem_alloc should be
changed to reflect that.

christos


Re: CVS commit: src/sys/net

2015-12-10 Thread Christos Zoulas
In article <20151210081103.e0fbbf...@cvs.netbsd.org>,
Kengo NAKAHARA  wrote:
>-=-=-=-=-=-
>
>Module Name:   src
>Committed By:  knakahara
>Date:  Thu Dec 10 08:11:03 UTC 2015
>
>Modified Files:
>   src/sys/net: if_gif.c
>
>Log Message:
>kmem_zalloc(, KM_SLEEP) must not return NULL.

I would like to solicit opinions about this change and form a general
policy.

1. I would like to reduce the use of KASSERT in the kernel, specially
in situations like thee above where the test can be centralized (inside
kmem_alloc) and avoided without being fatal.

2. Static analyzer models understand allocators, but they are not
smart enough to determine under which situations they can fail. I
believe even kmem_alloc with KM_SLEEP can fail when the size is
large enough.

So I propose to always check the return value of allocators with
an 'if' and not a KASSERT.

christos



Re: CVS commit: src/sys/net

2015-12-10 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

> In article <20151210081103.e0fbbf...@cvs.netbsd.org>,
> Kengo NAKAHARA  wrote:
>>-=-=-=-=-=-
>>
>>Module Name:  src
>>Committed By: knakahara
>>Date: Thu Dec 10 08:11:03 UTC 2015
>>
>>Modified Files:
>>  src/sys/net: if_gif.c
>>
>>Log Message:
>>kmem_zalloc(, KM_SLEEP) must not return NULL.
>
> I would like to solicit opinions about this change and form a general
> policy.
>
> 1. I would like to reduce the use of KASSERT in the kernel, specially
> in situations like thee above where the test can be centralized (inside
> kmem_alloc) and avoided without being fatal.
>
> 2. Static analyzer models understand allocators, but they are not
> smart enough to determine under which situations they can fail. I
> believe even kmem_alloc with KM_SLEEP can fail when the size is
> large enough.
>
> So I propose to always check the return value of allocators with
> an 'if' and not a KASSERT.

I think that a function needs to have a contract specified in a contract
(and perhaps static analyzer markup).  Code should never KASSERT for
anything that can not be argued (statically shown) to be always true
given contracts.  So I agree with you.


signature.asc
Description: PGP signature


Re: CVS commit: src/sys/net

2015-12-10 Thread Kengo NAKAHARA
Hi,

On 2015/12/11 13:00, Christos Zoulas wrote:
> On Dec 11, 11:37am, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
> -- Subject: Re: CVS commit: src/sys/net
> 
> | Hi,
> | 
> | > In article <20151210081103.E0FBBFB83%cvs.NetBSD.org@localhost>,
> | > Kengo NAKAHARA <source-changes-d%NetBSD.org@localhost> wrote:
> | >>-=-=-=-=-=-
> | >>
> | >>Module Name:  src
> | >>Committed By: knakahara
> | >>Date: Thu Dec 10 08:11:03 UTC 2015
> | >>
> | >>Modified Files:
> | >>  src/sys/net: if_gif.c
> | >>
> | >>Log Message:
> | >>kmem_zalloc(, KM_SLEEP) must not return NULL.
> | > 
> | > I would like to solicit opinions about this change and form a general
> | > policy.
> | > 
> | > 1. I would like to reduce the use of KASSERT in the kernel, specially
> | > in situations like thee above where the test can be centralized (inside
> | > kmem_alloc) and avoided without being fatal.
> | 
> | OK, this kmem_zalloc() is not fatal. I should avoid KASSERT here.
> | 
> | 
> | > 2. Static analyzer models understand allocators, but they are not
> | > smart enough to determine under which situations they can fail. I
> | > believe even kmem_alloc with KM_SLEEP can fail when the size is
> | > large enough.
> | 
> | I have a question. The man of kmem(9) says:
> | 
> |  kmflags  Either of the following:
> | 
> |   KM_SLEEPIf the allocation cannot be satisfied immediately,
> |   sleep until enough memory is available.
> 
>   
> int ret = uvm_km_kmem_alloc(kmem_va_arena,
> (vsize_t)round_page(size),
> ((kmflags & KM_SLEEP) ? VM_SLEEP : VM_NOSLEEP) 
>  | VM_INSTANTFIT, (vmem_addr_t *));
> if (ret) {
> return NULL; 
> } 
>   
> 
> | 
> | Is this manual incorrect?
> | I'm confused... Could you tell me easily comprehensible manual?
> 
> It is documented that it does not fail if you sleep because this
> is the usual scenario. Follow the path to the rebbit hole and you
> go the uvm_km_kmem_alloc -> vmem_alloc -> bt_refill etc. and you
> see that under certain conditions it will fail even when you sleep.

I see. Thank you for your explanation.

> Note that bt_refill return values are not even always checked, so
> this can fail in other mysterious ways.

Oh...

> | > So I propose to always check the return value of allocators with
> | > an 'if' and not a KASSERT.
> | 
> | There are some codes like "foo = kmem_alloc(size, KM_SLEEP);
> | KASSERT(foo != NULL)".
> | Should the codes be unified to use not KASSERT' but if'?
> 
> Yes (when it is possible), and the man page for kmem_alloc should be
> changed to reflect that.

I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now.

> Log Message:
> Spell out that KM_SLEEP allocations can fail.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.17 -r1.18 src/share/man/man9/kmem.9

Thank you for your quickly update!


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA <k-nakah...@iij.ad.jp>


Re: CVS commit: src/sys/net

2015-12-10 Thread Christos Zoulas
On Dec 11,  1:25pm, k-nakah...@iij.ad.jp (Kengo NAKAHARA) wrote:
-- Subject: Re: CVS commit: src/sys/net


| I revert KASSERT in if_gif.c, and I use 'if' instead of KASSERT from now.

Thanks.

| Thank you for your quickly update!

Thanks for bringing this up. I've been meaning to do something about it
for a long time ;-)

christos


Re: CVS commit: src/sys/net/npf

2015-10-18 Thread Christos Zoulas
In article <20151019002957.b972...@cvs.netbsd.org>,
Christos Zoulas  wrote:
>Log Message:
>Fix the code so that it works in all 3 cases: non-modular, modular/builtin,
>modular/filesystem. In the non-modular case we initialize through attach.
>In the modular/builtin case we define the module to be class misc so it
>attaches late (after percpu is initialized) since driver modules attach
>too early.  In the modular/filesystem case we define it to be a driver
>module since we autoload it via /dev/npf open.

I still don't like my fix, perhaps jared's is better. Having a single
attach()/detach() entry point for both modular and non-modular kernels
certainly seems desirable. Also there needs a way to specify a dependency
graph for modules so that they can be loaded after their dependencies.
Finally resource management (things that the modules need to cleanup
and may be busy) need to be handled better.

christos



Re: CVS commit: src/sys/net

2015-10-05 Thread Ryota Ozaki
On Wed, Sep 30, 2015 at 11:15 PM, Christoph Badura  wrote:
> On Wed, Sep 30, 2015 at 07:12:32AM +, Ryota Ozaki wrote:
>> Modified Files:
>>   src/sys/net: if.h if_llatbl.c if_llatbl.h
>>
>> Log Message:
>> Make GATEWAY (fastforward) work again
>>
>> With GATEWAY (fastforward), the whole forwarding processing runs in
>> hardware interrupt context. So we cannot use rwlock for lltable and
>> llentry in that case.
>>
>> This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
>> when GATEWAY is enabled. We need to tweak locking only around rtree
>> in lltable_free. Other than that, what we need to do is to change macros
>> for locks.
>>
>> I hope fastforward runs in softint some day in the future...
>
> Well, the point of the fast forwarding code was that the fast forwarding
> is done in interrupt context.  If no flow exists the normal (slow)
> forwarding path running in softint is used.

(Hardware) interrupt context isn't must for the fast forwarding,
IIUC. FreeBSD doesn't run it in the interrupt context nowadays and we
already have some network drivers run if_input (ether_input normally)
in softint context, e.g., pq3etsec [*].

My hope is that all network device drivers implement softint-based RX and
run if_input (i.e., Layer 2) in softint context. Then we can make
Layer 2 components (bridge, vlan, bpf, etc.) MP-safe easier than now.

[*] 
https://wiki.netbsd.org/users/msaitoh/Comparison_of_implementations_of_Ethernet_drivers/

  ozaki-r


Re: CVS commit: src/sys/net

2015-09-30 Thread Christoph Badura
On Wed, Sep 30, 2015 at 07:12:32AM +, Ryota Ozaki wrote:
> Modified Files:
>   src/sys/net: if.h if_llatbl.c if_llatbl.h
> 
> Log Message:
> Make GATEWAY (fastforward) work again
> 
> With GATEWAY (fastforward), the whole forwarding processing runs in
> hardware interrupt context. So we cannot use rwlock for lltable and
> llentry in that case.
> 
> This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
> when GATEWAY is enabled. We need to tweak locking only around rtree
> in lltable_free. Other than that, what we need to do is to change macros
> for locks.
> 
> I hope fastforward runs in softint some day in the future...

Well, the point of the fast forwarding code was that the fast forwarding
is done in interrupt context.  If no flow exists the normal (slow)
forwarding path running in softint is used.

--chris


Re: CVS commit: src/sys/net/if_vlan.c

2015-04-03 Thread Ryota Ozaki
On Wed, Apr 1, 2015 at 5:50 AM, Christoph Badura b...@bsd.de wrote:
 On Tue, Mar 31, 2015 at 11:09:02AM +0900, Ryota Ozaki wrote:
 On Tue, Mar 31, 2015 at 1:42 AM, Matt Thomas m...@3am-software.com wrote:
  It lacks compensating for the fact the CRC has been stripped by the
  time vlan gets the packet.

 # matt@ explained well than me...

 Note that the issue was found by some strict packet capture software
 (not by me, so I don't know what it is); typical OSes don't complain
 the extra padding, so we didn't notice the issue until recently.

 Yeah.  I misunderstood your message in the PR.  I read it as if
 if_vlan.c should be sending 72 byte minimum size packets and the code
 was doing something different.  That's why I asked.

 Maybe we should update the PR.

Updated!
  ozaki-r


Re: CVS commit: src/sys/net/if_vlan.c

2015-03-31 Thread Christoph Badura
On Tue, Mar 31, 2015 at 11:09:02AM +0900, Ryota Ozaki wrote:
 On Tue, Mar 31, 2015 at 1:42 AM, Matt Thomas m...@3am-software.com wrote:
  It lacks compensating for the fact the CRC has been stripped by the
  time vlan gets the packet.

 # matt@ explained well than me...

 Note that the issue was found by some strict packet capture software
 (not by me, so I don't know what it is); typical OSes don't complain
 the extra padding, so we didn't notice the issue until recently.

Yeah.  I misunderstood your message in the PR.  I read it as if
if_vlan.c should be sending 72 byte minimum size packets and the code
was doing something different.  That's why I asked.

Maybe we should update the PR.

--chris


Re: CVS commit: src/sys/net/if_vlan.c

2015-03-30 Thread Christoph Badura
On Sun, Mar 29, 2015 at 01:30:43PM +, Ryota Ozaki wrote:
 Module Name:  src
 Committed By: ozaki-r
 Date: Sun Mar 29 13:30:43 UTC 2015
 
 Modified Files:
   src/sys/net: if_vlan.c
 
 Log Message:
 Correct frame padding length
 
 vlan pads a frame with zeros up to 68 bytes
 (ETHER_MIN_LEN + ETHER_VLAN_ENCAP_LEN). It expects
 that even if the frame is untagged, it keeps 64 bytes
 at least. However, it lacks concern about CRC
 (4 bytes). So a sending frame can be 72 (68 + 4) bytes.

I don't get it. ETHER_MIN_LEN includes the 4 bytes for the CRC of a
minimum sized packet.  Therefore ETHER_MIN_LEN + ETHER_VLAN_ENCAP_LEN is
the right size, isn't it?

BTW, what do you mean by it lacks concern about CRC?

--chris


Re: CVS commit: src/sys/net

2014-12-01 Thread Martin Husemann
On Mon, Dec 01, 2014 at 11:54:23AM +0900, Ryota Ozaki wrote:
 BTW, we may need ATF tests to ring the bell for such regressions.

This is a bit tricky to do as we do not expose a sane api to create
old binaries from source. It would boil down to copy  paste of the old
structure definition and some magic to test ioctls with them.

Testing the ioctls itself against a rump kernel is very simple.

But then you'll have to install helpers (like rump.ifconfig but for various
ioctl versions) and it quickly gets messy.

So no easy/safe/scalable way there that I can see. The other option that 
comes to mind is running a -current kernel and doing a -7 and -6 release
branch test run. Unfortunately that will include all RUMP components from that
branch as well, so not test the compatibility we are looking for either.

But I guess we should think about rearrangements in this direction to make
it easier in the future - starting with a separate rump.tgz set, or maybe
a few of them, not sure what granularity would be needed there.

Antti, Justin, can we separate the rump userland from the rump kernel
binaries easily and put them in different sets, and then run the userland
from -6 or -7 against the (rump-)kernel from -current?

Martin


Re: CVS commit: src/sys/net

2014-12-01 Thread Antti Kantee

On 01/12/14 08:01, Martin Husemann wrote:

On Mon, Dec 01, 2014 at 11:54:23AM +0900, Ryota Ozaki wrote:

BTW, we may need ATF tests to ring the bell for such regressions.


This is a bit tricky to do as we do not expose a sane api to create
old binaries from source. It would boil down to copy  paste of the old
structure definition and some magic to test ioctls with them.

Testing the ioctls itself against a rump kernel is very simple.

But then you'll have to install helpers (like rump.ifconfig but for various
ioctl versions) and it quickly gets messy.

So no easy/safe/scalable way there that I can see. The other option that
comes to mind is running a -current kernel and doing a -7 and -6 release
branch test run. Unfortunately that will include all RUMP components from that
branch as well, so not test the compatibility we are looking for either.

But I guess we should think about rearrangements in this direction to make
it easier in the future - starting with a separate rump.tgz set, or maybe
a few of them, not sure what granularity would be needed there.

Antti, Justin, can we separate the rump userland from the rump kernel
binaries easily and put them in different sets, and then run the userland
from -6 or -7 against the (rump-)kernel from -current?


Copying prebuilt userland tarballs around sounds suboptimal.  Unless you 
want to go cherry-picking every time something changes in the tests, 
you'd need all of bins/libs.  And then you'd probably need to carefully 
tune your PATH for the tests to still work.  Instead, I'd go the 
opposite route which allows for straightforward, self-contained automation:


You can use buildrump.sh (*) with a -6 or -7 toolchain to get a -current 
rump kernel.  Theoretically speaking, you can then run the -current 
tests as an additional part of -6 and -7 test runs with LD_LIBRARY_PATH 
set to buildrump.sh/rump/lib.  I doubt anyone has run /usr/tests with 
host != rump kernel since I was working on NetBSD 5.0 and testing 
-current, um, 4 years ago (and I was just installing -current to /), so 
expect some tuning to be necessary.


The downside to the above approach is that if you want to manually run 
the tests, you need a -6/-7 installation, but once in the blue moon when 
it's necessary to run the tests manually instead of relying on 
automation, you'd just run anita and wait a few minutes.


*) ./checkout cvs src HEAD ; ./buildrump.sh fullbuild
^ assuming _on_ -6/-7.  Otherwise you also need to set $CC to the 
appropriate cross compiler before running buildrump.sh.


Re: CVS commit: src/sys/net

2014-12-01 Thread Ryota Ozaki
On Mon, Dec 1, 2014 at 5:01 PM, Martin Husemann mar...@duskware.de wrote:
 On Mon, Dec 01, 2014 at 11:54:23AM +0900, Ryota Ozaki wrote:
 BTW, we may need ATF tests to ring the bell for such regressions.

 This is a bit tricky to do as we do not expose a sane api to create
 old binaries from source. It would boil down to copy  paste of the old
 structure definition and some magic to test ioctls with them.

Hmm, in this case, a test with a new userland binary (7.99.2) and
a new kernel (7.99.2) can expose the regression. So I thought we
need to add a test that does ioctl(SIOCGIFCONF) (or just run
ifconfig with no argument).

  ozaki-r


 Testing the ioctls itself against a rump kernel is very simple.

 But then you'll have to install helpers (like rump.ifconfig but for various
 ioctl versions) and it quickly gets messy.

 So no easy/safe/scalable way there that I can see. The other option that
 comes to mind is running a -current kernel and doing a -7 and -6 release
 branch test run. Unfortunately that will include all RUMP components from that
 branch as well, so not test the compatibility we are looking for either.

 But I guess we should think about rearrangements in this direction to make
 it easier in the future - starting with a separate rump.tgz set, or maybe
 a few of them, not sure what granularity would be needed there.

 Antti, Justin, can we separate the rump userland from the rump kernel
 binaries easily and put them in different sets, and then run the userland
 from -6 or -7 against the (rump-)kernel from -current?

 Martin


Re: CVS commit: src/sys/net

2014-12-01 Thread Martin Husemann
On Mon, Dec 01, 2014 at 11:21:17PM +0900, Ryota Ozaki wrote:
 Hmm, in this case, a test with a new userland binary (7.99.2) and
 a new kernel (7.99.2) can expose the regression. So I thought we
 need to add a test that does ioctl(SIOCGIFCONF) (or just run
 ifconfig with no argument).

Oh, I somehow thought the -current version had worked and only compat
got broken.

This is very simple, see for example src/tests/net/mpls - but that is a
quite complex version setting up 4 independent rump servers. In this
case, a single one would do.

Martin


Re: CVS commit: src/sys/net

2014-12-01 Thread Ryota Ozaki
On Mon, Dec 1, 2014 at 11:28 PM, Martin Husemann mar...@duskware.de wrote:
 On Mon, Dec 01, 2014 at 11:21:17PM +0900, Ryota Ozaki wrote:
 Hmm, in this case, a test with a new userland binary (7.99.2) and
 a new kernel (7.99.2) can expose the regression. So I thought we
 need to add a test that does ioctl(SIOCGIFCONF) (or just run
 ifconfig with no argument).

 Oh, I somehow thought the -current version had worked and only compat
 got broken.

 This is very simple, see for example src/tests/net/mpls - but that is a
 quite complex version setting up 4 independent rump servers. In this
 case, a single one would do.

Thanks, I'll try to do it.

  ozaki-r


 Martin


Re: CVS commit: src/sys/net

2014-12-01 Thread Christos Zoulas
On Dec 1, 11:21pm, ozak...@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: CVS commit: src/sys/net

| On Mon, Dec 1, 2014 at 5:01 PM, Martin Husemann mar...@duskware.de wrote:
|  On Mon, Dec 01, 2014 at 11:54:23AM +0900, Ryota Ozaki wrote:
|  BTW, we may need ATF tests to ring the bell for such regressions.
| 
|  This is a bit tricky to do as we do not expose a sane api to create
|  old binaries from source. It would boil down to copy  paste of the old
|  structure definition and some magic to test ioctls with them.
| 
| Hmm, in this case, a test with a new userland binary (7.99.2) and
| a new kernel (7.99.2) can expose the regression. So I thought we
| need to add a test that does ioctl(SIOCGIFCONF) (or just run
| ifconfig with no argument).

Yes, SIOCGIFCONF is still supported :-)

christos


Re: CVS commit: src/sys/net

2014-12-01 Thread Justin Cormack
On Mon, Dec 1, 2014 at 12:15 PM, Antti Kantee po...@netbsd.org wrote:
 Antti, Justin, can we separate the rump userland from the rump kernel
 binaries easily and put them in different sets, and then run the userland
 from -6 or -7 against the (rump-)kernel from -current?


 Copying prebuilt userland tarballs around sounds suboptimal.  Unless you
 want to go cherry-picking every time something changes in the tests, you'd
 need all of bins/libs.  And then you'd probably need to carefully tune your
 PATH for the tests to still work.  Instead, I'd go the opposite route which
 allows for straightforward, self-contained automation:

 You can use buildrump.sh (*) with a -6 or -7 toolchain to get a -current
 rump kernel.  Theoretically speaking, you can then run the -current tests as
 an additional part of -6 and -7 test runs with LD_LIBRARY_PATH set to
 buildrump.sh/rump/lib.  I doubt anyone has run /usr/tests with host != rump
 kernel since I was working on NetBSD 5.0 and testing -current, um, 4 years
 ago (and I was just installing -current to /), so expect some tuning to be
 necessary.

Well I have now, the buildrump snapshot (ie current a month or so ago)
on 6.1.5/amd64. There are about 110 more test fails than there are on
my install, will look into what type of thing is failing and why. A
lot (most) seem to be puffs/p2k related.

Justni


Re: CVS commit: src/sys/net

2014-11-30 Thread Ryota Ozaki
On Mon, Dec 1, 2014 at 9:27 AM, Christos Zoulas chris...@netbsd.org wrote:
 Module Name:src
 Committed By:   christos
 Date:   Mon Dec  1 00:27:05 UTC 2014

 Modified Files:
 src/sys/net: if.c

 Log Message:
 PR/49437: jmcneill: revert broken changes that broke SIOCGIFCONF (mdnsd uses 
 it)

I'm sorry for my fault. I'll fix that commit later.

BTW, we may need ATF tests to ring the bell for such regressions.

  ozaki-r



 To generate a diff of this commit:
 cvs rdiff -u -r1.300 -r1.301 src/sys/net/if.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.



Re: CVS commit: src/sys/net

2014-11-27 Thread Matt Thomas

 On Nov 26, 2014, at 11:26 PM, Masao Uebayashi uebay...@gmail.com wrote:
 
 On Thu, Nov 27, 2014 at 4:20 PM, Matt Thomas m...@3am-software.com wrote:
 Or completely hide sizeof(struct ifnet) by forcing use of if_alloc(),
 as struct device did.
 
 Doesn't work since struct ifnet is embedded in ethercom/fddicom
 
 Right.  I mean, I want to make them alloc'ed to improve modlarity in
 the long run.

Eww.  Yet more indirection.  Most ifnet's live in a ethercom which
lives in a softc allocated by autoconf.  


Re: CVS commit: src/sys/net

2014-11-27 Thread Masao Uebayashi
I predict that in the near future the network stack will act like a
batch manner; instead of moving layers for a single packet, pass a
batch of packets to the next layer; each layer processes a batch of
packets in a loop and stay longer (than now) holding a relevant
context object cached.  In such a model the relative cost of
indirection will become nothing.


Re: CVS commit: src/sys/net

2014-11-26 Thread Matt Thomas

 On Nov 26, 2014, at 1:38 AM, Ryota Ozaki ozak...@netbsd.org wrote:
 
 Module Name:  src
 Committed By: ozaki-r
 Date: Wed Nov 26 09:38:42 UTC 2014
 
 Modified Files:
   src/sys/net: if.c if.h
 
 Log Message:
 Change if_slowtimo_ch to a pointer
 
 One benefit to do so is to reduce memory used for struct callout;
 we can avoid to allocate struct callout for interfaces that don't
 use callout.
 
 Requested by uebayasi@.

This requires a kernel version bump.



Re: CVS commit: src/sys/net

2014-11-26 Thread Ryota Ozaki
On Thu, Nov 27, 2014 at 10:30 AM, Matt Thomas m...@3am-software.com wrote:

 On Nov 26, 2014, at 1:38 AM, Ryota Ozaki ozak...@netbsd.org wrote:

 Module Name:  src
 Committed By: ozaki-r
 Date: Wed Nov 26 09:38:42 UTC 2014

 Modified Files:
   src/sys/net: if.c if.h

 Log Message:
 Change if_slowtimo_ch to a pointer

 One benefit to do so is to reduce memory used for struct callout;
 we can avoid to allocate struct callout for interfaces that don't
 use callout.

 Requested by uebayasi@.

 This requires a kernel version bump.


OK, I did it. Thanks.

  ozaki-r


Re: CVS commit: src/sys/net

2014-11-26 Thread Masao Uebayashi
Maybe adding a few spare members, like 5 pointers, to reduce # of bumps ... ?


Re: CVS commit: src/sys/net

2014-11-26 Thread Masao Uebayashi
On Thu, Nov 27, 2014 at 10:51 AM, Masao Uebayashi uebay...@gmail.com wrote:
 Maybe adding a few spare members, like 5 pointers, to reduce # of bumps ... ?

Or completely hide sizeof(struct ifnet) by forcing use of if_alloc(),
as struct device did.


Re: CVS commit: src/sys/net

2014-11-26 Thread Ryota Ozaki
On Thu, Nov 27, 2014 at 10:51 AM, Masao Uebayashi uebay...@gmail.com wrote:
 Maybe adding a few spare members, like 5 pointers, to reduce # of bumps ... ?

I thought the version bump is required because a member has been changed
(a struct to a pointer). So IIUC the spare members don't help for such
a situation?

  ozaki-r


Re: CVS commit: src/sys/net

2014-11-26 Thread Ryota Ozaki
On Thu, Nov 27, 2014 at 10:55 AM, Masao Uebayashi uebay...@gmail.com wrote:
 On Thu, Nov 27, 2014 at 10:51 AM, Masao Uebayashi uebay...@gmail.com wrote:
 Maybe adding a few spare members, like 5 pointers, to reduce # of bumps ... ?

 Or completely hide sizeof(struct ifnet) by forcing use of if_alloc(),
 as struct device did.

Hmm, I'm confused. When do we have to bump the kernel version?

  ozaki-r


Re: CVS commit: src/sys/net

2014-11-26 Thread Ryota Ozaki
On Thu, Nov 27, 2014 at 11:39 AM, matthew green m...@eterna.com.au wrote:

 Ryota Ozaki writes:
 On Thu, Nov 27, 2014 at 10:51 AM, Masao Uebayashi uebay...@gmail.com wrote:
  Maybe adding a few spare members, like 5 pointers, to reduce # of bumps 
  ... ?

 I thought the version bump is required because a member has been changed
 (a struct to a pointer). So IIUC the spare members don't help for such
 a situation?

 this is correct.  you need a kernel bump when ever the kernel
 ABI changes.

 hiding stuff inside an alloc doesn't really help much when the
 kernel code needs to access the structure anyway, unless you
 also hide all accesses to the structure inside functions.

Thanks! I understood.

  ozaki-r


 that said, it's probably a good idea anyway.


 .mrg.


Re: CVS commit: src/sys/net

2014-11-26 Thread Masao Uebayashi
On Thu, Nov 27, 2014 at 11:39 AM, matthew green m...@eterna.com.au wrote:
 hiding stuff inside an alloc doesn't really help much when the
 kernel code needs to access the structure anyway, unless you
 also hide all accesses to the structure inside functions.

Right.  And some members, including this if_slowtimo_ch, are internal
to if.c, which don't affect ABI except struct size.


Re: CVS commit: src/sys/net

2014-11-26 Thread Matt Thomas

 On Nov 26, 2014, at 5:55 PM, Masao Uebayashi uebay...@gmail.com wrote:
 
 On Thu, Nov 27, 2014 at 10:51 AM, Masao Uebayashi uebay...@gmail.com wrote:
 Maybe adding a few spare members, like 5 pointers, to reduce # of bumps ... ?
 
 Or completely hide sizeof(struct ifnet) by forcing use of if_alloc(),
 as struct device did.

Doesn't work since struct ifnet is embedded in ethercom/fddicom

Re: CVS commit: src/sys/net

2014-11-26 Thread Masao Uebayashi
On Thu, Nov 27, 2014 at 4:20 PM, Matt Thomas m...@3am-software.com wrote:
 Or completely hide sizeof(struct ifnet) by forcing use of if_alloc(),
 as struct device did.

 Doesn't work since struct ifnet is embedded in ethercom/fddicom

Right.  I mean, I want to make them alloc'ed to improve modlarity in
the long run.


Re: CVS commit: src/sys/net

2014-11-20 Thread Alexander Nasonov
Christos Zoulas wrote:
 In article 20141119210214.GA8310@neva,
 Alexander Nasonov  al...@yandex.ru wrote:
 I don't think SLJIT_UMOD exists. You need to emit SLJIT_SUB after
 SLJIT_UDIV. This should be pretty straigthforward. But I don't mind
 if you make a function call for these quite rare instructions. 
 
 Can you please do it? You are the expert :-)

Sure, will do.

-- 
Alex


  1   2   >