Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Ryota Ozaki
On Fri, Dec 8, 2017 at 6:51 PM, Kengo NAKAHARA  wrote:
> Hi,
>
> On 2017/12/07 14:21, Taylor R Campbell wrote:
>> I dropped this thread on the floor a while ago and I forget what the
>> status was.  I've had a patch sitting in my tree for a while which I
>> brushed off to put the list update logic in separate functions, as I
>> think chuq requested a while ago, but still keep it all isolated to
>> subr_psref.c and avoid defining any new macros.
>>
>> If you've measured that SLIST works better -- which would make sense
>> because the typical bracketed psref_acquire/release nesting makes a
>> nice LIFO stack of the things so that SLIST_REMOVE should usually be
>> done in the first iteration -- then I'm happy to replace it by SLIST.
>> We should just make sure to fix this bug before netbsd-8 goes out!
>>
>> Thoughts?
>
> I measure IPv4 forwarding performance your patch(PSREF_LIST) version
> and SLIST version. At first, the result is quite affected by
> optimization option "-falign-functions"... Based on this, it seems
> there is almost no difference between PSREF_LIST and SLIST.
>
> However, it seems your patch has large diff... From the point of code
> stability, smaller diff SLIST version would be better for netbsd-8 branch
> to fix the bug. Because your patch causes some new ATF failures such as
> ldp_regen and route_change_ifp (reported by ozaki-r@n.o). We can probably
> fix them at once but guaranteeing its stability would take more time.

Hmm, the same failures also happen with the SLIST version. IIRC there
was no regression with it two months ago and so I didn't test it enough
days ago. Anyway I guess the failures are not because of your patch.
I'm sorry for that.

I have no idea why the failures happen. The patches may dig out a
latent bug in -current.

  ozaki-r


I2C iic_exec() and clock stretch

2017-12-11 Thread Brad Spencer

Sorry to intrude...


So...  I have a I2C device I am writing a driver for that has a read
cycle that needs one of the following:

1) Start a read with a STOP, except that before the data is sent down
the bus, clock stretch for a bit while the device is working on what is
is suppose to be working on.  When it is done, the data will be sent.

2) Start the read with no STOP cycle, but return NACK part way though
until the device is done doing what it is suppose to be doing.  When it
stops sending a NACK, or after you wait a bit, it seems that you can do
a read with a STOP and get the data.


Neither of these two situations appear to be supported with iic_exec()
as it exists today.  What I was wondering is if anyone has worked on
anything to enhance iic_exec() to deal with this sort of thing.


The device in question is the SI7021 humidity and temperature sensor.
It is a fairly inexpensive device that works pretty well and has tons of
example code out there for all kinds of OSs and devices.  The data sheet
is available at, among other places:

https://cdn-learn.adafruit.com/assets/assets/000/035/931/original/Support_Documents_TechnicalDocs_Si7021-A20.pdf


I am using a Raspberry PI 3 as the computer.  To do read type #1, I
hacked up bcm2835_bsc.c to provide a simple device property to specify
the amount of clock stretch to the BCM2835 iic that is desired and a new
flag in i2cvar.h that can be used by iic_exec() that will tell the
controller driver to use the stretch, but getting that property set from
attached children provided to be ugly.  I am thinking that something
like a iic_execv() should maybe exist that works, mechanically, a little
like sysctl_createv() so that a si70xx driver can do a I2C transaction,
but tell the controller driver, do this transaction with these
additional arguments that are more complicated then a simple flag.

To do read type #2 with the SI7021 iic_exec() needs to be able to do a 0
length read, I think and this is disallowed by bcm2835_bsc.c by a
KASSERT(dlen >= 1).  I didn't really look at any other drivers to see if
this sort of thing is allowed.  This is basically a cycle that looks
like:

START+ADDRESS+COMMAND+START+ADDRESS
NACK will be returned until the measurement is done
START+ADDRESS
You get up to three bytes back.

It was simpler to hack in support for clock stretching then to figure
out if it was possible to do 0 length reads.



In any case, I was curious if anyone had any thoughts or was working on
something like this??




-- 
Brad Spencer - b...@anduin.eldar.org - KC8VKS
http://anduin.eldar.org  - & -  http://anduin.ipv6.eldar.org [IPv6 only]


Re: getsockopt(2)

2017-12-11 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>chris...@astron.com (Christos Zoulas) wrote:
>>>In article ,
>>>
>>>So depending on the contents of the sockval we do something different?
>>
>>FreeBSD does. The calls to copy in or out the data are scattered
>>throughout the network stack.
>
>Is sockval always an int? 

Do you mean optname or optval ?

>>Linux seems to cheat and make use of the fact that their errors are
>>negative to return a positive integer word value from
>>setsockopt(). Their implementation of getsockopt() copies in and out.
>
>Hmm, that is not what the man page claims :-)

Linux doesn't really do man pages though, I looked at the source. The
FreeBSD man page is wrong too.

There is a thread on lkml describing what Linux did.

>>I would still prefer to change both getsockopt(2) and setsockopt(2).
>>
>>I would make getsockopt always copy in the supplied optval argument.
>
>Well, it has to copyout, so presumably that won't break existing code.

I have run a kernel for a while with this change to getsockopt() and
it seemed fine.

>>For setsockopt, the protocol code can update sopt_size to indicate the
>>amount of data to be copied back. I would add a line to the PRCO_SETOPT
>>handler in most protocols to set sopt_size to 0 so that nothing was
>>copied back.
>
>That could break things; grepping for sopt_size finds many checks for it.
>There is also sockopt_setmbuf that can potentially allocate larger than
>MCLBYTES socket option?

I meant that I would set sopt_size to zero after all the processing
of the input data has been finished.

The only place that I could see mbufs being used was in ipsec, which
doesn't work because it needs these changes too.

>>The syscall definition for setsockopt() would need to be changed to
>>remove the 'const' from the optval argument.
>
>That is arguably the worst part of the change. The API for it is part of
>the standard. Well, we can go the linux way and make getsockopt read and
>write, but that makes me cringe. Perhaps adding an updatesockopt() is better?

It is the FreeBSD way as well.

The operation required for getsockopt() isn't really *updating* the
socket options, it is more "get with an extra parameter".

The operation required for setsockopt() is to mostly set things but in
certain cases to return an int value that results from setting stuff.


Re: getsockopt(2)

2017-12-11 Thread Christos Zoulas
In article ,
Robert Swindells   wrote:
>
>chris...@astron.com (Christos Zoulas) wrote:
>>In article ,
>>
>>So depending on the contents of the sockval we do something different?
>
>FreeBSD does. The calls to copy in or out the data are scattered
>throughout the network stack.

Is sockval always an int? 

>Linux seems to cheat and make use of the fact that their errors are
>negative to return a positive integer word value from
>setsockopt(). Their implementation of getsockopt() copies in and out.

Hmm, that is not what the man page claims :-)

>I would still prefer to change both getsockopt(2) and setsockopt(2).
>
>I would make getsockopt always copy in the supplied optval argument.

Well, it has to copyout, so presumably that won't break existing code.

>For setsockopt, the protocol code can update sopt_size to indicate the
>amount of data to be copied back. I would add a line to the PRCO_SETOPT
>handler in most protocols to set sopt_size to 0 so that nothing was
>copied back.

That could break things; grepping for sopt_size finds many checks for it.
There is also sockopt_setmbuf that can potentially allocate larger than
MCLBYTES socket option?

>The syscall definition for setsockopt() would need to be changed to
>remove the 'const' from the optval argument.

That is arguably the worst part of the change. The API for it is part of
the standard. Well, we can go the linux way and make getsockopt read and
write, but that makes me cringe. Perhaps adding an updatesockopt() is better?

christos



Re: getsockopt(2)

2017-12-11 Thread Robert Swindells

chris...@astron.com (Christos Zoulas) wrote:
>In article ,
>Robert Swindells   wrote:
>>
>>I wrote:
>>>Does anyone know why we don't use the input 'optlen' parameter to the
>>>getsockopt(2) syscall, we do write back to it on return.
>>>
>>>In ip_output() there is this, which suggests that someone had come
>>>across this before.
>>>
>>>#if 0   /* defined(IPSEC) */
>>>case IP_IPSEC_POLICY:
>>>{
>>>struct mbuf *m = NULL;
>>>
>>>/* XXX this will return EINVAL as sopt is empty */
>>>error = ipsec4_get_policy(inp, sopt->sopt_data,
>>>sopt->sopt_size, );
>>>if (error == 0)
>>>error = sockopt_setmbuf(sopt, m);
>>>break;
>>>}   
>>>#endif /*IPSEC*/  
>>>
>>>There are also lots of places in sctp_usrreq that want to use it.
>>>
>>>I can set it with the following patch (line numbers will be slightly
>>>out), but wondered if there was a reason for the current behaviour.
>>>
>>>Index: uipc_syscalls.c
>>>===
>>>RCS file: /cvsroot/src/sys/kern/uipc_syscalls.c,v
>>>retrieving revision 1.187
>>>diff -u -r1.187 uipc_syscalls.c
>>>--- uipc_syscalls.c  20 Jun 2017 20:34:49 -  1.187
>>>+++ uipc_syscalls.c  14 Oct 2017 21:33:09 -
>>>@@ -1235,7 +1240,7 @@
>>> if ((error = fd_getsock1(SCARG(uap, s), , )) != 0)
>>> return (error);
>>> 
>>>-sockopt_init(, SCARG(uap, level), SCARG(uap, name), 0);
>>>+sockopt_init(, SCARG(uap, level), SCARG(uap, name), valsize);
>>> 
>>> if (fp->f_flag & FNOSIGPIPE)
>>> so->so_options |= SO_NOSIGPIPE;
>>
>>Has anyone had any other thoughts on this ?
>
>Yes, if you don't protect valsize against MCLBYTES like setsockopt does,
>it can be used to DoS kernel. Otherwise is can be done. Is 2K enough?

Yes, 2K would be enough. Have added the check in my tree.

>>I'm thinking of adding an extra socket option, maybe SO_PARAM, that you
>>would use with setsockopt(2) to set a selector to be used by a following
>>getsockopt(2) call.
>
>This would require some state keeping in the kernel and might not be
>reliable.

I know, it is the only alternative that I can think of to changing
setsockopt(2).

>>This wouldn't fix the IPSEC usage without changing userland though.
>>
>>The alternative is to make both setsockopt(2) and getsockopt(2)
>>bidirectional.
>
>So depending on the contents of the sockval we do something different?

FreeBSD does. The calls to copy in or out the data are scattered
throughout the network stack.

Linux seems to cheat and make use of the fact that their errors are
negative to return a positive integer word value from
setsockopt(). Their implementation of getsockopt() copies in and out.

I would still prefer to change both getsockopt(2) and setsockopt(2).

I would make getsockopt always copy in the supplied optval argument.

For setsockopt, the protocol code can update sopt_size to indicate the
amount of data to be copied back. I would add a line to the PRCO_SETOPT
handler in most protocols to set sopt_size to 0 so that nothing was
copied back.

The syscall definition for setsockopt() would need to be changed to
remove the 'const' from the optval argument.

>>I haven't checked in the userland implementation of sctp_opt_info(3)
>>yet. I took the one from FreeBSD but can modify it to work with a
>>different API.
>
>Ok.

This still requires an alternative API to use. I haven't found an
example of an OS that hides one in its version of sctp_opt_info().


Re: kernel condvars: how to use?

2017-12-11 Thread Taylor R Campbell
> Date: Mon, 11 Dec 2017 11:13:16 -0500 (EST)
> From: Mouse 
> 
> > If read does
> 
> > while (sc->sc_foo < sc->sc_bar)
> > cv_wait(>sc_cv, >sc_lock);
> 
> > then whoever changes sc_foo or sc_bar might test whether they changed
> > from sc->sc_foo < sc->sc_bar to !(sc->sc_foo < sc->sc_bar) before
> > they cv_broadcast.
> 
> ...then you have the same test in two places, leaving them vulnerable
> to one but not the other being changed.  (Of course, only some changes
> will break functionality, but that's a separate issue.)
> 
> Furthermore, they and LPT_RF_WAITING are testing/describing different
> things: "is anyone actually waiting?" versus "if anyone does happen to
> be waiting, could this make a difference?".

Correct.

Your responsibility as a user of condvar(9) is to make sure that if
you ever make a difference for a waiter, then you wake.  That's why I
advise either issuing wakeups for any changes to sc_foo/bar, or if
you're going to conditionalize them to avoid unnecessary wakeups, then
conditionalizing them only on changes to sc_foo/bar that transition
from a state that blocks waiters to a state that doesn't block
waiters.

On the other side, condvar(9) handles skipping some potentially
expensive logic if there isn't actually anyone waiting.  You shouldn't
worry about testing for the presence of waiters unless you actually
observe a performance impact.  That's why I advise _against_ using
LPT_RF_WAITING.

(One exception: It's OK to KASSERT(!cv_has_waiters(cv)) in cases when
you need to be sure there are no waiters.  Aside: Maybe cv_destroy
should do this itself anyway.)

> > In any case, neither __insn_barrier nor volatile is sufficient in the
> > multiprocessor model,
> 
> Not sufficient, true, but necessary (though, depending on the
> implementations of things like C and mutexes, possibly implicitly
> provided).

mutex_enter/exit guarantees the appropriate memory barriers to make
the mutual exclusion sections for a single lock object appear globally
ordered on all CPUs, in thread context and in interrupt context.


Re: kernel condvars: how to use?

2017-12-11 Thread Mouse
>>> If anything, you should just test rather whether you changed from a
>>> state that would block a read to a state in which a read can make
>>> progress.
>> That's kind-of what LPT_RF_WAITING records.
> What I meant is the condition that read actually loops on.  [...]

Yes, that's what I thought you meant.  But...

> If read does

>   while (sc->sc_foo < sc->sc_bar)
>   cv_wait(>sc_cv, >sc_lock);

> then whoever changes sc_foo or sc_bar might test whether they changed
> from sc->sc_foo < sc->sc_bar to !(sc->sc_foo < sc->sc_bar) before
> they cv_broadcast.

...then you have the same test in two places, leaving them vulnerable
to one but not the other being changed.  (Of course, only some changes
will break functionality, but that's a separate issue.)

Furthermore, they and LPT_RF_WAITING are testing/describing different
things: "is anyone actually waiting?" versus "if anyone does happen to
be waiting, could this make a difference?".  (Each one is a useful test
under some circumstances; it just seems to me that, while each can be a
useful test to apply there, confusing the tested-for conditions is
likely to lead to muddy thinking and thence muddy code.)

>> Looking at the implementation of __insn_barrier(), it looks like
>> overkill: is forces _everything_ out of registers, [etc].  Using
>> volatile does so only for the things for which it actually matters
> Yes, that's right.  On the other hand, volatile also applies to every
> access to fields in question, whereas __insn_barrier applies a
> barrier only in one specific place.

That's why I write

struct lpt_softc *sc;
volatile struct lpt_softc *vsc;

vsc = sc;

and use vsc-> when volatile is appropriate and sc-> when not.

It would be better if C provided a way to mark a sequence point as
volatile-synchronizing, relaxed (as compared to the current rules) so
that object accesses are synchronized not as of the next sequence point
but rather as of the next _marked_ sequence point.  Possibly, instead,
a way to name objects and force their values to be synchronized.  But
we don't have either, and, with no namespace for sequence points (many
of which have no explicit textual representation in the source), I'm
not sure how doable the former would be.

> In any case, neither __insn_barrier nor volatile is sufficient in the
> multiprocessor model,

Not sufficient, true, but necessary (though, depending on the
implementations of things like C and mutexes, possibly implicitly
provided).

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Merging ugen into the usb stack

2017-12-11 Thread Greg Troxel

Martin Husemann  writes:

> On Mon, Dec 11, 2017 at 08:24:00AM -0500, Greg Troxel wrote:
>> I wonder if we should be attaching drivers to endpoints, rather than
>> devices.
>
> This is the drivers decision (we have drivers that do it).
>
> However, ugen is not able to attach to a single interface itself (currently).

Well, I guess I think it's better to allow drivers to attach to single
interfaces in general, than to make ugen special and try to integrate
it.  But I haven't looked at things enough to justify that opinion.



signature.asc
Description: PGP signature


Re: Merging ugen into the usb stack

2017-12-11 Thread Martin Husemann
On Mon, Dec 11, 2017 at 08:24:00AM -0500, Greg Troxel wrote:
> I wonder if we should be attaching drivers to endpoints, rather than
> devices.

This is the drivers decision (we have drivers that do it).

However, ugen is not able to attach to a single interface itself (currently).

Martin


Re: Merging ugen into the usb stack

2017-12-11 Thread Greg Troxel

Martin Husemann  writes:

> However, it can not work with the way NetBSD uses ugen devices:
>
> uftdi0 at uhub3 port 2
> uftdi0: FTDI (0x9e88) SheevaPlug JTAGKey FT2232D B (0x9e8f), rev 2.00/5.00, 
> addr 3
> ucom0 at uftdi0 portno 1
> ucom1 at uftdi0 portno 2
>
> I can disable the ucom at uftdi0 portno 1, but there is no way to get a ugen
> device to attach there.
>
> The uftdi chip itself offers a separate interface for each of the ports,
> at that layer there should not be a problem.

I wonder if we should be attaching drivers to endpoints, rather than
devices.  It seems fairly common to have multiple endpoints doing
different things (among more-than-one-thing devices), rather than
multiple devices behind a hub.

Letting ugen attach to endpoints that drivers don't deal with seems like
an entirely reasonable solution, and it seems to have lower risk of
problems from things we can't predict.

I also wonder what other operating systems do here (beyond point
solutions that they've had to do).


signature.asc
Description: PGP signature


Re: Merging ugen into the usb stack

2017-12-11 Thread Taylor R Campbell
> Date: Mon, 11 Dec 2017 12:44:32 +0100
> From: Martin Husemann 
> 
> I could hack uftdi to allow attaching a ugen child, but that sounds
> like a very special hack. Jared suggested to instead make ugen not a
> separate device, but globaly fold it into the usb stack. That would
> also solve similar issues we have seen with usb scanner devices and
> ulpts.

Sounds reasonable as long as we clearly define the semantics of
opening the ugen device as it affects any other usage of the device,
and as long as Someone^TM takes the time to carefully implement that
semantics.


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Taylor R Campbell
> Date: Mon, 11 Dec 2017 11:55:27 +0100
> From: Edgar Fuß 
> 
> >  struct psref {
> > -   LIST_ENTRY(psref)   psref_entry;
> > +   SLIST_ENTRY(psref)  psref_entry;
> > +   /* To keep ABI with LIST_ENTRY(psref) version. */
> > +   void*psref_unused0;
> Isn't it somewhat fishy to manally pad this, knowing how big a [S]LIST_ENTRY 
> is? Wouldn't it be cleaner to use a union?

Also fine by me, but a little more work to implement.  Feel free to
add some ctasserts to raise your confidence in this.  I don't expect
LIST_ENTRY or SLIST_ENTRY to change, but I don't mind assertions to
add additional confidence.


Merging ugen into the usb stack

2017-12-11 Thread Martin Husemann
Hi folks,

I have a Guruplug JTAG device (and right now need to use it to unbrick
one of my Guruplugs after some ... stupid u-boot experiment).

The device is supported by openocd, which we have in pkgsrc. The driver
used is:

http://openocd.org/doc/html/Debug-Adapter-Configuration.html#Debug-Adapter-Configuration
(search for ftdi):

Interface Driver: ftdi

This driver is for adapters using the MPSSE
(Multi-Protocol Synchronous Serial Engine) mode
built into many FTDI chips, such as the FT2232,
FT4232 and FT232H.

The driver is using libusb-1.0 in asynchronous
mode to talk to the FTDI device, bypassing
intermediate libraries like libftdi or D2XX. 


For the recovery procedure I need access to the serial console at the
same time. The uftdi jtag luckily offers that as well.

However, it can not work with the way NetBSD uses ugen devices:

uftdi0 at uhub3 port 2
uftdi0: FTDI (0x9e88) SheevaPlug JTAGKey FT2232D B (0x9e8f), rev 2.00/5.00, 
addr 3
ucom0 at uftdi0 portno 1
ucom1 at uftdi0 portno 2

I can disable the ucom at uftdi0 portno 1, but there is no way to get a ugen
device to attach there.

The uftdi chip itself offers a separate interface for each of the ports,
at that layer there should not be a problem.

I could hack uftdi to allow attaching a ugen child, but that sounds like a
very special hack. Jared suggested to instead make ugen not a separate device,
but globaly fold it into the usb stack. That would also solve similar issues
we have seen with usb scanner devices and ulpts.

So, how should we proceed here?

Martin


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Edgar Fuß
>  struct psref {
> - LIST_ENTRY(psref)   psref_entry;
> + SLIST_ENTRY(psref)  psref_entry;
> + /* To keep ABI with LIST_ENTRY(psref) version. */
> + void*psref_unused0;
Isn't it somewhat fishy to manally pad this, knowing how big a [S]LIST_ENTRY 
is? Wouldn't it be cleaner to use a union?