Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Matt Thomas

> On Jun 21, 2015, at 12:02 PM, Reinoud Zandijk wrote:
> 
> Hi Matt,
> 
> On Sun, Jun 21, 2015 at 08:01:47AM -0700, Matt Thomas wrote:
>> IMO, softints are an aberration and should really be thread priorities and
>> dealt with by the thread scheduler.
> 
> Each level of softint as a kernel thread that gets woken up by condition
> variables?

I envision them being hard realtime kernel threads that would preempt 
lower priority threads.

> In a virtualisation context, could those threads also be used and be woken by
> signalling the relevant condition variable on reception of, say, a virtio push?

Could be.

But my goal is something intrinsically different.  In the interrupt, you signal
a condition variable or use some other method of making a thread runnable.
A run through the scheduler happens and a new thread is selected to run.
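
The flow described above (wake a thread from the interrupt, then let the scheduler choose) can be sketched in plain userland C. This is only an illustrative model, not NetBSD code: `struct sim_thread`, `sim_wakeup()` and `sim_pick_next()` are hypothetical names, and the "larger number is more urgent" priority convention is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified thread record; not NetBSD's struct lwp. */
struct sim_thread {
	const char *name;
	int priority;	/* assumption: larger value = more urgent */
	int runnable;	/* nonzero once something has woken the thread */
};

/* Stand-in for signalling a condition variable from an interrupt handler. */
void
sim_wakeup(struct sim_thread *t)
{
	assert(t != NULL);
	t->runnable = 1;
}

/* One pass through the scheduler: pick the most urgent runnable thread. */
struct sim_thread *
sim_pick_next(struct sim_thread *threads, size_t n)
{
	struct sim_thread *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!threads[i].runnable)
			continue;
		if (best == NULL || threads[i].priority > best->priority)
			best = &threads[i];
	}
	return best;	/* NULL means nothing is runnable */
}
```

In this model a high-priority thread that was asleep wins the next scheduler pass as soon as the simulated interrupt wakes it, which is the preemption behaviour being proposed.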

In addition to exceptions and interrupts using a common trapframe, 
cpu_switchto should also use a trapframe to store the lwp’s
context.  When restoring a trapframe, switchto will need to know which type
of trapframe it is.

When the interrupt is about to restore the trapframe, if the scheduler decided
to switch to another lwp, it will do so using that lwp’s trapframe.  
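
A minimal sketch of that idea, assuming a tagged trapframe: the saved context records how it was created, and the interrupt exit path restores either the interrupted lwp's frame or the frame of the lwp the scheduler chose. All names here (`enum tf_kind`, `struct sim_trapframe`, `struct sim_lwp`, `sim_switch_save()`, `sim_intr_exit()`) are hypothetical; real trapframes are machine-dependent and hold the full register set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag recording how a trapframe was created. */
enum tf_kind { TF_EXCEPTION, TF_INTERRUPT, TF_SWITCH };

/* Hypothetical, heavily reduced trapframe. */
struct sim_trapframe {
	enum tf_kind kind;
	unsigned long pc;
	unsigned long sp;
};

struct sim_lwp {
	const char *name;
	struct sim_trapframe tf;	/* saved context while not running */
};

/* Voluntary switch: save the outgoing context as a TF_SWITCH frame. */
void
sim_switch_save(struct sim_lwp *l, unsigned long pc, unsigned long sp)
{
	assert(l != NULL);
	l->tf.kind = TF_SWITCH;
	l->tf.pc = pc;
	l->tf.sp = sp;
}

/*
 * Interrupt exit path: return the frame to restore.  If the scheduler
 * chose a different lwp, its frame is used instead of the interrupted one.
 */
const struct sim_trapframe *
sim_intr_exit(const struct sim_lwp *interrupted, const struct sim_lwp *chosen)
{
	assert(interrupted != NULL);
	if (chosen != NULL && chosen != interrupted)
		return &chosen->tf;
	return &interrupted->tf;
}
```

Because every saved context has the same layout, the restore code only needs to inspect `kind` to know how much state is valid, rather than having a separate switch path.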

The goal is to have near-instant context switching without the hackery of the
current preemption code.



Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Kamil Rytarowski
On 21.06.2015 17:01, Matt Thomas wrote:
> 
>> On Jun 21, 2015, at 7:30 AM, Kamil Rytarowski wrote:
>>
>> I have a few questions regarding the interrupt flow in the kernel.
>> Please tell whether my understanding is correct.
> 
> You are confusing interrupts with exceptions.  Interrupts are 
> asynchronous events.  Exceptions are (usually) synchronous and
> are the result of an instruction.
> 

Thank you for your clarification!


Re: bottom half

2015-06-21 Thread Rhialto
On Fri 19 Jun 2015 at 11:45:40 +0200, Edgar Fuß wrote:
> To elaborate, as I seem to have been too cryptic in my references: I learned 
> the terms top and bottom half from "The Design and Implementation of the 
> 4.4BSD Operating System" by McKusick, Bostic, Karels and Quarterman. The 
> text on page 51 explains "The bottom half of the kernel comprises routines 
> that are invoked to handle hardware interrupts." and the figure 3.1 above 
> explains "Never scheduled, cannot block. Runs on kernel stack in kernel 
> address space."

For reference:

- the same book with 4.3 in the title has the same Figure 3.1, but on
  page 44.
- the same book with FreeBSD in the title has the same Figure 3.1, also
  on page 51.

-Olaf.
-- 
___ Olaf 'Rhialto' Seibert  -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl-- 'this bath is too hot.'




Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Reinoud Zandijk
Hi Matt,

On Sun, Jun 21, 2015 at 08:01:47AM -0700, Matt Thomas wrote:
> IMO, softints are an aberration and should really be thread priorities and
> dealt with by the thread scheduler.

Each level of softint as a kernel thread that gets woken up by condition
variables?

In a virtualisation context, could those threads also be used and be woken by
signalling the relevant condition variable on reception of, say, a virtio push?

With regards,
Reinoud





Re: netbsd32: race condition in swapctl()

2015-06-21 Thread Maxime Villard

On 21/06/2015 11:47, Martin Husemann wrote:
> Should we make the "native" code use swapsys_lock()/swapsys_unlock() as
> well, for consistency?
>
> Martin

In fact, many comments around there refer to 'swap_syscall_lock',
and I didn't want to pollute the patch and update them all.



Re: Interrupt flow in the NetBSD kernel

2015-06-21 Thread Matt Thomas

> On Jun 21, 2015, at 7:30 AM, Kamil Rytarowski wrote:
> 
> I have a few questions regarding the interrupt flow in the kernel.
> Please tell whether my understanding is correct.

You are confusing interrupts with exceptions.  Interrupts are 
asynchronous events.  Exceptions are (usually) synchronous and
are the result of an instruction.

> There are software and hardware interrupts.
> Part of the hardware interrupts are maskable with the spl(9) levels.
> Some are unmaskable and must be handled unconditionally, like the
> exception data abort from ARM.

data abort is a synchronous exception, not an interrupt.

> Hardware interrupts are handled by the hardware interrupt handler.
> System calls (syscalls) and softint(9) are software interrupts handled
> by the same software interrupt handler.

syscalls are synchronous exceptions, softint can be either a real
interrupt (like mips or VAX) or emulated in the SPL code (ARM).
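
To illustrate the "emulated in the SPL code" case, here is a minimal userland sketch, assuming a single soft interrupt level and a pending flag that is checked whenever the priority level is lowered. The names (`sim_splraise()`, `sim_splx()`, `sim_softint_schedule()`, the `SIM_IPL_*` values) are hypothetical stand-ins and do not reflect any port's actual implementation.

```c
#include <assert.h>

/* Hypothetical levels; real IPL_* values are machine-dependent. */
enum { SIM_IPL_NONE = 0, SIM_IPL_SOFT = 1, SIM_IPL_HIGH = 2 };

static int sim_cur_ipl = SIM_IPL_NONE;
static int sim_soft_pending;	/* set when a softint has been requested */
static int sim_soft_runs;	/* how many times the handler has run */

/* Run the deferred handler once the current level no longer blocks it. */
static void
sim_run_pending(void)
{
	while (sim_soft_pending && sim_cur_ipl < SIM_IPL_SOFT) {
		int saved = sim_cur_ipl;

		sim_soft_pending = 0;
		sim_cur_ipl = SIM_IPL_SOFT;	/* handler runs at its own level */
		sim_soft_runs++;		/* stand-in for the handler body */
		sim_cur_ipl = saved;
	}
}

/* Raise the level (never lower it) and return the previous level. */
int
sim_splraise(int ipl)
{
	int old = sim_cur_ipl;

	if (ipl > sim_cur_ipl)
		sim_cur_ipl = ipl;
	return old;
}

/* Restore a previous level; this is where deferred softints get to run. */
void
sim_splx(int ipl)
{
	assert(ipl >= SIM_IPL_NONE && ipl <= SIM_IPL_HIGH);
	sim_cur_ipl = ipl;
	sim_run_pending();
}

/* Request a softint; it runs now if unblocked, otherwise at the next splx. */
void
sim_softint_schedule(void)
{
	sim_soft_pending = 1;
	sim_run_pending();
}

int
sim_softint_runs(void)
{
	return sim_soft_runs;
}
```

In this model nothing is masked in hardware: a request made while the level is raised is simply remembered and dispatched when the level drops.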

> Syscalls come from the userland with the user address space context,

Currently, only syscalls from user mode are handled.

> softint(9) come from the kernel with kernel address space context.

But softint(9) only uses interrupts as a mechanism; it doesn’t require them.
In fact, I’d like to see that die.

> The spl(9) calls mask maskable interrupts, both software and hardware
> ones, with the exception of the unmaskable ones, like data abort on ARM.

Again, data abort is an exception, not an interrupt.

> There are three contexts in the kernel:
> - hardware interrupt (within hardware interrupt handler),
> - software interrupt (within software interrupt handler) for syscalls
> and softint(9),
> - thread context for LWP (lightweight processes).
> 
> Bottom half (BSD naming) is responsible for the hardware interrupts, top
> half (BSD naming) is responsible for the software and thread contexts.

The bottom half talks to the hardware and processor.  The top half deals with
requests from userland.  The pmap is bottom half even though it’s only invoked by
the top half (UVM).

> Process is heavy with user address space oneness running in the
> user-space, thread is lightweight with shared kernel address space for
> all threads. Kernel can access the whole physical memory, but doesn't
> know the user address mapping. There is one process running in the
> kernel address space -- proc0 = swapper.

A process is a collection of threads sharing the same address space.
That address space can be a user address space or a kernel address space.

> How does the spl(9) interrupt masking physically work for software
> interrupts? On ARM svc (or monitors) aren't maskable, like IRQ
> (exception), a type of (ARM naming) exception and (kernel naming)
> hardware interrupt.

That depends on the implementation and the underlying hardware.

> I'm trying to get the big picture first, before getting to details.
> 
> When I look into details, I don't get the things, like the line 268
> here:
> http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/arm/arm32/exception.S?annotate=1.17.2.2
> Is it a leftover from line 252 and should be erased?

It’s gone now.

> Back to the big picture. How does IPL_SOFT technically work? Does it mask
> syscalls and softint(9) the same way? If it's not maskable (to my
> understanding) are we scheduling it in some sort of queue or stack
> waiting for the spl(9) level change?

IMO, softints are an aberration and should really be thread priorities and
dealt with by the thread scheduler.


re: 32bit compat NFS server and PMC syscalls

2015-06-21 Thread matthew green

i've got an in-progress patch to split nfssvc to avoid as much
code-copying as possible.  lots of tangles, so lots of things
to tease out, but hopefully this will be done soon..


.mrg.


Interrupt flow in the NetBSD kernel

2015-06-21 Thread Kamil Rytarowski
I have a few questions regarding the interrupt flow in the kernel.
Please tell whether my understanding is correct.

There are software and hardware interrupts.
Part of the hardware interrupts are maskable with the spl(9) levels.
Some are unmaskable and must be handled unconditionally, like the
exception data abort from ARM.
Hardware interrupts are handled by the hardware interrupt handler.
System calls (syscalls) and softint(9) are software interrupts handled
by the same software interrupt handler.
Syscalls come from the userland with the user address space context,
softint(9) come from the kernel with kernel address space context.

The spl(9) calls mask maskable interrupts, both software and hardware
ones, with the exception of the unmaskable ones, like data abort on ARM.

There are three contexts in the kernel:
- hardware interrupt (within hardware interrupt handler),
- software interrupt (within software interrupt handler) for syscalls
and softint(9),
- thread context for LWP (lightweight processes).

Bottom half (BSD naming) is responsible for the hardware interrupts, top
half (BSD naming) is responsible for the software and thread contexts.

Process is heavy with user address space oneness running in the
user-space, thread is lightweight with shared kernel address space for
all threads. Kernel can access the whole physical memory, but doesn't
know the user address mapping. There is one process running in the
kernel address space -- proc0 = swapper.

How does the spl(9) interrupt masking physically work for software
interrupts? On ARM svc (or monitors) aren't maskable, like IRQ
(exception), a type of (ARM naming) exception and (kernel naming)
hardware interrupt.

I'm trying to get the big picture first, before getting to details.

When I look into details, I don't get the things, like the line 268
here:
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/arm/arm32/exception.S?annotate=1.17.2.2
Is it a leftover from line 252 and should be erased?

Back to the big picture. How does IPL_SOFT technically work? Does it mask
syscalls and softint(9) the same way? If it's not maskable (to my
understanding) are we scheduling it in some sort of queue or stack
waiting for the spl(9) level change?


Re: 32bit compat NFS server and PMC syscalls

2015-06-21 Thread Martin Husemann
On Sun, Jun 21, 2015 at 09:46:26PM +1000, matthew green wrote:
> i recall this being more ugly than it should be, but i don't think
> we should have to make it need 64 bit.  it really isn't doing much
> and should be a case we can handle.

Yeah, it doesn't matter a lot what the userland process is overall,
but it is not trivial to handle w/o copypasting the whole syscall
code.

Martin


re: 32bit compat NFS server and PMC syscalls

2015-06-21 Thread matthew green

Martin Husemann writes:
> I've been looking at adding the missing things to compat/netbsd32
> recently, mainly to make the default N32 userland on mips64 more useful
> and able to run our full test suite.
> 
> Most missing pieces are just oversights/laziness and easy to fill in.
> 
> However, I wonder about
> 
>  a) nfssvc (i.e. the nfsd kernel part) - does it make sense to use that?
> Alternatively we could build the nfsd binary as N64 on mips64 (like
> we need to do for some of the essential kvm grovelers)

i recall this being more ugly than it should be, but i don't think
we should have to make it need 64 bit.  it really isn't doing much
and should be a case we can handle.

>  b) pmc_get_info/pmc_control
> These are undocumented syscalls apparently used by the pmc(1) tool.
> Is this interface (still) useful and MI enough to be worth supporting?

these are only supported on x86 currently so i think you should
just ignore this.

thanks!


.mrg.


Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr

2015-06-21 Thread Manuel Bouyer
On Sun, Jun 21, 2015 at 12:41:07PM +0200, Emmanuel Dreyfus wrote:
> Manuel Bouyer wrote:
> 
> > "tstile" is a generic wait channel used by cv_wait().
> > Any code using cv_wait() could end up stuck here, so your problem
> > may be completely unrelated to vnodes ... or even if it's related
> > to vnodes, it may be a different issue.
> 
> The case I describe with a NFS server that has gone can also happen with
> failed hardware: ioflush awaits completion forever while holding a vnode
> lock, and any other process that wants a lock on the vnode gets stuck in
> tstile.

Yes, what I mean is that a process stuck in tstile is not necessarily
related to ioflush, or even to vnodes.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr

2015-06-21 Thread Emmanuel Dreyfus
Manuel Bouyer wrote:

> "tstile" is a generic wait channel used by cv_wait().
> Any code using cv_wait() could end up stuck here, so your problem
> may be completely unrelated to vnodes ... or even if it's related
> to vnodes, it may be a different issue.

The case I describe with a NFS server that has gone can also happen with
failed hardware: ioflush awaits completion forever while holding a vnode
lock, and any other process that wants a lock on the vnode gets stuck in
tstile.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: netbsd32: race condition in swapctl()

2015-06-21 Thread Martin Husemann
Should we make the "native" code use swapsys_lock()/swapsys_unlock() as
well, for consistency?

Martin


Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr

2015-06-21 Thread Manuel Bouyer
On Sat, Jun 20, 2015 at 07:56:29PM -0500, Don Lee wrote:
> FWIW, I have had a problem with my server getting stuck in "tstile". I could 
> not reproduce the problem easily, but I saw it in production often enough 
> that it was a headache.  The Intel port (as opposed to PPC) seems not to have 
> the problem.
> 
> If there is no timeout on this loop, and it theoretically only has a problem 
> on HW errors, I have doubts. The machine with the hangs does not have any 
> other symptoms of HW errors. HOWEVER, I have a persistent suspicion that the 
> PPC port drops interrupts on occasion. Just sayin.
> 
> If this hang happens, I think a panic is far better than a hang. What I would 
> see is the machine lock up hard, with zillions of processes "stuck" in 
> tstile, and no new procs could start. If I caught this early, I could get a 
> couple of ps outputs done. Otherwise, I could get into the kernel debugger - 
> sometimes.

"tstile" is a generic wait channel used by cv_wait().
Any code using cv_wait() could end up stuck here, so your problem
may be completely unrelated to vnodes ... or even if it's related
to vnodes, it may be a different issue.

-- 
Manuel Bouyer 
 NetBSD: 26 years of experience will always make the difference
--


netbsd32: race condition in swapctl()

2015-06-21 Thread Maxime Villard

Hi,
the 32bit version of swapctl() calls uvm_swap_stats() without locking
'swap_syscall_lock'. Which means that if you perform a swapctl() call
and at the same time update/add/delete a swap device, you may end up
with a memory corruption.

Here is a patch. Tested on amd64 (with a 32bit binary).

Ok?

Index: compat/netbsd32/netbsd32_netbsd.c
===
RCS file: /cvsroot/src/sys/compat/netbsd32/netbsd32_netbsd.c,v
retrieving revision 1.195
diff -u -r1.195 netbsd32_netbsd.c
--- compat/netbsd32/netbsd32_netbsd.c   16 Jun 2015 10:42:38 -  1.195
+++ compat/netbsd32/netbsd32_netbsd.c   21 Jun 2015 09:13:03 -
@@ -1747,11 +1747,16 @@

if (count < 0)
return EINVAL;
-   if (count == 0 || uvmexp.nswapdev == 0)
-   return 0;
-   /* Make sure userland cannot exhaust kernel memory */
+
+   swapsys_lock(RW_WRITER);
+
if ((size_t)count > (size_t)uvmexp.nswapdev)
count = uvmexp.nswapdev;
+   if (count == 0) {
+   /* No swap device */
+   swapsys_unlock();
+   return 0;
+   }

ksep_len = sizeof(*ksep) * count;
ksep = kmem_alloc(ksep_len, KM_SLEEP);
@@ -1760,6 +1765,8 @@
uvm_swap_stats(SWAP_STATS, ksep, count, retval);
count = *retval;

+   swapsys_unlock();
+
for (i = 0; i < count; i++) {
se32.se_dev = ksep[i].se_dev;
se32.se_flags = ksep[i].se_flags;
Index: uvm/uvm_swap.c
===
RCS file: /cvsroot/src/sys/uvm/uvm_swap.c,v
retrieving revision 1.172
diff -u -r1.172 uvm_swap.c
--- uvm/uvm_swap.c  25 Jul 2014 08:10:40 -  1.172
+++ uvm/uvm_swap.c  21 Jun 2015 09:13:03 -
@@ -430,6 +430,15 @@
return NULL;
 }

+void swapsys_lock(krw_t op)
+{
+   rw_enter(&swap_syscall_lock, op);
+}
+
+void swapsys_unlock(void)
+{
+   rw_exit(&swap_syscall_lock);
+}

 /*
  * sys_swapctl: main entry point for swapctl(2) system call
@@ -741,6 +750,8 @@
struct swapdev *sdp;
int count = 0;

+   KASSERT(rw_lock_held(&swap_syscall_lock));
+
LIST_FOREACH(spp, &swap_priority, spi_swappri) {
TAILQ_FOREACH(sdp, &spp->spi_swapdev, swd_next) {
int inuse;
Index: uvm/uvm_swap.h
===
RCS file: /cvsroot/src/sys/uvm/uvm_swap.h,v
retrieving revision 1.20
diff -u -r1.20 uvm_swap.h
--- uvm/uvm_swap.h  3 Feb 2014 13:20:21 -   1.20
+++ uvm/uvm_swap.h  21 Jun 2015 09:13:03 -
@@ -48,7 +48,10 @@
 void   uvm_swap_free(int, int);
 void   uvm_swap_markbad(int, int);
 bool   uvm_swapisfull(void);
+void   swapsys_lock(krw_t);
+void   swapsys_unlock(void);
 void   uvm_swap_stats(int, struct swapent *, int, register_t *);
+
 #else /* defined(VMSWAP) */
 #define uvm_swapisfull() true
 #define uvm_swap_stats(c, sep, count, retval) { *retval = 0; }
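
The locking pattern in the netbsd32 hunk (take the lock, clamp the count, release the lock on every exit path including the early return) can be modelled in a few lines of userland C. This is a sketch under stated assumptions, not the kernel code: the lock is simulated by a depth counter, and `sim_lock()`, `sim_unlock()`, `sim_swap_stats()`, `sim_swapctl_stats()` and `sim_nswapdev` are hypothetical stand-ins for `swapsys_lock()`, `swapsys_unlock()`, `uvm_swap_stats()`, the compat syscall body and `uvmexp.nswapdev`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel state; not the real UVM API. */
static int sim_nswapdev = 2;	/* plays the role of uvmexp.nswapdev */
static int sim_lock_depth;	/* 1 while the simulated lock is held */

static void
sim_lock(void)
{
	assert(sim_lock_depth == 0);
	sim_lock_depth = 1;
}

static void
sim_unlock(void)
{
	assert(sim_lock_depth == 1);
	sim_lock_depth = 0;
}

/* Plays the role of uvm_swap_stats(): must only run with the lock held. */
static int
sim_swap_stats(int *out, int count)
{
	assert(sim_lock_depth == 1);
	for (int i = 0; i < count; i++)
		out[i] = i;
	return count;
}

/* Returns the number of entries produced, or -1 on bad input or no memory. */
int
sim_swapctl_stats(int count)
{
	if (count < 0)
		return -1;

	sim_lock();
	/* Clamp under the lock so the device count cannot change underneath. */
	if (count > sim_nswapdev)
		count = sim_nswapdev;
	if (count == 0) {
		sim_unlock();	/* the early return must not leak the lock */
		return 0;
	}

	int *buf = malloc(sizeof(*buf) * (size_t)count);
	if (buf == NULL) {
		sim_unlock();
		return -1;
	}
	int n = sim_swap_stats(buf, count);
	sim_unlock();

	free(buf);
	return n;
}

/* Lets a caller check that no path left the simulated lock held. */
int
sim_lock_held(void)
{
	return sim_lock_depth;
}
```

The assertion inside `sim_swap_stats()` mirrors the role of the `KASSERT(rw_lock_held(...))` added by the patch: calling the statistics routine without the lock fails loudly instead of racing silently.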


32bit compat NFS server and PMC syscalls

2015-06-21 Thread Martin Husemann
I've been looking at adding the missing things to compat/netbsd32
recently, mainly to make the default N32 userland on mips64 more useful
and able to run our full test suite.

Most missing pieces are just oversights/laziness and easy to fill in.

However, I wonder about

 a) nfssvc (i.e. the nfsd kernel part) - does it make sense to use that?
Alternatively we could build the nfsd binary as N64 on mips64 (like
we need to do for some of the essential kvm grovelers)

 b) pmc_get_info/pmc_control
These are undocumented syscalls apparently used by the pmc(1) tool.
Is this interface (still) useful and MI enough to be worth supporting?


Martin


Re: VOP_PUTPAGE ignores mount_nfs -o soft,intr

2015-06-21 Thread Emmanuel Dreyfus
Christos Zoulas wrote:

> Well, I think that this is a powerpc specific problem. Unfortunately it
> has been elusive... The tstile in the cv_wait() of the ioflush thread
> is common to all platforms (when the filesystem is unable to handle
> flushing a vnode).

Here is a script that reproduces the problem:

#!/bin/sh -ex

mkdir -p /nfstest/tmp
chmod 1777 /nfstest/tmp
grep '^/nfstest' /etc/exports ||
echo "/nfstest localhost" >> /etc/exports

/etc/rc.d/mountd forcestart || true
/etc/rc.d/nfsd forcestart || true

mount -t nfs -o rw,soft,intr,tcp,-R=2 localhost:/nfstest /mnt
dd if=/dev/zero of=/mnt/tmp/test bs=1024k &
sleep 1
/etc/rc.d/nfsd onestop || true

umount -f -R /mnt &
ps -axlp $!




-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org