Re: Solving the last piece of the uvm_pageqlock problem

2019-12-24 Thread Joerg Sonnenberger
On Tue, Dec 24, 2019 at 10:08:01PM +, Andrew Doran wrote:
> This is a diff against a tree containing the allocator patch I posted
> previously:
> 
>   http://www.netbsd.org/~ad/2019/pdpol.diff

I wanted to give this a spin before travelling, but it doesn't survive
very long here. I get NULL pointer derefs in
uvmpdpol_pageactivate_locked, coming from uvmpdpool_pageintent_realize.
That's within seconds of the scan phase of a bulk build.

Joerg


Solving the last piece of the uvm_pageqlock problem

2019-12-24 Thread Andrew Doran
This is a diff against a tree containing the allocator patch I posted
previously:

http://www.netbsd.org/~ad/2019/pdpol.diff

The idea here is to buffer updates to the page's status (active, inactive,
dequeued) and then sync those to the pdpolicy / pagedaemon queues regularly,
a bit like the way the file system syncer works.  Notes:

- Since uvm_pageqlock was replaced with pg->interlock & a private lock for
  the pdpolicy code, pages can occasionally appear on a pdpolicy queue when
  they shouldn't be considered for pageout & reclaim (if the pagedaemon and
  object owner race), but it's not a problem because the pagedaemon can take
  pg->interlock and determine that a page is wired or in a state of flux or
  whatever, and so should be ignored because it'll be gone from the queues
  soon.

- This patch takes it a little further.  The pdpolicy code gets a dedicated
  TAILQ_ENTRY in struct vm_page so it doesn't need to share with the page
  allocator.  A page can be PG_FREE and still on a pdpolicy queue (but not
  for long).  We set an intended state for the page on pg->pqflags using
  atomics (active, inactive, dequeued) and then those pages are queued in a
  per-CPU buffer for their status updates to be purged and made real in the
  pdpol code's global state at some point in the near future.

- The pagedaemon can also see those updates in real time by inspecting
  pg->pqflags and make real the page's status.  So basically what I'm doing
  is batching the updates, trying to not let the global state fall too far
  behind, and always give the pagedaemon enough information to know the true
  picture for individual pages when it does its labourious scan of the
  queues, even if viewed globally the queues are a little bit behind.

This seems to work really well, I think because a page can have multiple
state transitions while it's in a queue waiting for its intended status
change to be purged and made global.

Shortly before composing this e-mail it occurred to me that FreeBSD may do
something similar but to be honest I didn't dig into their code.

I need to tweak this to allocate a smaller buffer for uniprocessor systems
and maybe consider using prefetching instructions when purging, and want to
re-run the tests because I changed a couple of things but I'm basically
happy with it.

Results on my kernel build test:

72.66 real  1653.86 user   593.19 sys   new allocator
71.26 real  1671.13 user   502.94 sys   new allocator + pdpol.diff

Lock contention before and after:

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
 28.86 44056935  77553.77 pdpol_state
 15.62 22177251  41978.93 pdpol_stateuvmpdpol_pageactivate+36
 13.12 21656129  35251.99 pdpol_stateuvmpdpol_pagedequeue+18
  0.12  223482322.77 pdpol_stateuvmpdpol_pagedeactivate+18
  0.00  73  0.07 pdpol_stateuvmpdpol_pageenqueue+18

Total%  Count   Time/ms  Lock   Caller
-- --- - -- --
  0.23   11301362.35 pdpol_stateuvmpdpol_pageintent_set+b9

Andrew


Re: Synchronization protocol around TODR / time-related variables

2019-12-24 Thread Andrew Doran
On Mon, Dec 23, 2019 at 08:06:04PM -0800, Jason Thorpe wrote:

> Now let's talk about settime1() in kern_time.c.  It does a few things that
> seem dicey vis a vis modifying global state without proper serialization,
> and just about the entire function is wrapped in an splclock()/splx()
> pair.
>
> First of all, it's not clear to me at all that splclock()/splx() is really
> needed here...  *maybe* across the call to tc_setclock()?  But the mutex
> it takes internally is an IPL_HIGH spin mutex, so splclock() seems
> superfluous, just a legacy hold-over. 

The splclock() doesn't buy us anything as far as I can tell.

IPL_HIGH for the timecounter lock irks me given that the clock interrupt is
IPL_SCHED but I have nothing to back that up.  In any case I suppose it's a
bit like kernel printf(), people are going to call from wherever they like
even if it's somewhere we'd prefer not and that's understandable.

> The only thing that requires some synchronization is modifying "boottime",
> but maybe it's not super critical (all of the consumers of this variable
> would need significant cleanup.

Looks like it wouldn't be too hard to wrap this in a function to return it
and at least keep the problem in one place.  Ah ha!  When searching for
boottime on OpenGrok I found a FreeBSDism in the NFS code that we pulled in:
getboottime().  Digging into it a bit further, it seems they have managed
this using the same concurrency strategy used for timecounters:

http://fxr.watson.org/fxr/source/kern/kern_tc.c#L506

Andrew


Re: Adaptec 6445 not supported?

2019-12-24 Thread Dima Veselov

21.12.2019 15:49, Andrew Doran пишет:

Using NetBSD 8.1, the NetBSD kernel boots up with this:

Dec 21 10:53:56 unix /netbsd: vendor 9005 product 028b (RAID mass
storage, revision 0x01) at pci1 dev 0 function 0 not configured


Some Adaptec adapters are supported but not listed in pcidevs.
I believe 6-series is not supported.


https://storage.microsemi.com/en-us/support/raid/sas_raid/sas-6445/

Under "Unix", there's no "NetBSD', only "FreeBSD".

Is that driver supplied to FreeBSD as a binary or can it be ported?


It looks to me like the "aacraid" driver in FreeBSD handles this, and that
looks to be an evolution of the "aac" driver.  The driver would not be too
hard to port I reckon.


I am the one waiting for that for long time, because Adaptec is a great
piece of hardware. I remember most problem of porting was the second
register used in modern Adaptec controllers, which is completely unhandled
in NetBSD code.


We used to get engineering samples from Mark Salyzyn at DPT/Adaptec every
time they released something significant but he has long left the company.


I will set up development box or send any hardware to a person feeling
ready to accomplish this task.

--
Dima Veselov
Physics R Establishment of Saint-Petersburg University


Re: [PATCH v5 3/4] Combine x86 register tests into unified test function

2019-12-24 Thread Christos Zoulas
In article <20191224114717.302420-3-mgo...@gentoo.org>,
MichaŠ Górny   wrote:
>Reduce the code duplication and improve maintainability of x86 register
>tests by combining all of them to a single base function.
>---
> tests/lib/libc/sys/t_ptrace_amd64_wait.h |  406 +---
> tests/lib/libc/sys/t_ptrace_i386_wait.h  |  335 +--
> tests/lib/libc/sys/t_ptrace_x86_wait.h   | 2417 ++
> 3 files changed, 1103 insertions(+), 2055 deletions(-)

That's a good start, but I'd also start using loops for some of the
repetitive code for brevity.

christos



[PATCH v5 3/4] Combine x86 register tests into unified test function

2019-12-24 Thread Michał Górny
Reduce the code duplication and improve maintainability of x86 register
tests by combining all of them to a single base function.
---
 tests/lib/libc/sys/t_ptrace_amd64_wait.h |  406 +---
 tests/lib/libc/sys/t_ptrace_i386_wait.h  |  335 +--
 tests/lib/libc/sys/t_ptrace_x86_wait.h   | 2417 ++
 3 files changed, 1103 insertions(+), 2055 deletions(-)

diff --git a/tests/lib/libc/sys/t_ptrace_amd64_wait.h 
b/tests/lib/libc/sys/t_ptrace_amd64_wait.h
index 1ea17ea1ec1a..0f410f3600d0 100644
--- a/tests/lib/libc/sys/t_ptrace_amd64_wait.h
+++ b/tests/lib/libc/sys/t_ptrace_amd64_wait.h
@@ -111,415 +111,11 @@ ATF_TC_BODY(x86_64_regs1, tc)
TWAIT_REQUIRE_FAILURE(ECHILD, wpid = TWAIT_GENERIC(child, , 0));
 }
 
-ATF_TC(x86_64_regs_gp_read);
-ATF_TC_HEAD(x86_64_regs_gp_read, tc)
-{
-   atf_tc_set_md_var(tc, "descr",
-   "Set general-purpose reg values from debugged program and read "
-   "them via PT_GETREGS, comparing values against expected.");
-}
-
-ATF_TC_BODY(x86_64_regs_gp_read, tc)
-{
-   const int exitval = 5;
-   pid_t child, wpid;
-#if defined(TWAIT_HAVE_STATUS)
-   const int sigval = SIGTRAP;
-   int status;
-#endif
-   struct reg gpr;
-
-   const uint64_t rax = 0x0001020304050607;
-   const uint64_t rbx = 0x1011121314151617;
-   const uint64_t rcx = 0x2021222324252627;
-   const uint64_t rdx = 0x3031323334353637;
-   const uint64_t rsi = 0x4041424344454647;
-   const uint64_t rdi = 0x5051525354555657;
-   const uint64_t rsp = 0x6061626364656667;
-   const uint64_t rbp = 0x7071727374757677;
-
-   DPRINTF("Before forking process PID=%d\n", getpid());
-   SYSCALL_REQUIRE((child = fork()) != -1);
-   if (child == 0) {
-   DPRINTF("Before calling PT_TRACE_ME from child %d\n", getpid());
-   FORKEE_ASSERT(ptrace(PT_TRACE_ME, 0, NULL, 0) != -1);
-
-   DPRINTF("Before running assembly from child\n");
-
-   __asm__ __volatile__(
-   /* rbp & rbp are a bit tricky, we must not clobber them 
*/
-   "movq%%rsp, %%r8\n\t"
-   "movq%%rbp, %%r9\n\t"
-   "movq%6, %%rsp\n\t"
-   "movq%7, %%rbp\n\t"
-   "\n\t"
-   "int3\n\t"
-   "\n\t"
-   "movq%%r8, %%rsp\n\t"
-   "movq%%r9, %%rbp\n\t"
-   :
-   : "a"(rax), "b"(rbx), "c"(rcx), "d"(rdx), "S"(rsi), 
"D"(rdi),
- "i"(rsp), "i"(rbp)
-   : "%r8", "%r9"
-   );
-
-   DPRINTF("Before exiting of the child process\n");
-   _exit(exitval);
-   }
-   DPRINTF("Parent process PID=%d, child's PID=%d\n", getpid(), child);
-
-   DPRINTF("Before calling %s() for the child\n", TWAIT_FNAME);
-   TWAIT_REQUIRE_SUCCESS(wpid = TWAIT_GENERIC(child, , 0), child);
-
-   validate_status_stopped(status, sigval);
-
-   DPRINTF("Call GETREGS for the child process\n");
-   SYSCALL_REQUIRE(ptrace(PT_GETREGS, child, , 0) != -1);
-
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RAX], rax);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RBX], rbx);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RCX], rcx);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RDX], rdx);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RSI], rsi);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RDI], rdi);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RSP], rsp);
-   ATF_CHECK_EQ((uint64_t)gpr.regs[_REG_RBP], rbp);
-
-   DPRINTF("Before resuming the child process where it left off and "
-   "without signal to be sent\n");
-   SYSCALL_REQUIRE(ptrace(PT_CONTINUE, child, (void *)1, 0) != -1);
-
-   DPRINTF("Before calling %s() for the child\n", TWAIT_FNAME);
-   TWAIT_REQUIRE_SUCCESS(wpid = TWAIT_GENERIC(child, , 0), child);
-
-   validate_status_exited(status, exitval);
-
-   DPRINTF("Before calling %s() for the child\n", TWAIT_FNAME);
-   TWAIT_REQUIRE_FAILURE(ECHILD, wpid = TWAIT_GENERIC(child, , 0));
-}
-
-ATF_TC(x86_64_regs_gp_write);
-ATF_TC_HEAD(x86_64_regs_gp_write, tc)
-{
-   atf_tc_set_md_var(tc, "descr",
-   "Set general-purpose reg values into a debugged program via "
-   "PT_SETREGS and compare the result against expected.");
-}
-
-ATF_TC_BODY(x86_64_regs_gp_write, tc)
-{
-   const int exitval = 5;
-   pid_t child, wpid;
-#if defined(TWAIT_HAVE_STATUS)
-   const int sigval = SIGTRAP;
-   int status;
-#endif
-   struct reg gpr;
-
-   const uint64_t rax = 0x0001020304050607;
-   const uint64_t rbx = 0x1011121314151617;
-   const uint64_t rcx = 0x2021222324252627;
-   const uint64_t rdx = 0x3031323334353637;
-   const uint64_t rsi = 0x4041424344454647;
-   const uint64_t rdi = 0x5051525354555657;
-   

[PATCH v5 1/4] Include XSTATE note in x86 core dumps

2019-12-24 Thread Michał Górny
Introduce a simple COREDUMP_MACHDEP_LWP_NOTES logic to provide machdep
API for injecting per-LWP notes into coredumps, and use it to append
PT_GETXSTATE note.

Since the XSTATE block uses the same format on i386 and amd64, the code
does not have to conditionalize between 32-bit and 64-bit ELF format
on that.  However, it does need to distinguish between 32-bit and 64-bit
PT_* values.  In order to do that, it reuses PT32_* constant already
present for ptrace(), and adds a matching PT64_GETXSTATE to satisfy
the cpp logic.
---
 sys/arch/amd64/include/ptrace.h | 16 
 sys/arch/i386/include/ptrace.h  | 12 
 sys/kern/core_elf32.c   |  6 +-
 3 files changed, 33 insertions(+), 1 deletion(-)

Changes in v5:
- used roundup() in patch 2

diff --git a/sys/arch/amd64/include/ptrace.h b/sys/arch/amd64/include/ptrace.h
index f9a98a63c2b1..987a13923b85 100644
--- a/sys/arch/amd64/include/ptrace.h
+++ b/sys/arch/amd64/include/ptrace.h
@@ -105,6 +105,22 @@ int process_machdep_validfpu(struct proc *);
 MODULE_HOOK(netbsd32_process_doxmmregs_hook, int,
 (struct lwp *, struct lwp *, void *, bool));
 
+#ifdef EXEC_ELF32
+#include 
+#endif
+#define PT64_GETXSTATE PT_GETXSTATE
+#define COREDUMP_MACHDEP_LWP_NOTES(l, ns, name)
\
+{  \
+   struct xstate xstate;   \
+   memset(, 0, sizeof(xstate)); \
+   if (!process_read_xstate(l, ))   \
+   {   \
+   ELFNAMEEND(coredump_savenote)(ns,   \
+   CONCAT(CONCAT(PT, ELFSIZE), _GETXSTATE), name,  \
+   , sizeof(xstate));   \
+   }   \
+}
+
 #endif /* _KERNEL */
 
 #ifdef _KERNEL_OPT
diff --git a/sys/arch/i386/include/ptrace.h b/sys/arch/i386/include/ptrace.h
index 41ad7a119f0b..561be6440ac0 100644
--- a/sys/arch/i386/include/ptrace.h
+++ b/sys/arch/i386/include/ptrace.h
@@ -159,6 +159,18 @@
{ DT_REG, N("xmmregs"), Pmachdep_xmmregs,   \
  procfs_machdep_validxmmregs },
 
+#define COREDUMP_MACHDEP_LWP_NOTES(l, ns, name)
\
+{  \
+   struct xstate xstate;   \
+   memset(, 0, sizeof(xstate)); \
+   if (!process_read_xstate(l, ))   \
+   {   \
+   ELFNAMEEND(coredump_savenote)(ns,   \
+   CONCAT(CONCAT(PT, ELFSIZE), _GETXSTATE), name,  \
+   , sizeof(xstate));   \
+   }   \
+}
+
 struct xmmregs;
 
 /* Functions used by both ptrace(2) and procfs. */
diff --git a/sys/kern/core_elf32.c b/sys/kern/core_elf32.c
index 7db43d0cc10b..95e5658053c8 100644
--- a/sys/kern/core_elf32.c
+++ b/sys/kern/core_elf32.c
@@ -506,7 +506,11 @@ ELFNAMEEND(coredump_note)(struct lwp *l, struct note_state 
*ns)
 
ELFNAMEEND(coredump_savenote)(ns, PT_GETFPREGS, name, , freglen);
 #endif
-   /* XXX Add hook for machdep per-LWP notes. */
+
+#ifdef COREDUMP_MACHDEP_LWP_NOTES
+   COREDUMP_MACHDEP_LWP_NOTES(l, ns, name);
+#endif
+
return (0);
 }
 
-- 
2.24.1



[PATCH v5 4/4] Add tests for reading registers from x86 core dumps

2019-12-24 Thread Michał Górny
---
 tests/lib/libc/sys/t_ptrace_x86_wait.h | 187 +++--
 1 file changed, 141 insertions(+), 46 deletions(-)

diff --git a/tests/lib/libc/sys/t_ptrace_x86_wait.h 
b/tests/lib/libc/sys/t_ptrace_x86_wait.h
index 4572e3eb8b23..7867f1e8dd1c 100644
--- a/tests/lib/libc/sys/t_ptrace_x86_wait.h
+++ b/tests/lib/libc/sys/t_ptrace_x86_wait.h
@@ -2225,7 +2225,8 @@ enum x86_test_registers {
 
 enum x86_test_regmode {
TEST_GETREGS,
-   TEST_SETREGS
+   TEST_SETREGS,
+   TEST_COREDUMP
 };
 
 static __inline void get_gp32_regs(union x86_test_register out[])
@@ -2687,8 +2688,10 @@ x86_register_test(enum x86_test_regset regset, enum 
x86_test_registers regs,
 #endif
struct xstate xst;
struct iovec iov;
-   struct fxsave* fxs;
+   struct fxsave* fxs = NULL;
uint64_t xst_flags = 0;
+   char core_path[] = "/tmp/core.XX";
+   int core_fd;
 
const union x86_test_register expected[] __aligned(32) = {
{{ 0x0706050403020100, 0x0F0E0D0C0B0A0908,
@@ -2796,6 +2799,7 @@ x86_register_test(enum x86_test_regset regset, enum 
x86_test_registers regs,
DPRINTF("Before running assembly from child\n");
switch (regmode) {
case TEST_GETREGS:
+   case TEST_COREDUMP:
switch (regs) {
case GPREGS_32:
set_gp32_regs(expected);
@@ -2969,40 +2973,7 @@ x86_register_test(enum x86_test_regset regset, enum 
x86_test_registers regs,
 
validate_status_stopped(status, sigval);
 
-   switch (regset) {
-   case TEST_GPREGS:
-   ATF_REQUIRE(regs < FPREGS_MM);
-   DPRINTF("Call GETREGS for the child process\n");
-   SYSCALL_REQUIRE(ptrace(PT_GETREGS, child, , 0) != -1);
-   break;
-   case TEST_XMMREGS:
-#if defined(__i386__)
-   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_YMM);
-   DPRINTF("Call GETXMMREGS for the child process\n");
-   SYSCALL_REQUIRE(ptrace(PT_GETXMMREGS, child, , 0) != -1);
-   fxs = 
-   break;
-#else
-   /*FALLTHROUGH*/
-#endif
-   case TEST_FPREGS:
-#if defined(__x86_64__)
-   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_YMM);
-   fxs = 
-#else
-   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_XMM);
-#endif
-   DPRINTF("Call GETFPREGS for the child process\n");
-   SYSCALL_REQUIRE(ptrace(PT_GETFPREGS, child, , 0) != -1);
-   break;
-   case TEST_XSTATE:
-   ATF_REQUIRE(regs >= FPREGS_MM);
-   iov.iov_base = 
-   iov.iov_len = sizeof(xst);
-
-   DPRINTF("Call GETXSTATE for the child process\n");
-   SYSCALL_REQUIRE(ptrace(PT_GETXSTATE, child, , 0) != -1);
-
+   if (regset == TEST_XSTATE) {
switch (regs) {
case FPREGS_MM:
xst_flags |= XCR0_X87;
@@ -3020,21 +2991,117 @@ x86_register_test(enum x86_test_regset regset, enum 
x86_test_registers regs,
__unreachable();
break;
}
+   }
 
-   ATF_REQUIRE((xst.xs_rfbm & xst_flags) == xst_flags);
-   switch (regmode) {
-   case TEST_SETREGS:
-   xst.xs_rfbm = xst_flags;
-   xst.xs_xstate_bv = xst_flags;
+   switch (regmode) {
+   case TEST_GETREGS:
+   case TEST_SETREGS:
+   switch (regset) {
+   case TEST_GPREGS:
+   ATF_REQUIRE(regs < FPREGS_MM);
+   DPRINTF("Call GETREGS for the child process\n");
+   SYSCALL_REQUIRE(ptrace(PT_GETREGS, child, , 0)
+   != -1);
break;
-   case TEST_GETREGS:
+   case TEST_XMMREGS:
+#if defined(__i386__)
+   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_YMM);
+   DPRINTF("Call GETXMMREGS for the child process\n");
+   SYSCALL_REQUIRE(ptrace(PT_GETXMMREGS, child, , 0)
+   != -1);
+   fxs = 
+   break;
+#else
+   /*FALLTHROUGH*/
+#endif
+   case TEST_FPREGS:
+#if defined(__x86_64__)
+   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_YMM);
+   fxs = 
+#else
+   ATF_REQUIRE(regs >= FPREGS_MM && regs < FPREGS_XMM);
+#endif
+   DPRINTF("Call GETFPREGS for the child process\n");
+   SYSCALL_REQUIRE(ptrace(PT_GETFPREGS, child, , 0)
+   != -1);
+   break;
+   case TEST_XSTATE:
+   ATF_REQUIRE(regs >= FPREGS_MM);
+   iov.iov_base = 
+ 

[PATCH v5 2/4] Fix alignment when reading core notes

2019-12-24 Thread Michał Górny
Both desc and note header needs to be aligned.  Therefore, we need
to realign after skipping past desc as well.

While at it, fix the other alignment fix to use roundup() macro.
---
 tests/lib/libc/sys/t_ptrace_wait.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Changes in v5:
- use roundup() to align offsets.

diff --git a/tests/lib/libc/sys/t_ptrace_wait.c 
b/tests/lib/libc/sys/t_ptrace_wait.c
index 6fc43572e015..b20c94295326 100644
--- a/tests/lib/libc/sys/t_ptrace_wait.c
+++ b/tests/lib/libc/sys/t_ptrace_wait.c
@@ -7590,8 +7590,7 @@ static ssize_t core_find_note(const char *core_path,
 
offset += note_hdr.n_namesz;
/* fix to alignment */
-   offset = ((offset + core_hdr.p_align - 1)
-   / core_hdr.p_align) * core_hdr.p_align;
+   offset = roundup(offset, core_hdr.p_align);
 
/* if name & type matched above */
if (ret != -1) {
@@ -7603,6 +7602,8 @@ static ssize_t core_find_note(const char *core_path,
}
 
offset += note_hdr.n_descsz;
+   /* fix to alignment */
+   offset = roundup(offset, core_hdr.p_align);
}
}
 
-- 
2.24.1