Re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread Paul Goyette
The commit that introduced the new symbol should also have bumped the 
kernel version...  That's how we keep modules and kernel in sync...
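
For reference, that version is the __NetBSD_Version__ macro in
sys/sys/param.h.  A rough illustration of the convention (the values
here just show the 8.99.x encoding; they are not from the actual
commit):

    /* sys/sys/param.h (illustrative excerpt) */
    #define __NetBSD_Version__  899003400   /* NetBSD 8.99.34 */

    /*
     * An ABI-visible change, such as adding disk_rename(), should be
     * accompanied by a bump (e.g. to 899003500, i.e. 8.99.35) so the
     * module framework can detect a kernel/module mismatch.
     */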



On Tue, 28 May 2019, m...@netbsd.org wrote:


Found the commit - looks like newer modules than kernel.
https://v4.freshbsd.org/commit/netbsd/src/IH8Jag0YCI3N6boB





++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


Re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread maya
Found the commit - looks like newer modules than kernel.
https://v4.freshbsd.org/commit/netbsd/src/IH8Jag0YCI3N6boB


Re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread Paul Goyette

On Tue, 28 May 2019, m...@netbsd.org wrote:


On Tue, May 28, 2019 at 08:27:20PM +0200, Petr Topiarz wrote:

May 28 18:55:46 poweredge /netbsd: [ 236.3881944] kobj_checksyms, 988:
[zfs]: linker error: symbol `disk_rename' not found


Usually this happens if kernel and modules are mismatched.
I'm not sure what happened to cause it, but building your kernel and
modules from scratch from the same source will fix it.

It's also possible something else is wrong (but less likely).


More likely there's a module dependency that is not recorded in the zfs 
module.


See if you can figure out where disk_rename would normally be defined, 
and determine if it is included in your kernel, or in some other 
non-built-in module.
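
For example, assuming sources in /usr/src and a kernel image with
symbols:

    $ nm /netbsd | grep disk_rename      # present in the booted kernel?
    $ modstat                            # what is loaded or built in?
    $ grep -r disk_rename /usr/src/sys   # where it lives in the tree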



++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


Re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread J. Hannken-Illjes
Petr,

your kernel is older than your ZFS module.

Please update to a current kernel and try again.

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig



> On 28. May 2019, at 20:27, Petr Topiarz  wrote:
> 
> Hi Tech-kern,
> 
> I run two NetBSD/amd64 machines with ZFS: one with an 8.99.34 kernel
> from February,
> 
> the other is the latest today, 201905260520Z,
> 
> It all runs fine on the first one, but after I upgraded the other, ZFS does
> not load and tells me:
> 
> modload: zfs: Exec format error
> 
> and to /var/log messages it writes:
> 
> May 28 18:55:46 poweredge /netbsd: [ 236.3881944] kobj_checksyms, 988: [zfs]: 
> linker error: symbol `disk_rename' not found
> May 28 18:55:46 poweredge /netbsd: [ 236.4833169] WARNING: module error: 
> unable to affix module `zfs', error 8
> May 28 18:55:50 poweredge /netbsd: [ 240.2655954] kobj_checksyms, 988: [zfs]: 
> linker error: symbol `disk_rename' not found
> May 28 18:55:50 poweredge /netbsd: [ 240.3599823] WARNING: module error: 
> unable to affix module `zfs', error 8
> May 28 18:56:18 poweredge /netbsd: [ 268.0810981] kobj_checksyms, 988: [zfs]: 
> linker error: symbol `disk_rename' not found
> May 28 18:56:18 poweredge /netbsd: [ 268.1715047] WARNING: module error: 
> unable to affix module `zfs', error 8
> 
> As for configuration, I have:
> 
> cat /etc/modules.conf
> solaris
> zfs
> 
> and in /etc/rc.conf I have
> 
> modules=YES
> 
> Any hint where to look or what to reconfigure in the kernel? I am using
> the standard NetBSD kernel in both cases.
> 
> thanks
> 
> Petr
> 



signature.asc
Description: Message signed with OpenPGP


Re: ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread maya
On Tue, May 28, 2019 at 08:27:20PM +0200, Petr Topiarz wrote:
> May 28 18:55:46 poweredge /netbsd: [ 236.3881944] kobj_checksyms, 988:
> [zfs]: linker error: symbol `disk_rename' not found

Usually this happens if kernel and modules are mismatched.
I'm not sure what happened to cause it, but building your kernel and
modules from scratch from the same source will fix it.

It's also possible something else is wrong (but less likely).
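
A sketch of one way to do that, assuming sources in /usr/src and the
cross tools already built (object directory and config name are
illustrative):

    $ cd /usr/src
    $ ./build.sh -O ../obj -u kernel=GENERIC modules
    $ ./build.sh -O ../obj installmodules=/
    (then install the new kernel from
    ../obj/sys/arch/amd64/compile/GENERIC/netbsd and reboot)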


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Kamil Rytarowski
On 28.05.2019 20:08, Martin Husemann wrote:
> On Tue, May 28, 2019 at 07:50:34PM +0200, Michał Górny wrote:
>> Well, if we are only to consider new registers, then we're talking about
>> 16 'pure' ymm registers + 32 zmm registers + 8 kN registers + 1 state
>> register, multiply by two... 114 PT_* requests?
> 
> Integers are plenty, but the core file format issue makes this approach
> unusable anyway.
> 
> Still, I think we should not create too many random processor register groups.
> 

For performance reasons, PT_GETREGS should be grouped into a single
system call and separated from FPU/DB/similar ones.

FPU registers can be handled separately. I defer the decision on how
to process them to the x86 maintainer.

As mentioned, there is no performance issue here (1, 2 or 3 calls make
no difference).

> We already have very strange ones (XMMREGS and VECREGS). Maybe we should just
> have one ALLREGS thing (identical to the core note) and then discuss how
> to properly make that sanely versioned and self describing?
> 

In core files there is no ALLREGS, only separate register sets. ALLREGS
has some issues: e.g. DB registers have a different policy for userland
access, and they are part of neither the core(5) file nor the mcontext.

Accessing individual registers is similar to what we had originally in
pre-NetBSD times.

There was a struct shared between the userland debugger and the kernel
(the so-called userdata), with dedicated calls, PT_READ_U/PT_WRITE_U,
to access the registers in that struct; but as it no longer matched
modern kernels, it was abandoned.

Something similar to per-register access can still be achieved by
reading /proc/*/*regs (but it likely has zero users... and might not
work).

> Martin
> 




signature.asc
Description: OpenPGP digital signature


ZFS works on 8.99.34 but fails on 201905260520Z

2019-05-28 Thread Petr Topiarz

Hi Tech-kern,

I run two NetBSD/amd64 machines with ZFS: one with an 8.99.34
kernel from February,


the other is the latest today, 201905260520Z,

It all runs fine on the first one, but after I upgraded the other, ZFS
does not load and tells me:


modload: zfs: Exec format error

and to /var/log messages it writes:

May 28 18:55:46 poweredge /netbsd: [ 236.3881944] kobj_checksyms, 988: 
[zfs]: linker error: symbol `disk_rename' not found
May 28 18:55:46 poweredge /netbsd: [ 236.4833169] WARNING: module error: 
unable to affix module `zfs', error 8
May 28 18:55:50 poweredge /netbsd: [ 240.2655954] kobj_checksyms, 988: 
[zfs]: linker error: symbol `disk_rename' not found
May 28 18:55:50 poweredge /netbsd: [ 240.3599823] WARNING: module error: 
unable to affix module `zfs', error 8
May 28 18:56:18 poweredge /netbsd: [ 268.0810981] kobj_checksyms, 988: 
[zfs]: linker error: symbol `disk_rename' not found
May 28 18:56:18 poweredge /netbsd: [ 268.1715047] WARNING: module error: 
unable to affix module `zfs', error 8


As for configuration, I have:

cat /etc/modules.conf
solaris
zfs

and in /etc/rc.conf I have

modules=YES

Any hint where to look or what to reconfigure in the kernel? I am using
the standard NetBSD kernel in both cases.


thanks

Petr



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Jason Thorpe



> On May 28, 2019, at 11:16 AM, Michał Górny  wrote:
> 
>> We already have very strange ones (XMMREGS and VECREGS). Maybe we should just
>> have one ALLREGS thing (identical to the core note) and then discuss how
>> to properly make that sanely versioned and self describing?
>> 
> 
> That is somewhat the idea of option b., and what Intel seem to be
> aiming at.  With the exception of the two old register groups, which
> are left separate, PT_*XSTATE would cover everything else.

At the very least, the #ifdef around FP regs in the core dump code
should probably be made into a proper MD hook, so that each platform
can decide which notes, and how many of them, to write out.
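
A minimal sketch of the shape such a hook could take; all names here
are hypothetical, not existing API:

    /*
     * Hypothetical MD hook: the MI code in core_elf32.c would call
     * this instead of open-coding the #ifdef, and each port would
     * decide which register notes to emit.
     */
    int
    cpu_coredump_notes(struct lwp *l, struct note_state *ns)
    {
            int error;

            error = coredump_note_cpu(l, ns);          /* integer regs */
            if (error == 0)
                    error = coredump_note_fpu(l, ns);  /* FP regs */
            /* an XSTATE-capable port would append its note(s) here */
            return error;
    }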

-- thorpej



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Michał Górny
On Tue, 2019-05-28 at 20:08 +0200, Martin Husemann wrote:
> On Tue, May 28, 2019 at 07:50:34PM +0200, Michał Górny wrote:
> > Well, if we are only to consider new registers, then we're talking about
> > 16 'pure' ymm registers + 32 zmm registers + 8 kN registers + 1 state
> > register, multiply by two... 114 PT_* requests?
> 
> Integers are plenty, but the core file format issue makes this approach
> unusable anyway.
> 
> Still, I think we should not create too many random processor register groups.
> 
> We already have very strange ones (XMMREGS and VECREGS). Maybe we should just
> have one ALLREGS thing (identical to the core note) and then discuss how
> to properly make that sanely versioned and self describing?
> 

That is somewhat the idea of option b., and what Intel seem to be
aiming at.  With the exception of the two old register groups, which
are left separate, PT_*XSTATE would cover everything else.


-- 
Best regards,
Michał Górny



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Martin Husemann
On Tue, May 28, 2019 at 07:50:34PM +0200, Michał Górny wrote:
> Well, if we are only to consider new registers, then we're talking about
> 16 'pure' ymm registers + 32 zmm registers + 8 kN registers + 1 state
> register, multiply by two... 114 PT_* requests?

Integers are plenty, but the core file format issue makes this approach
unusable anyway.

Still, I think we should not create too many random processor register groups.

We already have very strange ones (XMMREGS and VECREGS). Maybe we should just
have one ALLREGS thing (identical to the core note) and then discuss how
to properly make that sanely versioned and self describing?

Martin


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Martin Husemann
On Tue, May 28, 2019 at 10:54:45AM -0700, Jason Thorpe wrote:
> The registers are dumped in an ELF note in the same format that
> ptrace gets.  We don't currently handle anything other than integer
> registers and basic FP registers in core files at the moment.  Look for
> "coredump_note" in sys/kern/core_elf32.c

Ugh - that makes the problem a bit different.

Martin


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Jason Thorpe



> On May 28, 2019, at 10:48 AM, Martin Husemann  wrote:
>> It would make things a bit awkward for core files.
> 
> Please excuse my ignorance, but how is ptrace(2) related to core files?

The registers are dumped in an ELF note in the same format that ptrace gets.  
We don't currently handle anything other than integer registers and basic FP 
registers in core files at the moment.  Look for "coredump_note" in 
sys/kern/core_elf32.c

-- thorpej



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Michał Górny
On Tue, 2019-05-28 at 19:37 +0200, Martin Husemann wrote:
> Stupid question: since this is all very rare and non-performance critical,
> why isn't it done as a single register per call? Adding more registers
> when they arrive in newer CPU variants, and not worrying about how they
> are saved (XSAVE or similar) nor what format is used in the kernel?
> 
> So a debugger would do a single PT_GETREG_$REGISTER call from userland
> for each register it is interested in. The current state with register
> groups (regs, fpregs, dbregs, xmmregs, vecregs, ...) looks like a hack
> to me.
> 
> The first two make some sense (across various cpu architectures), and
> maybe performance and debug registers could be a third reasonable group
> (especially given the security implications), but everything else is
> arbitrary and may just as well be individual registers.
> 

Well, if we are only to consider new registers, then we're talking about
16 'pure' ymm registers + 32 zmm registers + 8 kN registers + 1 state
register, multiply by two... 114 PT_* requests?

-- 
Best regards,
Michał Górny



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Martin Husemann
On Tue, May 28, 2019 at 10:46:44AM -0700, Jason Thorpe wrote:
> 
> > On May 28, 2019, at 10:37 AM, Martin Husemann  wrote:
> > 
> > Stupid question: since this is all very rare and non-performance critical,
> > why isn't it done as a single register per call? Adding more registers
> > when they arrive in newer CPU variants, and not worrying about how they
> > are saved (XSAVE or similar) nor what format is used in the kernel?
> 
> It would make things a bit awkward for core files.

Please excuse my ignorance, but how is ptrace(2) related to core files?

Martin


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Jason Thorpe


> On May 28, 2019, at 10:37 AM, Martin Husemann  wrote:
> 
> Stupid question: since this is all very rare and non-performance critical,
> why isn't it done as a single register per call? Adding more registers
> when they arrive in newer CPU variants, and not worrying about how they
> are saved (XSAVE or similar) nor what format is used in the kernel?

It would make things a bit awkward for core files.

-- thorpej



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Martin Husemann
Stupid question: since this is all very rare and non-performance critical,
why isn't it done as a single register per call? Adding more registers
when they arrive in newer CPU variants, and not worrying about how they
are saved (XSAVE or similar) nor what format is used in the kernel?

So a debugger would do a single PT_GETREG_$REGISTER call from userland
for each register it is interested in. The current state with register
groups (regs, fpregs, dbregs, xmmregs, vecregs, ...) looks like a hack
to me.
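
To illustrate, fetching a single register from the debugger might then
look like this (PT_GETREG_RAX is hypothetical, of course):

    #include <sys/types.h>
    #include <sys/ptrace.h>
    #include <stdint.h>

    uint64_t
    fetch_rax(pid_t pid)
    {
            uint64_t val = 0;

            /* hypothetical: one request constant per register, the
             * value coming back through the addr argument */
            ptrace(PT_GETREG_RAX, pid, &val, 0);
            return val;
    }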

The first two make some sense (across various cpu architectures), and
maybe performance and debug registers could be a third reasonable group
(especially given the security implications), but everything else is
arbitrary and may just as well be individual registers.

Martin


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Michał Górny
On Tue, 2019-05-28 at 19:26 +0200, Kamil Rytarowski wrote:
> On 28.05.2019 18:34, Michał Górny wrote:
> > There is no difference in internal layout or logic between b. and c.
> > In either case, we need to perform XSAVE, process it and copy the data
> > into internal structure.  The only difference is that in b. we handle it
> > all in one request, and in c. we do three requests copying different
> > parts of XSAVE to three different buffers.
> > 
> 
> I see. So (b) and (c) are the same, except that in (b) XSAVE is a
> dynamically sized struct with normalized registers, rather than the
> explicit AVX/AVX-512 calls of (c).
> 

The only real difference is the amount of work when adding new register
types.

With b., you just add a new field to the struct and some x86 code
that copies data into that field.

With c., you need to add a new PT_* request with all the associated
functions.


-- 
Best regards,
Michał Górny



Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Kamil Rytarowski
On 28.05.2019 18:34, Michał Górny wrote:
> There is no difference in internal layout or logic between b. and c.
> In either case, we need to perform XSAVE, process it and copy the data
> into internal structure.  The only difference is that in b. we handle it
> all in one request, and in c. we do three requests copying different
> parts of XSAVE to three different buffers.
> 

I see. So (b) and (c) are the same, except that in (b) XSAVE is a
dynamically sized struct with normalized registers, rather than the
explicit AVX/AVX-512 calls of (c).

I'm in favor of (c).

In the past I tried to normalize debug registers in the kernel (I know,
a different thing from the FPU), but it didn't work well. Instead of
making things safer and easier, it created x86 UB that was difficult to
deal with in kernel space. It's easier to just handle such things in
userland and export the raw register sets as they are.



signature.asc
Description: OpenPGP digital signature


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Paul Goyette

I'm hoping that whatever solution is arrived at, it does not introduce
any new #ifdef ... #endif variations of structures that might get passed
between kernel and module code.  Such variations create dependencies in
the modules which are at best "difficult" to deal with at run-time.

(We currently don't have a mechanism to synchronize building of modules
on a per-kernel-configuration basis...)


++--+---+
| Paul Goyette   | PGP Key fingerprint: | E-mail addresses: |
| (Retired)  | FA29 0E3B 35AF E8AE 6651 | p...@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoye...@netbsd.org   |
++--+---+


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Michał Górny
On Tue, 2019-05-28 at 18:08 +0200, Kamil Rytarowski wrote:
> On 28.05.2019 15:20, Michał Górny wrote:
> > Hi,
> > 
> > After implementing most of PT_GETXSTATE/PT_SETXSTATE and getting some
> > comments requiring major changes anyway, I'm starting to wonder whether
> > the approach I've followed is actually the best one.  This is especially
> > important now that I'm pretty sure that we can't rely on fixed offsets
> > in XSAVE area (courtesy of CPUID data from AVX512 CPU).  I'd like to
> > query your opinion on what would be the best API to provide.
> > 
> > I see three main options:
> > 
> > a. 'raw' XSAVE data,
> > 
> > b. 'normalized' XSAVE data,
> > 
> > c. split calls for individual components.
> > 
> > So far I've followed idea a.  I'll describe each one shortly.
> > 
> > 
> > The problem
> > ---
> > I'm trying to expose additional register types supported by newer x86
> > processors to the debugger.  Currently we're limited to SSE registers,
> > however extended XSAVE area provides support for AVX, AVX-512
> > and possible future extensions.
> > 
> > The XSAVE area is dynamically sized, with controllable extensions to
> > include.  We need to perform a bunch of CPUID calls to determine
> > the appropriate buffer size and offsets/sizes of individual data
> > components inside it.  The exact offsets depend on the processor
> > in question, and may be different depending on interim extensions
> > implemented.
> > 
> > 
> > a. 'raw' XSAVE data
> > ---
> > This is the solution used by Linux and FreeBSD.  It adds new PT_*XSTATE
> > requests that expose and operate on raw data used by XSAVE/XRSTOR
> > instructions.
> > 
> > Since it uses the processor's data format, userland can reuse the code
> > for other platforms.  In fact, if new XSAVE components are added,
> > the support in userland can be developed in parallel to kernel
> > development.
> > 
> > On the minus side, it shifts the whole burden of implementation
> > on userland.  This didn't go well for Linux and FreeBSD -- both of those
> > implementations did not account for providing component offsets via API,
> > and so require userland to establish them manually by CPUID.  This isn't
> > going to work well for storing XSAVE data in coredumps.
> > 
> > In my opinion, to implement this correctly we'd need another
> > dynamically-sized PT_* request exposing component offsets (and sizes?)
> > from CPUID.  However, at this point I start doubting whether the added
> > complexity is worth it given no clear advantage of sticking to pure
> > machine format.  Hence, option b.
> > 
> > 
> > b. 'normalized' XSAVE data
> > --
> > This is similar to the previous option, except that rather than exposing
> > the processor's raw XSAVE structure we use a fixed 'struct xstate'. 
> > The kernel is responsible for establishing correct offsets, and moving data
> > to and from 'struct xstate'.  Userland can trivially access the struct
> > fields, and does not need to be concerned about internals.  Core dumps
> > use fixed offsets.
> > 
> > Similarly to the previous option, I'd keep the dynamic size for
> > the structure -- make it possible to add more fields in the future,
> > and expose them to userland when it provides a large enough buffer.
> > 
> > I don't see any significant disadvantages compared to option a.,
> > and the small performance loss does not seem significant here.  We
> > lose the option to add support for new XSAVE types in userland
> > before the kernel, but it's probably a moot point anyway.
> > 
> > 
> > c. Split calls for individual components
> > 
> > This is a completely different option which will probably make
> > the implementation a bit simpler at the cost of more repeated code.
> > In this model, we add PT_* requests for each interesting XSAVE
> > component.  At the moment, this would mean 4 new requests:
> > 
> > - PT_*AVXREGS, for the AVX component,
> > 
> > - PT_*AVX512REGS for the 3 AVX-512 components.
> > 
> > Unlike the other options, we don't use an extensible buffer, just
> > fixed structs.  We need to add new requests when new components are
> > added.
> > Userland applications need to issue multiple PT_* calls if they need
> > specific data.  For example, to get zmm0 register value, you'd need to
> > call PT_GETFPREGS (or PT_GETXMMREGS on i386), PT_GETAVXREGS
> > and PT_GETAVX512REGS to get all the split parts of it.
> > 
> > 
> > Side point: non-aligned PT_*FPREGS on i386/amd64
> > 
> > As a side point, the current interfaces for getting FPU regs on i386
> > and amd64 are not aligned.  PT_*FPREGS uses old FSAVE struct on i386,
> > while (newer) FXSAVE struct is used on amd64.  The former lacks SSE
> > registers.
> > 
> > This is somewhat circumvented by adding PT_GETXMMREGS on i386 which uses
> > the FXSAVE format and adds missing registers to i386.  However, our
> > current implementation of compat32 does not allow it to work without
> > either hacks or major design changes.

Re: [PATCH 1/2] compat32: translate userland PT_* request values into kernel

2019-05-28 Thread Kamil Rytarowski
On 27.05.2019 21:03, Michał Górny wrote:
> Currently, the compat32 passes PT_* request values to kernel functions
> without translation.  This works fine for low PT_* requests that happen
> to have the same values both on i386 and amd64.  However, for requests
> higher than PT_SETFPREGS, the value passed from userland (matching i386
> const) does not match the correct kernel (amd64) request.  As a result,
> e.g. when a compat32 process calls PT_GETDBREGS, the kernel actually
> processes it as PT_SETSTEP.
> 
> To resolve this, introduce support for compat32 PT_* request
> translation.  The interface is based on PTRACE_TRANSLATE_REQUEST32 macro
> that is defined to a mapping function on architectures needing it.
> In case of amd64, this function maps userland i386 PT_* values into
> appropriate amd64 PT_* values.
> 
> For the time being, the two additional PT_GETXMMREGS and PT_SETXMMREGS
> requests are unsupported due to lack of matching free amd64 constant.


Both patches look good to me.
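
For readers following along, a hedged sketch of the translation
described above; the PT32_* names are invented here for illustration
and are not taken from the actual patch:

    /* amd64: map i386 userland PT_* values to native requests */
    static int
    ptrace_translate_request32(int req)
    {
            /* requests up to PT_SETFPREGS share values on both ABIs */
            if (req <= PT_SETFPREGS)
                    return req;

            switch (req) {
            case PT32_GETDBREGS:    return PT_GETDBREGS;
            case PT32_SETDBREGS:    return PT_SETDBREGS;
            case PT32_SETSTEP:      return PT_SETSTEP;
            case PT32_CLEARSTEP:    return PT_CLEARSTEP;
            default:                return -1;  /* e.g. PT_*XMMREGS */
            }
    }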



signature.asc
Description: OpenPGP digital signature


Re: [RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Kamil Rytarowski
On 28.05.2019 15:20, Michał Górny wrote:
> Hi,
> 
> After implementing most of PT_GETXSTATE/PT_SETXSTATE and getting some
> comments requiring major changes anyway, I'm starting to wonder whether
> the approach I've followed is actually the best one.  This is especially
> important now that I'm pretty sure that we can't rely on fixed offsets
> in XSAVE area (courtesy of CPUID data from AVX512 CPU).  I'd like to
> query your opinion on what would be the best API to provide.
> 
> I see three main options:
> 
> a. 'raw' XSAVE data,
> 
> b. 'normalized' XSAVE data,
> 
> c. split calls for individual components.
> 
> So far I've followed idea a.  I'll describe each one shortly.
> 
> 
> The problem
> ---
> I'm trying to expose additional register types supported by newer x86
> processors to the debugger.  Currently we're limited to SSE registers,
> however extended XSAVE area provides support for AVX, AVX-512
> and possible future extensions.
> 
> The XSAVE area is dynamically sized, with controllable extensions to
> include.  We need to perform a bunch of CPUID calls to determine
> the appropriate buffer size and offsets/sizes of individual data
> components inside it.  The exact offsets depend on the processor
> in question, and may be different depending on interim extensions
> implemented.
> 
> 
> a. 'raw' XSAVE data
> ---
> This is the solution used by Linux and FreeBSD.  It adds new PT_*XSTATE
> requests that expose and operate on raw data used by XSAVE/XRSTOR
> instructions.
> 
> Since it uses the processor's data format, userland can reuse the code
> for other platforms.  In fact, if new XSAVE components are added,
> the support in userland can be developed in parallel to kernel
> development.
> 
> On the minus side, it shifts the whole burden of implementation
> on userland.  This didn't go well for Linux and FreeBSD -- both of those
> implementations did not account for providing component offsets via API,
> and so require userland to establish them manually by CPUID.  This isn't
> going to work well for storing XSAVE data in coredumps.
> 
> In my opinion, to implement this correctly we'd need another
> dynamically-sized PT_* request exposing component offsets (and sizes?)
> from CPUID.  However, at this point I start doubting whether the added
> complexity is worth it given no clear advantage of sticking to pure
> machine format.  Hence, option b.
> 
> 
> b. 'normalized' XSAVE data
> --
> This is similar to the previous option, except that rather than exposing
> the processor's raw XSAVE structure we use a fixed 'struct xstate'. 
> The kernel is responsible for establishing correct offsets, and moving data
> to and from 'struct xstate'.  Userland can trivially access the struct
> fields, and does not need to be concerned about internals.  Core dumps
> use fixed offsets.
> 
> Similarly to the previous option, I'd keep the dynamic size for
> the structure -- make it possible to add more fields in the future,
> and expose them to userland when it provides a large enough buffer.
> 
> I don't see any significant disadvantages compared to option a.,
> and the small performance loss does not seem significant here.  We
> lose the option to add support for new XSAVE types in userland
> before the kernel, but it's probably a moot point anyway.
> 
> 
> c. Split calls for individual components
> 
> This is a completely different option which will probably make
> the implementation a bit simpler at the cost of more repeated code.
> In this model, we add PT_* requests for each interesting XSAVE
> component.  At the moment, this would mean 4 new requests:
> 
> - PT_*AVXREGS, for the AVX component,
> 
> - PT_*AVX512REGS for the 3 AVX-512 components.
> 
> Unlike the other options, we don't use an extensible buffer, just
> fixed structs.  We need to add new requests when new components are
> added.
> Userland applications need to issue multiple PT_* calls if they need
> specific data.  For example, to get zmm0 register value, you'd need to
> call PT_GETFPREGS (or PT_GETXMMREGS on i386), PT_GETAVXREGS
> and PT_GETAVX512REGS to get all the split parts of it.
> 
> 
> Side point: non-aligned PT_*FPREGS on i386/amd64
> 
> As a side point, the current interfaces for getting FPU regs on i386
> and amd64 are not aligned.  PT_*FPREGS uses old FSAVE struct on i386,
> while (newer) FXSAVE struct is used on amd64.  The former lacks SSE
> registers.
> 
> This is somewhat circumvented by adding PT_GETXMMREGS on i386 which uses
> the FXSAVE format and adds missing registers to i386.  However, our
> current implementation of compat32 does not allow it to work without
> either hacks or major design changes.
> 
> Both options a. and b. use a struct compatible with FXSAVE, including
> both x87 FPU and SSE registers.  Backwards compatibility with FSAVE
> and FXSAVE is easy to do with the existing kernel code.  Therefore,
> PT_*XSTATE can easily replace PT_*FPREGS/PT_*XMMREGS, providing
> an aligned i386/amd64 API for getting FPU registers.

[RFC] Design considerations for XSAVE Extended Area getters/setters

2019-05-28 Thread Michał Górny
Hi,

After implementing most of PT_GETXSTATE/PT_SETXSTATE and getting some
comments requiring major changes anyway, I'm starting to wonder whether
the approach I've followed is actually the best one.  This is especially
important now that I'm pretty sure that we can't rely on fixed offsets
in XSAVE area (courtesy of CPUID data from AVX512 CPU).  I'd like to
query your opinion on what would be the best API to provide.

I see three main options:

a. 'raw' XSAVE data,

b. 'normalized' XSAVE data,

c. split calls for individual components.

So far I've followed idea a.  I'll describe each one shortly.


The problem
---
I'm trying to expose additional register types supported by newer x86
processors to the debugger.  Currently we're limited to SSE registers,
however extended XSAVE area provides support for AVX, AVX-512
and possible future extensions.

The XSAVE area is dynamically sized, with controllable extensions to
include.  We need to perform a bunch of CPUID calls to determine
the appropriate buffer size and offsets/sizes of individual data
components inside it.  The exact offsets depend on the processor
in question, and may be different depending on interim extensions
implemented.


a. 'raw' XSAVE data
---
This is the solution used by Linux and FreeBSD.  It adds new PT_*XSTATE
requests that expose and operate on raw data used by XSAVE/XRSTOR
instructions.

Since it uses the processor's data format, userland can reuse the code
for other platforms.  In fact, if new XSAVE components are added,
the support in userland can be developed in parallel to kernel
development.

On the minus side, it shifts the whole burden of implementation
on userland.  This didn't go well for Linux and FreeBSD -- both of those
implementations did not account for providing component offsets via API,
and so require userland to establish them manually by CPUID.  This isn't
going to work well for storing XSAVE data in coredumps.

In my opinion, to implement this correctly we'd need another
dynamically-sized PT_* request exposing component offsets (and sizes?)
from CPUID.  However, at this point I start doubting whether the added
complexity is worth it given no clear advantage of sticking to pure
machine format.  Hence, option b.


b. 'normalized' XSAVE data
--
This is similar to the previous option, except that rather than exposing
the processor's raw XSAVE structure we use a fixed 'struct xstate'. 
The kernel is responsible for establishing correct offsets, and moving data
to and from 'struct xstate'.  Userland can trivially access the struct
fields, and does not need to be concerned about internals.  Core dumps
use fixed offsets.

Similarly to the previous option, I'd keep the dynamic size for
the structure -- make it possible to add more fields in the future,
and expose them to userland when it provides a large enough buffer.

I don't see any significant disadvantages compared to option a., and
the small performance loss does not seem significant here.  We lose the
option to add support for new XSAVE types in userland before the
kernel, but it's probably a moot point anyway.
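
To make this concrete, here is a rough sketch of what such a fixed
layout could look like; field names and sizes are illustrative only:

    #include <stdint.h>

    struct xstate {
            uint64_t xs_rfbm;               /* which components are valid */
            uint8_t  xs_fxsave[512];        /* legacy area: x87 + SSE */
            uint8_t  xs_ymm_hi128[16][16];  /* AVX: ymm0..15, high halves */
            uint64_t xs_opmask[8];          /* AVX-512: k0..k7 */
            uint8_t  xs_zmm_hi256[16][32];  /* AVX-512: zmm0..15, high halves */
            uint8_t  xs_hi16_zmm[16][64];   /* AVX-512: zmm16..31 */
            /*
             * New components get appended here; userland passes the
             * size of the struct it was compiled against, and the
             * kernel fills only what fits.
             */
    };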


c. Split calls for individual components

This is a completely different option which will probably make
the implementation a bit simpler at the cost of more repeated code.
In this model, we add PT_* requests for each interesting XSAVE
component.  At the moment, this would mean 4 new requests:

- PT_*AVXREGS, for the AVX component,

- PT_*AVX512REGS for the 3 AVX-512 components.

Unlike the other options, we don't use an extensible buffer, just fixed
structs.  We need to add new requests when new components are added.
Userland applications need to issue multiple PT_* calls if they need
specific data.  For example, to get zmm0 register value, you'd need to
call PT_GETFPREGS (or PT_GETXMMREGS on i386), PT_GETAVXREGS
and PT_GETAVX512REGS to get all the split parts of it.
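
In other words, the caller stitches the full value back together; the
structs and field names below are hypothetical:

    /* fragment: reassembling zmm0 under option c. */
    uint8_t zmm0[64];

    memcpy(zmm0 +  0, fpr.fxs_xmm[0],   16);  /* from PT_GETFPREGS */
    memcpy(zmm0 + 16, avx.ymm_hi128[0], 16);  /* from PT_GETAVXREGS */
    memcpy(zmm0 + 32, zs.zmm_hi256[0],  32);  /* from PT_GETAVX512REGS */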


Side point: non-aligned PT_*FPREGS on i386/amd64

As a side point, the current interfaces for getting FPU regs on i386
and amd64 are not aligned.  PT_*FPREGS uses old FSAVE struct on i386,
while (newer) FXSAVE struct is used on amd64.  The former lacks SSE
registers.

This is somewhat circumvented by adding PT_GETXMMREGS on i386 which uses
the FXSAVE format and adds missing registers to i386.  However, our
current implementation of compat32 does not allow it to work without
either hacks or major design changes.

Both options a. and b. use a struct compatible with FXSAVE, including
both x87 FPU and SSE registers.  Backwards compatibility with FSAVE
and FXSAVE is easy to do with the existing kernel code.  Therefore,
PT_*XSTATE can easily replace PT_*FPREGS/PT_*XMMREGS, providing
an aligned i386/amd64 API for getting FPU registers.


Summary
---
In my opinion, the best solution here is b.  It's extensible to future
register types, easy to use from userland 

Re: ehci: fix error handling?

2019-05-28 Thread Julian Coleman
Hi all,

> attach always 'succeeds' in the sense that after attach has been called, 
> detach will always be called. The detach routine should tear down 
> everything that needs tearing down and not do things that will fail. 
> Perhaps the init could simply be done before the attach routine gets the 
> chance to fail?

Some drivers mark the progress through attach so that detach will work
correctly in all cases.  Can we do the same here?  For example, see
sc->sc_att_stage in:

  https://nxr.netbsd.org/xref/src/sys/dev/ic/gem.c

and also how gem_partial_detach() is called when the attach fails part way
through.
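
Roughly this shape, with gem.c as the real reference (all names below
are illustrative):

    enum xxx_att_stage {
            XXX_ATT_NONE,           /* nothing done yet */
            XXX_ATT_INTR,           /* interrupt handler established */
            XXX_ATT_FINISHED        /* fully attached */
    };

    struct xxx_softc {
            device_t sc_dev;
            enum xxx_att_stage sc_att_stage;
            /* ... */
    };

    /* attach advances sc_att_stage after each step it completes;
     * detach unwinds only what was actually done */
    static int
    xxx_detach(device_t self, int flags)
    {
            struct xxx_softc *sc = device_private(self);

            switch (sc->sc_att_stage) {
            case XXX_ATT_FINISHED:
                    /* tear down everything a full attach set up */
                    /* FALLTHROUGH */
            case XXX_ATT_INTR:
                    /* disestablish the interrupt handler */
                    /* FALLTHROUGH */
            case XXX_ATT_NONE:
                    break;
            }
            return 0;
    }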

Regards,

Julian