Re: Re-arrange contents of struct cpuinfo - kern/52919

2018-07-15 Thread Maxime Villard

On 16/07/2018 at 00:27, Paul Goyette wrote:

Since maxv@ has already done some rearranging, but so far has not bumped the
system version, I would like to do some more re-arrangement.  This will
group all the XEN stuff together, as well as move all of the conditional
parts of struct cpuinfo to the end, following all of the non-conditional
parts.


Yes, please proceed.


Re-arrange contents of struct cpuinfo - kern/52919

2018-07-15 Thread Paul Goyette
Since maxv@ has already done some rearranging, but so far has not bumped 
the system version, I would like to do some more re-arrangement.  This 
will group all the XEN stuff together, as well as move all of the 
conditional parts of struct cpuinfo to the end, following all of the 
non-conditional parts.


Please review the attached patch and let me know if there are any serious 
objections.  I'd like to commit within the next day or two, along with a 
kernel rev bump...
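A plausible reason for moving the conditional members to the end (my reading; the thread does not spell it out) is that every member declared before the first `#ifdef` keeps the same offset whether or not an option such as XEN is enabled, so code touching only those members sees one layout. The sketch below uses hypothetical, heavily simplified structs (`mid_*`, `tail_*`, with `evt` standing in for a Xen-only member like `ci_evtmask`) and checks the offsets at compile time:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified layouts; the real struct cpu_info has
 * many more members.  "evt" stands in for a Xen-only member.
 */

/* Conditional member in the middle, built without and with the option. */
struct mid_plain { long ci_a; long ci_b; };
struct mid_xen   { long ci_a; long evt[4]; long ci_b; };

/* Conditional member moved to the end, built without and with the option. */
struct tail_plain { long ci_a; long ci_b; };
struct tail_xen   { long ci_a; long ci_b; long evt[4]; };

/* Middle placement: enabling the option shifts every later member. */
static_assert(offsetof(struct mid_plain, ci_b) != offsetof(struct mid_xen, ci_b),
    "middle placement shifts later members");

/* Tail placement: the shared members keep their offsets. */
static_assert(offsetof(struct tail_plain, ci_b) == offsetof(struct tail_xen, ci_b),
    "tail placement keeps shared offsets stable");

size_t tail_offset_plain(void) { return offsetof(struct tail_plain, ci_b); }
size_t tail_offset_xen(void)   { return offsetof(struct tail_xen, ci_b); }
```

This only stabilizes the shared prefix; `sizeof` and everything after the first conditional member still differ between configurations.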



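+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+

Index: cpu.h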
===
RCS file: /cvsroot/src/sys/arch/x86/include/cpu.h,v
retrieving revision 1.95
diff -u -p -r1.95 cpu.h
--- cpu.h   15 Jul 2018 08:47:43 -  1.95
+++ cpu.h   15 Jul 2018 22:23:37 -
@@ -127,9 +127,6 @@ struct cpu_info {
uint64_t ci_scratch;
uintptr_t ci_pmap_data[128 / sizeof(uintptr_t)];
 
-#ifdef XEN
-   u_long ci_evtmask[NR_EVENT_CHANNELS]; /* events allowed on this CPU */
-#endif
struct intrsource *ci_isources[MAX_INTR_SOURCES];
 
volatile int    ci_mtx_count;   /* Negative count of spin mutexes */
@@ -174,6 +171,44 @@ struct cpu_info {
u_int ci_cflush_lsize;  /* CLFLUSH insn line size */
struct x86_cache_info ci_cinfo[CAI_COUNT];
 
+   device_t    ci_frequency;   /* Frequency scaling technology */
+   device_t    ci_padlock;     /* VIA PadLock private storage */
+   device_t    ci_temperature; /* Intel coretemp(4) or equivalent */
+   device_t    ci_vm;          /* Virtual machine guest driver */
+
+   /*
+    * Segmentation-related data.
+    */
+   union descriptor *ci_gdt;
+   struct cpu_tss  *ci_tss;    /* Per-cpu TSSes; shared among LWPs */
+   int         ci_tss_sel;     /* TSS selector of this cpu */
+
+   /*
+    * The following two are actually region_descriptors,
+    * but that would pollute the namespace.
+    */
+   uintptr_t   ci_suspend_gdt;
+   uint16_t    ci_suspend_gdt_padding;
+   uintptr_t   ci_suspend_idt;
+   uint16_t    ci_suspend_idt_padding;
+
+   uint16_t    ci_suspend_tr;
+   uint16_t    ci_suspend_ldt;
+   uintptr_t   ci_suspend_fs;
+   uintptr_t   ci_suspend_gs;
+   uintptr_t   ci_suspend_kgs;
+   uintptr_t   ci_suspend_efer;
+   uintptr_t   ci_suspend_reg[12];
+   uintptr_t   ci_suspend_cr0;
+   uintptr_t   ci_suspend_cr2;
+   uintptr_t   ci_suspend_cr3;
+   uintptr_t   ci_suspend_cr4;
+   uintptr_t   ci_suspend_cr8;
+
+   /* The following must be in a single cache line. */
+   int ci_want_resched __aligned(64);
+   int ci_padout __aligned(64);
+
 #ifndef __HAVE_DIRECT_MAP
 #define VPAGE_SRC 0
 #define VPAGE_DST 1
@@ -201,42 +236,24 @@ struct cpu_info {
vaddr_t ci_svs_utls;
 #endif
 
-#if defined(XEN) && (defined(PAE) || defined(__x86_64__))
+#if defined(XEN)
+#if defined(PAE) || defined(__x86_64__)
/* Currently active user PGD (can't use rcr3() with Xen) */
pd_entry_t *ci_kpm_pdir;    /* per-cpu PMD (va) */
paddr_t ci_kpm_pdirpa;  /* per-cpu PMD (pa) */
kmutex_t    ci_kpm_mtx;
+#endif /* defined(PAE) || defined(__x86_64__) */
+
 #if defined(__x86_64__)
/* per-cpu version of normal_pdes */
pd_entry_t *ci_normal_pdes[3]; /* Ok to hardcode. only for x86_64 && XEN */
paddr_t ci_xen_current_user_pgd;
-#endif /* __x86_64__ */
-#endif /* XEN et.al */
-
-#ifdef XEN
-   size_t  ci_xpq_idx;
-#endif
+#endif /* defined(__x86_64__) */
 
-#ifndef XEN
-   struct evcnt ci_ipi_events[X86_NIPI];
-#else   /* XEN */
+   u_long ci_evtmask[NR_EVENT_CHANNELS]; /* events allowed on this CPU */
struct evcnt ci_ipi_events[XEN_NIPIS];
evtchn_port_t ci_ipi_evtchn;
-#endif  /* XEN */
-
-   device_t    ci_frequency;   /* Frequency scaling technology */
-   device_t    ci_padlock;     /* VIA PadLock private storage */
-   device_t    ci_temperature; /* Intel coretemp(4) or equivalent */
-   device_t    ci_vm;          /* Virtual machine guest driver */
-
-   /*
-    * Segmentation-related data.
-    */
-   union descriptor *ci_gdt;
-   struct cpu_tss  *ci_tss;    /* Per-cpu TSSes; shared among LWPs */
-   int         ci_tss_sel;     /* TSS selector of this cpu */
-
-#ifdef XEN
+   size_t  ci_xpq_idx;
/* Xen raw system time at which we last ran hardclock.  
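The `__aligned(64)` pair in the new layout gives `ci_want_resched` a cache line of its own: the first attribute starts it on a 64-byte boundary, and `ci_padout` starts the following line, so no neighbouring member shares it. A minimal sketch, using C11 `_Alignas` as a stand-in for NetBSD's `__aligned()` macro, assuming 64-byte cache lines and a hypothetical struct name:

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE	64	/* assumed x86 cache line size */

/* Hypothetical excerpt modelled on the end of the patched struct cpu_info. */
struct resched_sketch {
	int	ci_tss_sel;	/* some earlier member */
	/* Starts a fresh cache line: nothing before it shares the line. */
	_Alignas(CACHE_LINE_SIZE) int ci_want_resched;
	/* Starts the next line: nothing after it shares ci_want_resched's line. */
	_Alignas(CACHE_LINE_SIZE) int ci_padout;
};

static_assert(offsetof(struct resched_sketch, ci_want_resched) % CACHE_LINE_SIZE == 0,
    "ci_want_resched begins a cache line");
static_assert(offsetof(struct resched_sketch, ci_padout) -
    offsetof(struct resched_sketch, ci_want_resched) == CACHE_LINE_SIZE,
    "ci_padout begins the following cache line");
```

The member offsets only translate into real cache lines if the containing object is itself allocated on a 64-byte boundary; the attribute raises the struct's alignment requirement, which the allocator must honor.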

Re: aarch64 gcc kernel compilation

2018-07-15 Thread Kamil Rytarowski
On 15.07.2018 20:08, Christos Zoulas wrote:
> Hi,
> 
> Gcc is now working on aarch64, but the kernel does not compile because of
> some idiomatic clang code that is not supported by gcc (at least gcc-6).
> 
> To define constants, it uses:
> 
> static const uintmax_t
> FOO = __BIT(9),
> BAR = FOO;
> 
> While this is nice, especially for the debugger, it produces an error
> in gcc. While fixing these errors is easy, gcc also complains about using
> the constants as switch labels. Thus it is better to just nuke them all and
> rewrite them as:
> 
> #define FOO __BIT(9)
> #define BAR FOO
> 
> Should I go ahead and do it, or is there a smarter solution?
> 
> christos
> 

Some years ago, I had problems building the aarch64 rump kernel on Linux
with GCC due to the use of __uint128_t in reg.h.

Can we drop it? The __uint128_t type is not used anywhere else in the
aarch64 subdirectories.

It's used from assembly through FPREG_Q0-FPREG_Q31 in cpuswitch.S. The same
optimization can be done without __uint128_t; it probably just needs
proper alignment of fp_reg (15).
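One way to read this suggestion: replace each `__uint128_t` Q-register slot with a pair of 64-bit halves carrying explicit 16-byte alignment, so the size, alignment and offsets seen from assembly stay the same. The struct and member names below are hypothetical and are not the actual aarch64 `reg.h` layout:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit slot without __uint128_t: two halves, 16-byte aligned. */
struct fpreg_q_sketch {
	alignas(16) uint64_t q_lo;	/* low 64 bits (little-endian order assumed) */
	uint64_t q_hi;			/* high 64 bits */
};

/* Hypothetical FP register file: Q0..Q31 followed by two 32-bit registers. */
struct fpreg_sketch {
	struct fpreg_q_sketch fp_q[32];
	uint32_t fp_cr;			/* hypothetical control register slot */
	uint32_t fp_sr;			/* hypothetical status register slot */
};

/* Same size and alignment a __uint128_t slot has on aarch64. */
static_assert(sizeof(struct fpreg_q_sketch) == 16, "Q slot is 128 bits");
static_assert(alignof(struct fpreg_q_sketch) == 16, "Q slot is 16-byte aligned");

/* Offsets an assembly constant like FPREG_Q31 would depend on. */
static_assert(offsetof(struct fpreg_sketch, fp_q[31]) == 31 * 16,
    "Q31 offset unchanged");
```

The static assertions turn any layout drift into a compile error, which is the property the FPREG_Q* offsets used from assembly rely on.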

There is also a mysterious side effect: General Purpose Registers in core
files are stored in 128-bit containers. This is not compatible with LLDB
and requires needless generic work.

I can try to prepare a patch blindly and share it with the aarch64 owners.





Re: Too many PMC implementations

2018-07-15 Thread Jared McNeill

On Sun, 15 Jul 2018, Maxime Villard wrote:


Now I want to move:

arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c

into

dev/tprof/tprof_intel.c
dev/tprof/tprof_amd.c

I guess people are fine with this? I think it is better to gather all the pieces in
one dir.


I don't really have an opinion here, but I've just committed a new 
backend as dev/tprof/tprof_armv8.c. So I guess that's a vote for the 
latter :)


Cheers,
Jared


aarch64 gcc kernel compilation

2018-07-15 Thread Christos Zoulas


Hi,

Gcc is now working on aarch64, but the kernel does not compile because of
some idiomatic clang code that is not supported by gcc (at least gcc-6).

To define constants, it uses:

static const uintmax_t
FOO = __BIT(9),
BAR = FOO;

While this is nice, especially for the debugger, it produces an error
in gcc. While fixing these errors is easy, gcc also complains about using
the constants as switch labels. Thus it is better to just nuke them all and
rewrite them as:

#define FOO __BIT(9)
#define BAR FOO

Should I go ahead and do it, or is there a smarter solution?

christos
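In C (unlike C++), a `static const` object is not an integer constant expression, so it cannot appear in a `case` label or in another static object's initializer, which is presumably why gcc rejects both `BAR = FOO` and the switch labels. Macros expand to constant expressions and avoid both problems. A minimal sketch, where `SKETCH_BIT` is a simplified stand-in for NetBSD's `__BIT()` and `classify`/`bar_value` are hypothetical helpers:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for NetBSD's __BIT() (assumption, not the real macro). */
#define SKETCH_BIT(n)	((uintmax_t)1 << (n))

/* Macro form: both names expand to integer constant expressions. */
#define FOO	SKETCH_BIT(9)
#define BAR	FOO

/* Accepted by gcc too: BAR is a constant expression, not a const object. */
static const uintmax_t bar_copy = BAR;

/* Hypothetical helper: returns 1 if v is exactly the FOO bit. */
int
classify(uintmax_t v)
{
	switch (v) {
	case FOO:	/* a case label needs an integer constant expression */
		return 1;
	default:
		return 0;
	}
}

uintmax_t
bar_value(void)
{
	return bar_copy;
}

static_assert(BAR == 512, "FOO/BAR is bit 9");
```

An `enum` is another common alternative that debuggers also see, but before C23 enumeration constants must be representable as `int`, so it cannot portably hold bit values from bit 31 upward.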


Re: Too many PMC implementations

2018-07-15 Thread Maxime Villard

On 11/07/2018 at 18:22, Maxime Villard wrote:

Right now we have three (or more?) different implementations for Performance
Monitoring Counters:

  * PMC: this one is MI. It is used only on one ARM model (XScale, I think).
There used to be x86 code for it, but it was broken, and I removed it.
The implementation comes with libpmc, a library we provide. The code
hasn't changed in the last 15 years. I don't like this implementation; it
is really invasive (see the numerous pmc.h files, which are all empty).

  * X86PMC: this one is MD, and only available for x86. I wrote it myself.
The code is small (x86/pmc.c) and functional. The PMCs are system-wide,
and retrieved on a per-cpu basis. But this implementation does not
support tracking; that is, we get numbers (about cache misses, for
example), but we don't know where they happened.

  * TPROF: this one is MI, but only x86 support is present. TPROF provides
the backend needed to support tracking: a device that userland can read
from, in order to consume the event samples produced by the kernel.
The backend is pretty good, but the frontend (where the user chooses
which PMC to use, etc.) is nonexistent, and the CPU/event detection is not
there either. The backend is MI (dev/tprof/tprof.c) and can be used on
other architectures. A module already exists, so it can be loaded
dynamically with modload.
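What tracking adds over counting is a record of *where* each counter overflow happened, which userland then drains from the device. The sketch below models only that producer/consumer idea with a tiny single-threaded ring of hypothetical `(pc, cpu)` records; it is not tprof's actual buffer code, and a real kernel implementation would need synchronization between the interrupt-side producer and the reader:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample record: where the counter overflowed, on which CPU. */
struct sample_sketch {
	uintptr_t s_pc;
	unsigned s_cpu;
};

#define NSAMPLES 8	/* hypothetical ring size */
static_assert(NSAMPLES > 0, "ring must have at least one slot");

struct sample_ring_sketch {
	struct sample_sketch r_buf[NSAMPLES];
	size_t r_head;		/* total samples written by the producer */
	size_t r_tail;		/* total samples read by the consumer */
	uint64_t r_dropped;	/* samples lost because the ring was full */
};

/* Producer side (kernel, on overflow interrupt): append unless full. */
int
ring_put(struct sample_ring_sketch *r, uintptr_t pc, unsigned cpu)
{
	if (r->r_head - r->r_tail == NSAMPLES) {
		r->r_dropped++;
		return 0;
	}
	r->r_buf[r->r_head % NSAMPLES] = (struct sample_sketch){ pc, cpu };
	r->r_head++;
	return 1;
}

/* Consumer side (userland read): copy out up to n samples, oldest first. */
size_t
ring_get(struct sample_ring_sketch *r, struct sample_sketch *out, size_t n)
{
	size_t i = 0;

	while (i < n && r->r_tail != r->r_head) {
		out[i++] = r->r_buf[r->r_tail % NSAMPLES];
		r->r_tail++;
	}
	return i;
}
```

A counting-only design like X86PMC corresponds to keeping just a per-CPU total instead of `r_buf`, which is exactly why it cannot say where the events occurred.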

I think it would be good to:

  * Remove PMC entirely. Then remove libpmc too.

  * Merge X86PMC into the x86 part of TPROF. That is to say, into
x86/tprof_*. Then remove X86PMC.

  * Later, maybe, someone will want to add other architectures to TPROF, like
all the recent ARMs.

Maxime


Now I want to move:

arch/x86/x86/tprof_pmi.c
arch/x86/x86/tprof_amdpmi.c

into

dev/tprof/tprof_intel.c
dev/tprof/tprof_amd.c

I guess people are fine with this? I think it is better to gather all the pieces in
one dir.