Re: TSC improvement

2020-06-10 Thread Taylor R Campbell
> Date: Wed, 10 Jun 2020 23:11:32 +
> From: Andrew Doran 
> 
> On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:
> 
> > At the same time, I wonder whether we should _also_:
> > 
> > 1. Modify the tsc timecounter so that it uses a global atomic to
> >ensure that there is a global view of time as counted by the tsc.
> 
> A long time ago we used to do something quite like this for x86 but it
> worked poorly which is somewhat understandable because it's really difficult
> to get something like that right.  (Which reminds me someone posted a patch
> for kern_cctr.c for alpha a couple of years ago.)

It's not hard to get global monotonicity right.  We do that for Xen
these days, in arch/xen/xen/xen_clock.c.  The Alpha patch was probably
this thread:

https://mail-index.netbsd.org/port-alpha/2017/11/01/msg000873.html

What's trickier is synchronizing per-CPU timecounters so that they all
give a reasonably consistent view of absolute wall clock time -- and
so it's not just one CPU that leads while the others play catchup
every time they try to read the clock.  (In other words, adding atomic
catchup logic certainly does not obviate the need to synchronize
per-CPU timecounters!)

But neither synchronization nor global monotonicity is always
necessary -- e.g., for rusage we really only need a local view of time
since we're only measuring relative time durations spent on the
current CPU anyway.

> >This is what the timecounter(9) API per se expects of timecounters,
> >and right now tsc (along with various other per-CPU cycle counters)
> >fails to guarantee that.
> 
> Howso, do you see a bug?  I think it's okay.  The TSC is only used for the
> timecounter where it's known that it's insensitive to core speed variations
> and is driven by PLL related to the bus clock.  Fortunately that means most
> x86 systems, excepting a window of some years from roughly around the time
> of the Pentium 4 onwards.

If tc_get_timecount goes backward by a little, e.g. because you
queried it on cpu0 the first time and on cpu1 the second time,
kern_tc.c will interpret that to mean that it has instead jumped
forward by a lot -- nothing in the timecounter abstraction copes with
a timecounter that goes backwards at all.

(There's also an issue where the `monotonic' clock goes backwards
sometimes, as reported by sched_pstats.  I'm not sure anyone has
tracked down where that's coming from -- it seems unlikely to be
related to cross-CPU tsc synchronization because lwp rtime should
generally be computed from differences between samples on a single CPU
at a time, but I don't know.)



Re: TSC improvement

2020-06-10 Thread Masanobu SAITOH
Hi.

On 2020/06/10 4:37, Joerg Sonnenberger wrote:
> On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:
>> It's great to see improvements to our calibration of the TSC (and I
>> tend to agree that cpu_counter should be serializing, so that, e.g.,
>> cpu_counter(); ...; cpu_counter() reliably measures time taken in the
>> ellipsis).
> 
> I'm pretty sure we want to have both variants.
> 
> Joerg

You mean we should have both cpu_counter() and cpu_counter_serializing()
(or cpu_counter() and cpu_counter_unserializing())? If so, why?

I have no strong opinion about this. When I wrote that change, I added
cpu_counter_serializing(). We already have tsc_get_timecount(), cpu_counter()
and cpu_counter32() as MI API. In addition, x86 also has rdtsc(and will
have some serializing functions). There are so many similar functions,
so I thought it would be good to avoid code duplication and to be simple.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: TSC improvement

2020-06-10 Thread Andrew Doran
On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:

> It's great to see improvements to our calibration of the TSC (and I
> tend to agree that cpu_counter should be serializing, so that, e.g.,
> cpu_counter(); ...; cpu_counter() reliably measures time taken in the
> ellipsis).
> 
> At the same time, I wonder whether we should _also_:
> 
> 1. Modify the tsc timecounter so that it uses a global atomic to
>ensure that there is a global view of time as counted by the tsc.

A long time ago we used to do something quite like this for x86 but it
worked poorly which is somewhat understandable because it's really difficult
to get something like that right.  (Which reminds me someone posted a patch
for kern_cctr.c for alpha a couple of years ago.)

>This is what the timecounter(9) API per se expects of timecounters,
>and right now tsc (along with various other per-CPU cycle counters)
>fails to guarantee that.

Howso, do you see a bug?  I think it's okay.  The TSC is only used for the
timecounter where it's known that it's insensitive to core speed variations
and is driven by PLL related to the bus clock.  Fortunately that means most
x86 systems, excepting a window of some years from roughly around the time
of the Pentium 4 onwards.

Cheers,
Andrew

> 2. Introduce an API for a local timecounter -- a per-CPU timecounter
>that never goes backwards on a single CPU, but:
>(a) measures units of wall clock time, unlike cpu_counter();
>(b) need not be synchronized between CPUs; and
>(c) may be cheaper to read than a global timecounter.
>We could then use that for, e.g., rusage calculations.


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Radoslaw Kujawa

On 6/10/20 4:30 PM, Radoslaw Kujawa wrote:


It is more likely that something mis-compiles now (I see a ton of 
warnings trying to build it now), or that there is a bug in 
implementation of test device emulation, that somehow went unnoticed 
back in 2012.


It seems that in 2018 cobalt port was switched to common MIPS bus space 
implementation. As a part of this implementation, it started to be more 
strict about verification of addresses used by device during bus_space_map.


I earlier wrote that gxemul does not emulate Cobalt firmware and that is 
true, however it seems that it does place PCI device at certain 
addresses (0x1011) upon boot.


NetBSD attempts to reconfigure the device to use address 0x1200. As 
a part of switch to common MIPS bus space, for cobalt, window 
0x1200-0x1400 was dedicated to PCI memory BARs.


For reasons still not researched thoroughly, this reconfiguration fails, 
and the device stays at original address.


So when the pci_mapreg_map is called, a MIPS MI bus_space_map is 
executed. As a part of this, translation takes place, however sanity 
check in __BS(translate) throws EINVAL, due to device being outside of 
window range.


I've worked the problem around by placing the device in gxemul at an 
exact address expected by NetBSD:


[   1.000] faa0 at pci0 dev 12 function 0: vendor fabc product 0001 
(rev. 0x01)

[   1.000] faa0: registers at 0x1200
[   1.000] faa0: just checking: 1 + 2 = 3

Please try the attached gxemul patch.

Best regards,
Radoslaw
diff --git a/gxemul-current/src/devices/bus_pci.cc b/gxemul-current/src/devices/bus_pci.cc
index ce64aa0..8e7827c 100644
--- a/gxemul-current/src/devices/bus_pci.cc
+++ b/gxemul-current/src/devices/bus_pci.cc
@@ -1436,6 +1436,7 @@ PCIINIT(ati_radeon_9200_2)
 PCIINIT(faa)
 {
uint64_t port, memaddr;
+   memaddr = 0x1200;
char tmpstr[200];
 
PCI_SET_DATA(PCI_ID_REG, PCI_ID_CODE(PCI_VENDOR_FAKECARDS,
@@ -1444,7 +1445,9 @@ PCIINIT(faa)
PCI_SET_DATA(PCI_CLASS_REG, PCI_CLASS_CODE(PCI_CLASS_PROCESSOR,
PCI_SUBCLASS_PROCESSOR_COPROC, 0) + 0x01);
 
-   allocate_device_space(pd, 0x0, 0x100, , );
+   PCI_SET_DATA(0x10, memaddr);
+
+// allocate_device_space(pd, 0x0, 0x100, , );
 
snprintf(tmpstr, sizeof(tmpstr), "faa addr=0x%llx "
"pci_little_endian=1", (long long)memaddr);



Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Radoslaw Kujawa




On 6/10/20 4:19 PM, Martin Husemann wrote:

On Wed, Jun 10, 2020 at 04:12:03PM +0200, Radoslaw Kujawa wrote:

I suspect there are two options: either something has bit-rotted in
Cobalt-specific PCI code, or the ancient gxemul from 2012 does not work
correctly anymore.


FWIW, Anders Gavare is quite responsive to email, if there is a hint this
would be a bug in gxemul (but it does not really sound that way right now).

The gxemul for tutorial purposes (with the test device added), was 
forked in 2012.


It is more likely that something mis-compiles now (I see a ton of 
warnings trying to build it now), or that there is a bug in 
implementation of test device emulation, that somehow went unnoticed 
back in 2012.


Best regards,
Radoslaw


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Martin Husemann
On Wed, Jun 10, 2020 at 04:12:03PM +0200, Radoslaw Kujawa wrote:
> I suspect there are two options: either something has bit-rotted in
> Cobalt-specific PCI code, or the ancient gxemul from 2012 does not work
> correctly anymore.

FWIW, Anders Gavare is quite responsive to email, if there is a hint this
would be a bug in gxemul (but it does not really sound that way right now).

Martin


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Radoslaw Kujawa




On 6/10/20 12:58 PM, Rocky Hotas wrote:

On giu 10 12:54, Rocky Hotas wrote:


the address 0x1011 belongs to a `user process virtual space, mapped
and cached', both when the processor uses 32 and 64 bit addresses.




The address reported by pcictl is a physical address. It looks correct 
in my opinion.


NetBSD is mapping PCI devices on cobalt starting at address 0x1000. 
This is defined in src/sys/arch/cobalt/include/cpu.h.


gxemul does not really emulate cobalt firmware, so the whole setup of 
PCI bus is done via PCI_NETBSD_CONFIGURE mechanism. Considering that 
device was mapped to a sane looking address, I can guess this part has 
worked successfully.


If you want to double check this, I suggest editing 
src/sys/dev/pci/pciconf.c and setting pci_conf_debug to 1. This should 
give you detailed output of configuration process.


The real question boils down to why pci_mapreg_map() fails and I am 
afraid, this requires inserting a few printfs into this function, as 
suggested by other people on the mailing list.


I suspect there are two options: either something has bit-rotted in 
Cobalt-specific PCI code, or the ancient gxemul from 2012 does not work 
correctly anymore.


Nevertheless, this requires further debugging.

Best regards,
Radoslaw


Re: pg_jobc going negative?

2020-06-10 Thread Robert Elz
Date:Tue, 9 Jun 2020 14:16:16 -0400
From:Christos Zoulas 
Message-ID:  

  | The FreeBSD refactoring LGTM. It also simplifies the code.

Sorry, been off net all day ... that may very well be the way to go,
but I'd like to understand what is happening with our current code
first, so we will know that the changes are fixing a problem, and not
just moving things around.

kre



Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Mouse
>> Guessing should not be relevant; how BARs work is part of the PCI
>> definition.
> What I was meaning is that it seems counterintuitive that a single 32
> bit register can at the same time specify a 32 bit physical memory
> address and a memory region extent.

Yes.  It is clever - but I wish the PCI designers had chosen a bit less
cleverness.

> In the example of the output of `pcictl' for my test device, this is
> exactly the case: only 1 non-zero BAR,

> Base address register at 0x10
>   type: 32-bit nonprefetchable memory
>   base: 0x1011

> I am stuck in trying to interpret this value, and to use it both for
> an address and a range.

You can't.  If you have read-only access to the BAR - which is what
pcictl gives you - then all you can get is the base address.  You need
read/write access to get the range as well.  (In case you haven't yet
seen the trick: you read the value and save it, then you write all-1s
and read it back, to see which bits are read-only, then you write back
the value you saved.  Yes, this implies that the size of the range is
always a power of two.  As I said, clever, but I wish they'd kept their
cleverness a bit more in check.)

> Yes, this may change according to the architecture.  Here it's not
> x86.  If you remember some source, or documentation, I would check it
> out.

I don't.  I have never worked with a cobalt and I don't know whether
there is a de-facto standard for how MIPS interfaces to PCI.

> This makes me wonder if the tutorial is correct and why.

> The main fact here (at least, for a beginner) is that it's extremely
> difficult to make multiple attempts, check and verify, as you would
> with a normal C program (print something, do anything useful to let
> you know how it works).

Can't you print things?  I've done that often enough with kernel code.
If -current has broken "printf from the driver" debugging, I'd call
that a crippling regression.

> Here, for example, I wonder what (in detail) makes pci_mapreg_map
> fail, what kind of space actually needs the PCI test device: in other
> words, some more information about this mapping.

You could always do something like

(in your driver's .c file)

extern int mapreg_debug;

..._attach(...)
{
...
mapreg_debug = 1;
... pci_mapreg_map(...) ...
mapreg_debug = 0;
...
}

(in the implementation of pci_mapreg_map)

int mapreg_debug = 0;
...
if (mapreg_debug) printf("...", ...);

It's a horrible thing to do for "permanent" (production) code, but I
see nothing at all wrong with it for experimental debugging.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTMLmo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Rocky Hotas
On giu 10 12:54, Rocky Hotas wrote:

> the address 0x1011 belongs to a `user process virtual space, mapped
> and cached', both when the processor uses 32 and 64 bit addresses.

No, sorry, that was about Virtual addresses, not physical ones. Here,

 


it is only stated that ``the relationship between the PCI bus address
space and the system memory physical address space differs from one
system type to another''.
So, basically it gives no information.

Rocky


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Rocky Hotas
On giu 10 11:50, Martin Husemann wrote:

> so it is PCI_MAPREG_TYPE_MEM ("memory" in the type line), and the mapping
> should just work.

Ok! And I guess that invoking pci_mapreg_type() would give the same
result.

> Does the map address (0x1011) make sense on that machine?

I really don't know. The manual of MIPS R4000 is available at

 

but not the one of R5000, the processory family which should be used in the
emulated Cobalt machine:

 

Here,

 


(``an overview of the management of physical and virtual memory in the
MIPS R4x00, R5000, R8000, and R1 processors'')

the address 0x1011 belongs to a `user process virtual space, mapped
and cached', both when the processor uses 32 and 64 bit addresses. I
think however that the emulated Cobalt in gxemul uses 32 bit addresses.

Rocky


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Martin Husemann
On Wed, Jun 10, 2020 at 11:41:51AM +0200, Rocky Hotas wrote:
> Base address register at 0x10
>   type: 32-bit nonprefetchable memory
>   base: 0x1011

so it is PCI_MAPREG_TYPE_MEM ("memory" in the type line), and the mapping
should just work.

Does the map address (0x1011) make sense on that machine?

Martin


Re: pci_mapreg_map and BAR from `Bus space tutorial'

2020-06-10 Thread Rocky Hotas
Hi!

On giu 08 15:31, Mouse wrote:
> Guessing should not be relevant; how BARs work is part of the PCI
> definition.

Yes, of course. What I was meaning is that it seems counterintuitive
that a single 32 bit register can at the same time specify a 32 bit
physical memory address and a memory region extent.
In the example of the output of `pcictl' for my test device, this is
exactly the case: only 1 non-zero BAR,

Base address register at 0x10
  type: 32-bit nonprefetchable memory
  base: 0x1011

I am stuck in trying to interpret this value, and to use it both for an
address and a range.

> I forget details, but I think things like "I/O mapping
> versus memory mapping" are present in read-only bits at the LSB end of
> the BAR.  NetBSD has a bunch of #defines for them; look for PCI_MAPREG_
> in pcireg.h - or, at least, that's where they are on 1.4T, 5.2, and
> 8.0; while I don't have anything more recent at ready hand to check,
> I'd guess that something that stable is probably going to stay stable.

It is still as you said, IIUC: starting from line 453 of the current
revision 1.138.2.2,

#define PCI_MAPREG_TYPE_MEM 0x
#define PCI_MAPREG_TYPE_ROM 0x
#define PCI_MAPREG_TYPE_IO  0x0001
#define PCI_MAPREG_ROM_ENABLE   0x0001

> >> Give this is mips I assume PCI_MAPREG_TYPE_MEM is correct, but for a
> >> new device you may be unsure, and sometimes variants of devices
> >> exist that either make the type IO or MEM.  You can query that in
> >> your driver with pci_mapreg_info() - but for the concrete case the
> >> userland pcictl output is easier.
> 
> In this particular case, sure; I suspect the person who wrote the
> double-quoted text above was trying to give advice applicable beyond
> the case at immediate hand.

Got it!

> PCI is defined to have two address spaces, called I/O space and memory
> space.  On x86, these normally correspond to I/O instructions
> (inb/outb/etc) and memory-mapped I/O, respectively.  On other machines,
> this often works differently; some PCI attachments present PCI I/O
> space as memory-mapped too, just in a different part of the address
> space.

Yes, this may change according to the architecture. Here it's not x86.
If you remember some source, or documentation, I would check it out.

> The line you cite
> 
> > Command register: 0x0003
> >   I/O space accesses: on
> 
> is a PCI thing, indicating that the device is configured to respond to
> a PCI access marked as an I/O space access, provided it's mapped by a
> suitable mapping register.

This makes me wonder if the tutorial is correct and why.

The main fact here (at least, for a beginner) is that it's extremely
difficult to make multiple attempts, check and verify, as you would with
a normal C program (print something, do anything useful to let you know
how it works). Again, if you (or someone else) have any advice on how to
do this with kernel dmesg and an example PCI device, it would be very
useful.

Here, for example, I wonder what (in detail) makes pci_mapreg_map fail,
what kind of space actually needs the PCI test device: in other words,
some more information about this mapping.

> For a machine that already has a NetBSD port supporting PCI, those
> details should already be handled for you by the bus_space
> implementation for that port.  Your driver just needs to know "is this
> a (PCI) I/O space mapping or a (PCI) memory space mapping?", and
> sometimes not even that; what it turns into at the machine-code level
> is hidden by the bus_space layer.  See pci_mapreg_type(),
> pci_mapreg_map(), etc.

Maybe I can check the result of pci_mapreg_type(), to obtain more
information.

Anyone who can provide some other basic hints is very welcomed. In the
meanwhile, thanks a lot!

Rocky


Re: makesyscalls

2020-06-10 Thread David Holland
On Wed, Jun 10, 2020 at 01:25:03AM -0400, Thor Lancelot Simon wrote:
 > Could you translate your prototype into a
 > different language, one that's less basically hostile to our build system
 > and its goals and benefits?

Like which one? You removed the part of the post that explained that
there aren't other reasonable choices.

Yes, it can be rewritten in C as a subsequent step. *After* quite a
bit of tidying. And no, I'm not doing that now. Among other problems,
compiling it requires bikeshedding where to put it in the tree. Feel
free to sort that out.

As for lua: it has the same headline issues as awk, namely it doesn't
enforce function arguments and doesn't require that variables are
declared before use. Again, I see no reason to think it'll be any more
maintainable.

-- 
David A. Holland
dholl...@netbsd.org