Re: JH7110 PCIe device tree binding update

2023-08-29 Thread Kevin Lo
On Tue, Aug 29, 2023 at 09:15:41PM +0200, Mark Kettenis wrote:
> 
> > Date: Tue, 29 Aug 2023 11:58:23 +0200
> > From: Mark Kettenis 
> > 
> > Upstreaming of the JH7110 PCIe device tree bindings isn't finished
> > yet, but it seems some progress has been made and things have been
> > reviewed by some of the key people involved:
> > 
> >   https://patchwork.kernel.org/project/linux-pci/list/?series=779297
> > 
> > Here is a diff that adjusts the driver to the current state of things
> > such that we can use the latest device tree from:
> > 
> >   https://github.com/starfive-tech/linux/tree/JH7110_VisionFive2_upstream
> > 
> > to continue development.  The idea is to support the preliminary
> > bindings a little bit longer such that folks can update their device
> > trees.  Will probably drop support for the preliminary bindings in a
> > few weeks.
> > 
> > ok?
> 
> patrick@ pointed out that the dv_unit check won't work properly if the
> first PCIe controller is disabled.  So here is a diff that checks the
> device address instead like we do for dwqe(4).
> 
> ok?

ok kevlo@

Tested on my VisionFive 2 v1.3b with the device tree from:

https://raw.githubusercontent.com/starfive-tech/linux/JH7110_VisionFive2_upstream/arch/riscv/boot/dts/starfive/jh7110-starfive-visionfive-2-v1.3b.dts

It works fine, the NVMe is detected.

BTW, I noticed that the memory statistics seem to be incorrect.
The VisionFive 2 is equipped with 8GB RAM.

OpenBSD 7.3-current (GENERIC.MP) #0: Wed Aug 30 11:52:03 CST 2023
kevlo@vf2:/usr/src/sys/arch/riscv64/compile/GENERIC.MP
real mem  = 4294967296 (4096MB)
^^^
avail mem = 8110370816 (7734MB)
^^^
SBI: OpenSBI v1.2, SBI Specification Version 1.0
random: good seed from bootblocks
mainbus0 at root: StarFive VisionFive 2 v1.3B
cpu0 at mainbus0: SiFive U7 imp 4210427 rv64imafdc_zba_zbb
intc0 at cpu0
cpu0: 32KB 64b/line 64-way L1 I-cache, 32KB 64b/line 64-way L1 D-cache
cpu0: 2048KB 64b/line 2048-way L2 cache
cpu1 at mainbus0: SiFive U7 imp 4210427 rv64imafdc_zba_zbb
cpu1: 32KB 64b/line 64-way L1 I-cache, 32KB 64b/line 64-way L1 D-cache
cpu1: 2048KB 64b/line 2048-way L2 cache
cpu2 at mainbus0: SiFive U7 imp 4210427 rv64imafdc_zba_zbb
cpu2: 32KB 64b/line 64-way L1 I-cache, 32KB 64b/line 64-way L1 D-cache
cpu2: 2048KB 64b/line 2048-way L2 cache
cpu3 at mainbus0: SiFive U7 imp 4210427 rv64imafdc_zba_zbb
cpu3: 32KB 64b/line 64-way L1 I-cache, 32KB 64b/line 64-way L1 D-cache
cpu3: 2048KB 64b/line 2048-way L2 cache
"opp-table-0" at mainbus0 not configured
"display-subsystem" at mainbus0 not configured
"stmmac-axi-config" at mainbus0 not configured
"dvp-clock" at mainbus0 not configured
"gmac0-rgmii-rxin-clock" at mainbus0 not configured
"gmac0-rmii-refin-clock" at mainbus0 not configured
"gmac1-rgmii-rxin-clock" at mainbus0 not configured
"gmac1-rmii-refin-clock" at mainbus0 not configured
"hdmitx0-pixel-clock" at mainbus0 not configured
"i2srx-bclk-ext-clock" at mainbus0 not configured
"i2srx-lrck-ext-clock" at mainbus0 not configured
"i2stx-bclk-ext-clock" at mainbus0 not configured
"i2stx-lrck-ext-clock" at mainbus0 not configured
"mclk-ext-clock" at mainbus0 not configured
"oscillator" at mainbus0 not configured
"rtc-oscillator" at mainbus0 not configured
"tdm-ext-clock" at mainbus0 not configured
simplebus0 at mainbus0: "soc"
plic0 at simplebus0
stfpciephy0 at simplebus0
stfpciephy1 at simplebus0
stfclock0 at simplebus0: stgcrg
syscon0 at simplebus0: "syscon"
stfclock1 at simplebus0: syscrg
syscon1 at simplebus0: "syscon"
stfclock2 at syscon1: pll
stfpinctrl0 at simplebus0
stfclock3 at simplebus0: aoncrg
syscon2 at simplebus0: "syscon"
"timer" at simplebus0 not configured
"cache-controller" at simplebus0 not configured
com0 at simplebus0: dw16550
com0: console
dwiic0 at simplebus0
iic0 at dwiic0
dwiic1 at simplebus0
iic1 at dwiic1
"spi" at simplebus0 not configured
"tdm" at simplebus0 not configured
"pwmdac" at simplebus0 not configured
"i2s" at simplebus0 not configured
"usb" at simplebus0 not configured
"phy" at simplebus0 not configured
dwiic2 at simplebus0
iic2 at dwiic2
axppmic0 at iic2 addr 0x36: AXP15060
dwiic3 at simplebus0
iic3 at dwiic3
"sony,imx219" at iic3 addr 0x10 not configured
"i2s" at simplebus0 not configured
"i2s" at simplebus0 not configured
"pwm" at simplebus0 not configured
stftemp0 at simplebus0
"spi" at simplebus0 not configured
"timer" at simplebus0 not configured
"watchdog" at simplebus0 not configured
"crypto" at simplebus0 not configured
"dma-controller" at simplebus0 not configured
"rng" at simplebus0 not configured
dwmmc0 at simplebus0: 49 MHz base clock
sdmmc0 at dwmmc0: 8-bit, mmc high-speed, dma
dwmmc1 at simplebus0: 49 MHz base clock
sdmmc1 at dwmmc1: 4-bit, sd high-speed, dma
dwqe0 at simplebus0 gmac 0: rev 0x00, address 6c:cf:39:00:31:1d
ytphy0 at dwqe0 phy 0: YT8531 10/100/1000 PHY, rev. 11
dwqe1 at simplebus0 gmac 1: rev 0x00, address 6c:cf:39:00:31:1e
dwqe1: reset timeout
ytphy1 at dwqe1 phy 0: 

Re: sysctl: Fixing "second level name in XX. is invalid" on trailing period

2023-08-29 Thread Ingo Schwarze
Hi Neel,

Neel Chauhan wrote on Tue, Aug 29, 2023 at 02:36:57PM -0700:
> On 2023-08-29 14:23, Theo Buehler wrote:
>> Neel Chauhan:

>>> +   if (string[strlen(string) - 1] == '.')
>>> +   buf[strlen(string) - 1] = '\0';

>> Careful with out-of-bounds accesses. What if string is "" ? Probably
>> easiest to do "len = strlen(string);" and if (len > 0 && ...).

> Good catch!
> Does this look better:

Not much better, and my main complaint isn't even that storing the
return value of strlen(3) in an int variable is bad style.

Your patch is still incorrect in so far as with a command like

  sysctl 'vm.malloc_conf=S>.'

your code would happily and silently remove the period from
the end of the value whereas in

  sysctl vm.malloc_conf.=S

it would still complain in exactly the same way as it does now.

I'm not sure any of the sysctl string nodes that currently exist
provide a use case for a trailing period in the value, but i wouldn't
exclude such a use case arising in the future.  For example, malloc_conf
already supports some trailing non-letter characters in the value,
and kern.hostname already supports periods in the middle of the value,
right now.

Besides, i believe that in security-critical tools, parsing should be
strict, and trying to guess what people mean by invalid syntax is not
generally a good idea.

What's next?  Should "sysctl kern.." also do something, and if so, what?  

I don't see what benefit is provided in making "sysctl vm.malloc_conf."
work, but i do see how flooding someone with multiple pages of
output might be unwelcome if they wanted to type "sysctl kern.tty"
but somehow forget the "tty" part and hit enter after "sysctl kern." -
the trailing dot does indicate that they probably only wanted a
sub-node, or doesn't it?

Yours,
  Ingo


> Index: sbin/sysctl/sysctl.c
> ===
> RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
> retrieving revision 1.259
> diff -u -p -u -r1.259 sysctl.c
> --- sbin/sysctl/sysctl.c  17 May 2023 22:12:51 -  1.259
> +++ sbin/sysctl/sysctl.c  29 Aug 2023 21:36:19 -
> @@ -377,6 +377,9 @@ parse(char *string, int flags)
> 
>   (void)strlcpy(buf, string, sizeof(buf));
>   bufp = buf;
> + len = strlen(string);
> + if (len > 0 && string[len - 1] == '.')
> + buf[len - 1] = '\0';
>   if ((cp = strchr(string, '=')) != NULL) {
>   *strchr(buf, '=') = '\0';
>   *cp++ = '\0';



Re: Removing extra space in "sysctl: top level name in . is invalid"

2023-08-29 Thread Neel Chauhan

On 2023-08-29 17:21, Ingo Schwarze wrote:

Hi Neel,

Neel Chauhan wrote on Tue, Aug 29, 2023 at 02:48:54PM -0700:

I have noticed a bug. When running "sysctl ." or "sysctl kern." 
without

my other patch, I get an extra space between "name" and "in":



sysctl: top level name  in . is invalid


Technically, that message is correct.  The name being complained about
is a zero length string, and of course there are space characters
marking a word boundary before and after it, resulting in two adjacent
space characters around a zero-length word.  But i see how some users
might find that confusing.

If people want that improved, see below for a patch that actually
improves the diagnostic message, resulting in:


In that case, I'll just let it go.

-Neel



Re: Removing extra space in "sysctl: top level name in . is invalid"

2023-08-29 Thread Ingo Schwarze
Hi Neel,

Neel Chauhan wrote on Tue, Aug 29, 2023 at 02:48:54PM -0700:

> I have noticed a bug. When running "sysctl ." or "sysctl kern." without 
> my other patch, I get an extra space between "name" and "in":
 
>> sysctl: top level name  in . is invalid

Technically, that message is correct.  The name being complained about
is a zero length string, and of course there are space characters
marking a word boundary before and after it, resulting in two adjacent
space characters around a zero-length word.  But i see how some users
might find that confusing.

If people want that improved, see below for a patch that actually
improves the diagnostic message, resulting in:

   $ sysctl kern.=test
  sysctl: empty second level name in kern.
   $ sysctl kern.  
  sysctl: empty second level name in kern.
   $ sysctl .=test
  sysctl: empty top level name in .
   $ sysctl .  
  sysctl: empty top level name in .
   $ sysctl =test  
  sysctl: empty top level name in 
   $ sysctl ''
  sysctl: empty top level name in 

Note this only happens for pathological names that are completely
empty or end in a dot, so i'm not sure it is very important.

Besides, there are other ways to provoke highly confusing error
messages by providing insane arguments.  Consider, for example:

   $ sysctl ' '  
  sysctl: top level name   in   is invalid
   $ sysctl 'kern.ostype'   
  kern.ostype=OpenBSD
   $ sysctl 'kern.os^Atype'
  sysctl: second level name ostype in kern.ostype is invalid
   $ sysctl 'kern.os^Gtype' 
  sysctl: second level name ostype in kern.ostype is invalid
   $ sysctl '^H'
  sysctl: top level name in is invalid

I'm not convinced accurately diagnosing every possible kind
of insane input is a priority.

Regarding the patch, note that the condition *bufp == NULL that
is being tested cannot actually happen because all callers test
that very condition right before calling findname(), and they
have to do that because they all want to call listall() in that
case.

Strangely, the property that *bufp cannot be NULL was already present
in McKusick's very first version of the program dated "5.1 93/03/30
23:48:06", about 3 months before the initial 4.4BSD release.  I have
no idea why he considered testing that useful.  Maybe he fell pray
to the confusing design of the strsep(3) API and actually wanted to
test for *name == '\0' but inadvertently wrote name == NULL and no
one ever noticed?

Maybe one argument in favor of my patch is that it makes the
source code of findname() easier to read in addition to making
the error message less confusing in your corner case.

Lightly tested so far.

Yours,
  Ingo


Index: sysctl.c
===
RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.259
diff -u -p -r1.259 sysctl.c
--- sysctl.c17 May 2023 22:12:51 -  1.259
+++ sysctl.c29 Aug 2023 23:31:09 -
@@ -2948,8 +2948,13 @@ findname(char *string, char *level, char
char *name;
int i;
 
-   if (namelist->list == 0 || (name = strsep(bufp, ".")) == NULL) {
+   if (namelist->list == 0) {
warnx("%s: incomplete specification", string);
+   return (-1);
+   }
+   name = strsep(bufp, ".");
+   if (*name == '\0') {
+   warnx("empty %s level name in %s", level, string);
return (-1);
}
for (i = 0; i < namelist->size; i++)



Removing extra space in "sysctl: top level name in . is invalid"

2023-08-29 Thread Neel Chauhan

Hi,

I have noticed a bug. When running "sysctl ." or "sysctl kern." without 
my other patch, I get an extra space between "name" and "in":



sysctl: top level name  in . is invalid


It should be:


sysctl: top level name in . is invalid


I also have a patch.

What this patch does is if the top level name is a blank string "", we 
do not output anything. Otherwise, we output a space.


The patch is very minor, and may not get accepted (which I understand if 
it doesn't).


-Neel

Patch:

Index: sbin/sysctl/sysctl.c
===
RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.259
diff -u -p -u -r1.259 sysctl.c
--- sbin/sysctl/sysctl.c17 May 2023 22:12:51 -  1.259
+++ sbin/sysctl/sysctl.c29 Aug 2023 21:45:56 -
@@ -2957,7 +2957,9 @@ findname(char *string, char *level, char
strcmp(name, namelist->list[i].ctl_name) == 0)
break;
if (i == namelist->size) {
-   warnx("%s level name %s in %s is invalid", level, name, string);
+   char sep = strcmp(name, "") == 0 ? '\0' : ' ';
+   warnx("%s level name %s%cin %s is invalid",
+   level, name, sep, string);
return (-1);
}
return (i);



Re: sysctl: Fixing "second level name in XX. is invalid" on trailing period

2023-08-29 Thread Neel Chauhan

On 2023-08-29 14:23, Theo Buehler wrote:

+   if (string[strlen(string) - 1] == '.')
+   buf[strlen(string) - 1] = '\0';


Careful with out-of-bounds accesses. What if string is "" ? Probably
easiest to do "len = strlen(string);" and if (len > 0 && ...).


Good catch!

Does this look better:

Index: sbin/sysctl/sysctl.c
===
RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.259
diff -u -p -u -r1.259 sysctl.c
--- sbin/sysctl/sysctl.c17 May 2023 22:12:51 -  1.259
+++ sbin/sysctl/sysctl.c29 Aug 2023 21:36:19 -
@@ -377,6 +377,9 @@ parse(char *string, int flags)

(void)strlcpy(buf, string, sizeof(buf));
bufp = buf;
+   len = strlen(string);
+   if (len > 0 && string[len - 1] == '.')
+   buf[len - 1] = '\0';
if ((cp = strchr(string, '=')) != NULL) {
*strchr(buf, '=') = '\0';
*cp++ = '\0';



Re: sysctl: Fixing "second level name in XX. is invalid" on trailing period

2023-08-29 Thread Theo Buehler
> + if (string[strlen(string) - 1] == '.')
> + buf[strlen(string) - 1] = '\0';

Careful with out-of-bounds accesses. What if string is "" ? Probably
easiest to do "len = strlen(string);" and if (len > 0 && ...).



sysctl: Fixing "second level name in XX. is invalid" on trailing period

2023-08-29 Thread Neel Chauhan

Hi,

I don't know if this is a big issue for you, but when
doing a `sysctl` call with a trailing period ("."),
like `sysctl kern.`, I get an error like this:


sysctl: second level name  in kern. is invalid


On other systems, if I do `sysctl XX.`, I get all the
entries inside XX. We should fix it, and I have a patch.

-Neel

The diff is below:

Index: sbin/sysctl/sysctl.c
===
RCS file: /cvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.259
diff -u -p -u -r1.259 sysctl.c
--- sbin/sysctl/sysctl.c17 May 2023 22:12:51 -  1.259
+++ sbin/sysctl/sysctl.c29 Aug 2023 19:42:56 -
@@ -377,6 +377,8 @@ parse(char *string, int flags)

(void)strlcpy(buf, string, sizeof(buf));
bufp = buf;
+   if (string[strlen(string) - 1] == '.')
+   buf[strlen(string) - 1] = '\0';
if ((cp = strchr(string, '=')) != NULL) {
*strchr(buf, '=') = '\0';
*cp++ = '\0';



Re: vmd: fix i8259 race condition, vioblk hang

2023-08-29 Thread Florian Riehm
Hi,

I tested your patch and it is a great improvement for me.
My vms are hanging reproducible without your fix.

Thank you

Florian





Am Di., 29. Aug. 2023 um 18:01 Uhr schrieb Dave Voutila :

>
> Dave Voutila  writes:
>
> > mbuhl@ found an issue where the emulated virtio block device can
> > hang. The tl;dr: the emulated pic never injects an interrupt and the
> > vioblk(4) driver in the guest starves, waiting to be told to check the
> > virtqueue.
> >
> > Flipping the bit in the i8259 using gdb causes the spice to flow once
> > again.
> >
> > This diff fixes two related things (so could be committed in 2 parts):
> >
> > 1. the irq deassert on a virtio pci interrupt status register read
> >should occur synchronously by the vcpu thread in the vm and not
> >async.
> >
> > 2. always raise the irr bit in the pic, regardless of mask. The mask is
> >used when finding an irq to ack and shouldn't be used to determine if
> >the irr bit is raised. AFAICT from the old i8259 data sheets, masking
> >should have no effect on if the irr bit is set.
> >
> > This is holding up another diff I want to share that reduces latency in
> > interrupts, but it's shown to make this i8259 race condition worse, so
> > I'd like to give folks a few days to check if the below diff causes
> > regressions. Once this is committed, I'll feel safe sharing the latency
> > diff with tech@.
> >
> > Any ok's pending a few days for folks to test?
>
> Still looking for ok's or test reports (other than from mbuhl@, of
> course.)
>
> >
> > -dv
> >
> >
> > diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> > commit - ad1cd1152fddbf55189657a2df9f2468409698ab
> > commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> > blob - f7862f5e9d17f96f5260358271fab8f253b26505
> > blob + b6891c52c824d7b8c69e67e5323772152b1ed844
> > --- usr.sbin/vmd/i8259.c
> > +++ usr.sbin/vmd/i8259.c
> > @@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq)
> >  {
> >   mutex_lock(_mtx);
> >   if (irq <= 7) {
> > - if (!ISSET(pics[MASTER].imr, 1 << irq)) {
> > - SET(pics[MASTER].irr, 1 << irq);
> > - pics[MASTER].asserted = 1;
> > - }
> > + SET(pics[MASTER].irr, 1 << irq);
> > + pics[MASTER].asserted = 1;
> >   } else {
> >   irq -= 8;
> > - if (!ISSET(pics[SLAVE].imr, 1 << irq)) {
> > - SET(pics[SLAVE].irr, 1 << irq);
> > - pics[SLAVE].asserted = 1;
> > + SET(pics[SLAVE].irr, 1 << irq);
> > + pics[SLAVE].asserted = 1;
> >
> > - /* Assert cascade IRQ on master PIC */
> > - SET(pics[MASTER].irr, 1 << 2);
> > - pics[MASTER].asserted = 1;
> > - }
> > + /* Assert cascade IRQ on master PIC */
> > + SET(pics[MASTER].irr, 1 << 2);
> > + pics[MASTER].asserted = 1;
> >   }
> >   mutex_unlock(_mtx);
> >  }
> > blob - 7bc76c4daed427dae82688452ec21985be679bc4
> > blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80
> > --- usr.sbin/vmd/vioblk.c
> > +++ usr.sbin/vmd/vioblk.c
> > @@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st
> >  extern struct vmd_vm *current_vm;
> >
> >  static const char *disk_type(int);
> > -static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev
> *);
> > +static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *,
> > +int8_t *);
> >  static int handle_io_write(struct viodev_msg *, struct virtio_dev *);
> >  void vioblk_notify_rx(struct vioblk_dev *);
> >  int vioblk_notifyq(struct vioblk_dev *);
> > @@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg)
> >   struct viodev_msg msg;
> >   struct imsg imsg;
> >   ssize_t n;
> > + int8_t intr = INTR_STATE_NOOP;
> >
> >   if (event & EV_READ) {
> >   if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
> > @@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg)
> >   }
> >   case VIODEV_MSG_IO_READ:
> >   /* Read IO: make sure to send a reply */
> > - msg.data = handle_io_read(, dev);
> > + msg.data = handle_io_read(, dev, );
> >   msg.data_valid = 1;
> > + msg.state = intr;
> >   imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1,
> ,
> >   sizeof(msg));
> >   break;
> > @@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct
> virtio_d
> >  }
> >
> >  static uint32_t
> > -handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev)
> > +handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t
> *intr)
> >  {
> >   struct vioblk_dev *vioblk = >vioblk;
> >   uint8_t sz = msg->io_sz;
> > @@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct
> virtio_d

Re: JH7110 PCIe device tree binding update

2023-08-29 Thread Mark Kettenis
> Date: Tue, 29 Aug 2023 11:58:23 +0200
> From: Mark Kettenis 
> 
> Upstreaming of the JH7110 PCIe device tree bindings isn't finished
> yet, but it seems some progress has been made and things have been
> reviewed by some of the key people involved:
> 
>   https://patchwork.kernel.org/project/linux-pci/list/?series=779297
> 
> Here is a diff that adjusts the driver to the current state of things
> such that we can use the latest device tree from:
> 
>   https://github.com/starfive-tech/linux/tree/JH7110_VisionFive2_upstream
> 
> to continue development.  The idea is to support the preliminary
> bindings a little bit longer such that folks can update their device
> trees.  Will probably drop support for the preliminary bindings in a
> few weeks.
> 
> ok?

patrick@ pointed out that the dv_unit check won't work properly if the
first PCIe controller is disabled.  So here is a diff that checks the
device address instead like we do for dwqe(4).

ok?

Index: arch/riscv64/dev/stfpcie.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/stfpcie.c,v
retrieving revision 1.1
diff -u -p -r1.1 stfpcie.c
--- arch/riscv64/dev/stfpcie.c  8 Jul 2023 10:06:13 -   1.1
+++ arch/riscv64/dev/stfpcie.c  29 Aug 2023 19:13:28 -
@@ -80,16 +80,23 @@
 #define  TRSL_ID_PCIE_RX_TX0
 #define  TRSL_ID_PCIE_CONFIG   1
 
-#define STG_ARFUN_AXI4_SLVL_MASK   (0x7 << 8)
-#define STG_ARFUN_AXI4_SLVL_SHIFT  8
-#define STG_AWFUN_AXI4_SLVL_MASK   (0x7 << 0)
-#define STG_AWFUN_AXI4_SLVL_SHIFT  0
-#define STG_AWFUN_CKREF_SRC_MASK   (0x3 << 18)
-#define STG_AWFUN_CKREF_SRC_SHIFT  18
-#define STG_AWFUN_CLKREQ   (1 << 22)
-#define STG_PHY_FUNC_SHIFT 9
-#define STG_K_RP_NEP   (1 << 8)
-#define STG_DATA_LINK_ACTIVE   (1 << 5)
+#define STG_PCIE0_BASE 0x048
+#define STG_PCIE1_BASE 0x1f8
+
+#define STG_ARFUN  0x078
+#define  STG_ARFUN_AXI4_SLVL_MASK  (0x7 << 8)
+#define  STG_ARFUN_AXI4_SLVL_SHIFT 8
+#define  STG_PHY_FUNC_SHIFT9
+#define STG_AWFUN  0x07c
+#define  STG_AWFUN_AXI4_SLVL_MASK  (0x7 << 0)
+#define  STG_AWFUN_AXI4_SLVL_SHIFT 0
+#define  STG_AWFUN_CKREF_SRC_MASK  (0x3 << 18)
+#define  STG_AWFUN_CKREF_SRC_SHIFT 18
+#define  STG_AWFUN_CLKREQ  (1 << 22)
+#define STG_RP_NEP 0x0e8
+#define  STG_K_RP_NEP  (1 << 8)
+#define STG_LNKSTA 0x170
+#define  STG_DATA_LINK_ACTIVE  (1 << 5)
 
 #define HREAD4(sc, reg)
\
 (bus_space_read_4((sc)->sc_iot, (sc)->sc_ioh, (reg)))
@@ -228,15 +235,18 @@ stfpcie_attach(struct device *parent, st
uint32_t bus_range[2];
bus_addr_t cfg_base;
bus_size_t cfg_size;
-   uint32_t *reset_gpio;
-   int reset_gpiolen;
-   uint32_t stg[5], arfun, awfun, rp_nep, lnksta;
-   uint32_t reg;
+   bus_size_t stg_base;
+   uint32_t *perst_gpio;
+   int perst_gpiolen;
+   uint32_t reg, stg;
int idx, node, timo;
 
sc->sc_iot = faa->fa_iot;
 
-   idx = OF_getindex(faa->fa_node, "reg", "reg-names");
+   idx = OF_getindex(faa->fa_node, "apb", "reg-names");
+   /* XXX Preliminary bindings used a different name. */
+   if (idx < 0)
+   idx = OF_getindex(faa->fa_node, "reg", "reg-names");
if (idx < 0 || idx >= faa->fa_nreg ||
bus_space_map(sc->sc_iot, faa->fa_reg[idx].addr,
faa->fa_reg[idx].size, 0, >sc_ioh)) {
@@ -244,7 +254,10 @@ stfpcie_attach(struct device *parent, st
return;
}
 
-   idx = OF_getindex(faa->fa_node, "config", "reg-names");
+   idx = OF_getindex(faa->fa_node, "cfg", "reg-names");
+   /* XXX Preliminary bindings used a different name. */
+   if (idx < 0)
+   idx = OF_getindex(faa->fa_node, "config", "reg-names");
if (idx < 0 || idx >= faa->fa_nreg ||
bus_space_map(sc->sc_iot, faa->fa_reg[idx].addr,
faa->fa_reg[idx].size, 0, >sc_cfg_ioh)) {
@@ -257,34 +270,48 @@ stfpcie_attach(struct device *parent, st
sc->sc_dmat = faa->fa_dmat;
sc->sc_node = faa->fa_node;
 
-   if (OF_getpropintarray(sc->sc_node, "starfive,stg-syscon", stg,
-   sizeof(stg)) != sizeof(stg)) {
+   switch (cfg_base) {
+   case 0x94000:
+   stg_base = STG_PCIE0_BASE;
+   break;
+   case 0x9c000:
+   stg_base = STG_PCIE1_BASE;
+   break;
+   default:
+   printf(": unknown controller at 0x%lx\n", cfg_base);
+   return;
+   }
+
+   /*
+* XXX This was an array in the preliminary bindings; simplify
+* when we drop support for those.
+*/
+   if (OF_getpropintarray(sc->sc_node, 

Re: vmd: fix i8259 race condition, vioblk hang

2023-08-29 Thread Dave Voutila


Dave Voutila  writes:

> mbuhl@ found an issue where the emulated virtio block device can
> hang. The tl;dr: the emulated pic never injects an interrupt and the
> vioblk(4) driver in the guest starves, waiting to be told to check the
> virtqueue.
>
> Flipping the bit in the i8259 using gdb causes the spice to flow once
> again.
>
> This diff fixes two related things (so could be committed in 2 parts):
>
> 1. the irq deassert on a virtio pci interrupt status register read
>should occur synchronously by the vcpu thread in the vm and not
>async.
>
> 2. always raise the irr bit in the pic, regardless of mask. The mask is
>used when finding an irq to ack and shouldn't be used to determine if
>the irr bit is raised. AFAICT from the old i8259 data sheets, masking
>should have no effect on if the irr bit is set.
>
> This is holding up another diff I want to share that reduces latency in
> interrupts, but it's shown to make this i8259 race condition worse, so
> I'd like to give folks a few days to check if the below diff causes
> regressions. Once this is committed, I'll feel safe sharing the latency
> diff with tech@.
>
> Any ok's pending a few days for folks to test?

Still looking for ok's or test reports (other than from mbuhl@, of
course.)

>
> -dv
>
>
> diff refs/heads/master cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> commit - ad1cd1152fddbf55189657a2df9f2468409698ab
> commit + cdba90a89ee6e77b52c2b6955d36a260a23a3b85
> blob - f7862f5e9d17f96f5260358271fab8f253b26505
> blob + b6891c52c824d7b8c69e67e5323772152b1ed844
> --- usr.sbin/vmd/i8259.c
> +++ usr.sbin/vmd/i8259.c
> @@ -222,20 +222,16 @@ i8259_assert_irq(uint8_t irq)
>  {
>   mutex_lock(_mtx);
>   if (irq <= 7) {
> - if (!ISSET(pics[MASTER].imr, 1 << irq)) {
> - SET(pics[MASTER].irr, 1 << irq);
> - pics[MASTER].asserted = 1;
> - }
> + SET(pics[MASTER].irr, 1 << irq);
> + pics[MASTER].asserted = 1;
>   } else {
>   irq -= 8;
> - if (!ISSET(pics[SLAVE].imr, 1 << irq)) {
> - SET(pics[SLAVE].irr, 1 << irq);
> - pics[SLAVE].asserted = 1;
> + SET(pics[SLAVE].irr, 1 << irq);
> + pics[SLAVE].asserted = 1;
>
> - /* Assert cascade IRQ on master PIC */
> - SET(pics[MASTER].irr, 1 << 2);
> - pics[MASTER].asserted = 1;
> - }
> + /* Assert cascade IRQ on master PIC */
> + SET(pics[MASTER].irr, 1 << 2);
> + pics[MASTER].asserted = 1;
>   }
>   mutex_unlock(_mtx);
>  }
> blob - 7bc76c4daed427dae82688452ec21985be679bc4
> blob + 4d321fd4ff23514ffa7a4bee0eeb7a9324020d80
> --- usr.sbin/vmd/vioblk.c
> +++ usr.sbin/vmd/vioblk.c
> @@ -39,7 +39,8 @@ static uint32_t handle_io_read(struct viodev_msg *, st
>  extern struct vmd_vm *current_vm;
>
>  static const char *disk_type(int);
> -static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *);
> +static uint32_t handle_io_read(struct viodev_msg *, struct virtio_dev *,
> +int8_t *);
>  static int handle_io_write(struct viodev_msg *, struct virtio_dev *);
>  void vioblk_notify_rx(struct vioblk_dev *);
>  int vioblk_notifyq(struct vioblk_dev *);
> @@ -723,6 +724,7 @@ handle_sync_io(int fd, short event, void *arg)
>   struct viodev_msg msg;
>   struct imsg imsg;
>   ssize_t n;
> + int8_t intr = INTR_STATE_NOOP;
>
>   if (event & EV_READ) {
>   if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
> @@ -770,8 +772,9 @@ handle_sync_io(int fd, short event, void *arg)
>   }
>   case VIODEV_MSG_IO_READ:
>   /* Read IO: make sure to send a reply */
> - msg.data = handle_io_read(, dev);
> + msg.data = handle_io_read(, dev, );
>   msg.data_valid = 1;
> + msg.state = intr;
>   imsg_compose_event(iev, IMSG_DEVOP_MSG, 0, 0, -1, ,
>   sizeof(msg));
>   break;
> @@ -844,7 +847,7 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
>  }
>
>  static uint32_t
> -handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev)
> +handle_io_read(struct viodev_msg *msg, struct virtio_dev *dev, int8_t *intr)
>  {
>   struct vioblk_dev *vioblk = >vioblk;
>   uint8_t sz = msg->io_sz;
> @@ -1001,7 +1004,8 @@ handle_io_read(struct viodev_msg *msg, struct virtio_d
>   case VIRTIO_CONFIG_ISR_STATUS:
>   data = vioblk->cfg.isr_status;
>   vioblk->cfg.isr_status = 0;
> - virtio_deassert_pic_irq(dev, 0);
> + if (intr != NULL)
> + *intr = INTR_STATE_DEASSERT;
>   break;
>   default:
>   return (0x);
> blob - c16ad2635ea622fd7fce48b5145e2199dd451346
> blob + 

JH7110 PCIe device tree binding update

2023-08-29 Thread Mark Kettenis
Upstreaming of the JH7110 PCIe device tree bindings isn't finished
yet, but it seems some progress has been made and things have been
reviewed by some of the key people involved:

  https://patchwork.kernel.org/project/linux-pci/list/?series=779297

Here is a diff that adjusts the driver to the current state of things
such that we can use the latest device tree from:

  https://github.com/starfive-tech/linux/tree/JH7110_VisionFive2_upstream

to continue development.  The idea is to support the preliminary
bindings a little bit longer such that folks can update their device
trees.  Will probably drop support for the preliminary bindings in a
few weeks.

ok?


Index: arch/riscv64/dev/stfpcie.c
===
RCS file: /cvs/src/sys/arch/riscv64/dev/stfpcie.c,v
retrieving revision 1.1
diff -u -p -r1.1 stfpcie.c
--- arch/riscv64/dev/stfpcie.c  8 Jul 2023 10:06:13 -   1.1
+++ arch/riscv64/dev/stfpcie.c  29 Aug 2023 09:47:24 -
@@ -80,16 +80,23 @@
 #define  TRSL_ID_PCIE_RX_TX0
 #define  TRSL_ID_PCIE_CONFIG   1
 
-#define STG_ARFUN_AXI4_SLVL_MASK   (0x7 << 8)
-#define STG_ARFUN_AXI4_SLVL_SHIFT  8
-#define STG_AWFUN_AXI4_SLVL_MASK   (0x7 << 0)
-#define STG_AWFUN_AXI4_SLVL_SHIFT  0
-#define STG_AWFUN_CKREF_SRC_MASK   (0x3 << 18)
-#define STG_AWFUN_CKREF_SRC_SHIFT  18
-#define STG_AWFUN_CLKREQ   (1 << 22)
-#define STG_PHY_FUNC_SHIFT 9
-#define STG_K_RP_NEP   (1 << 8)
-#define STG_DATA_LINK_ACTIVE   (1 << 5)
+#define STG_PCIE0_BASE 0x048
+#define STG_PCIE1_BASE 0x1f8
+
+#define STG_ARFUN  0x078
+#define  STG_ARFUN_AXI4_SLVL_MASK  (0x7 << 8)
+#define  STG_ARFUN_AXI4_SLVL_SHIFT 8
+#define  STG_PHY_FUNC_SHIFT9
+#define STG_AWFUN  0x07c
+#define  STG_AWFUN_AXI4_SLVL_MASK  (0x7 << 0)
+#define  STG_AWFUN_AXI4_SLVL_SHIFT 0
+#define  STG_AWFUN_CKREF_SRC_MASK  (0x3 << 18)
+#define  STG_AWFUN_CKREF_SRC_SHIFT 18
+#define  STG_AWFUN_CLKREQ  (1 << 22)
+#define STG_RP_NEP 0x0e8
+#define  STG_K_RP_NEP  (1 << 8)
+#define STG_LNKSTA 0x170
+#define  STG_DATA_LINK_ACTIVE  (1 << 5)
 
 #define HREAD4(sc, reg)
\
 (bus_space_read_4((sc)->sc_iot, (sc)->sc_ioh, (reg)))
@@ -228,15 +235,18 @@ stfpcie_attach(struct device *parent, st
uint32_t bus_range[2];
bus_addr_t cfg_base;
bus_size_t cfg_size;
-   uint32_t *reset_gpio;
-   int reset_gpiolen;
-   uint32_t stg[5], arfun, awfun, rp_nep, lnksta;
-   uint32_t reg;
+   bus_size_t stg_base;
+   uint32_t *perst_gpio;
+   int perst_gpiolen;
+   uint32_t reg, stg;
int idx, node, timo;
 
sc->sc_iot = faa->fa_iot;
 
-   idx = OF_getindex(faa->fa_node, "reg", "reg-names");
+   idx = OF_getindex(faa->fa_node, "apb", "reg-names");
+   /* XXX Preliminary bindings used a different name. */
+   if (idx < 0)
+   idx = OF_getindex(faa->fa_node, "reg", "reg-names");
if (idx < 0 || idx >= faa->fa_nreg ||
bus_space_map(sc->sc_iot, faa->fa_reg[idx].addr,
faa->fa_reg[idx].size, 0, >sc_ioh)) {
@@ -244,7 +254,10 @@ stfpcie_attach(struct device *parent, st
return;
}
 
-   idx = OF_getindex(faa->fa_node, "config", "reg-names");
+   idx = OF_getindex(faa->fa_node, "cfg", "reg-names");
+   /* XXX Preliminary bindings used a different name. */
+   if (idx < 0)
+   idx = OF_getindex(faa->fa_node, "config", "reg-names");
if (idx < 0 || idx >= faa->fa_nreg ||
bus_space_map(sc->sc_iot, faa->fa_reg[idx].addr,
faa->fa_reg[idx].size, 0, >sc_cfg_ioh)) {
@@ -257,34 +270,41 @@ stfpcie_attach(struct device *parent, st
sc->sc_dmat = faa->fa_dmat;
sc->sc_node = faa->fa_node;
 
-   if (OF_getpropintarray(sc->sc_node, "starfive,stg-syscon", stg,
-   sizeof(stg)) != sizeof(stg)) {
+   if (sc->sc_dev.dv_unit == 0)
+   stg_base = STG_PCIE0_BASE;
+   else
+   stg_base = STG_PCIE1_BASE;
+
+   /*
+* XXX This was an array in the preliminary bindings; simplify
+* when we drop support for those.
+*/
+   if (OF_getpropintarray(sc->sc_node, "starfive,stg-syscon", ,
+   sizeof(stg)) < sizeof(stg)) {
printf(": failed to get starfive,stg-syscon\n");
return;
}
-   arfun = stg[1];
-   awfun = stg[2];
-   rp_nep = stg[3];
-   lnksta = stg[4];
 
-   rm = regmap_byphandle(stg[0]);
+   rm = regmap_byphandle(stg);
if (rm == NULL) {
printf(": can't get regmap\n");
return;
}
 
-   reg = regmap_read_4(rm, rp_nep);
+   

Re: PATCH: unify VAR!=cmd and other command execution

2023-08-29 Thread Marc Espie
On Mon, Aug 28, 2023 at 10:03:46PM +0200, Theo Buehler wrote:
> On Mon, Aug 28, 2023 at 09:18:40PM +0200, Marc Espie wrote:
> > On Mon, Aug 28, 2023 at 08:42:50PM +0200, Marc Espie wrote:
> > > Turns out the exec in cmd_exec.c has absolutely zero reason to be
> > > different from what engine does.
> > > 
> > > This small patch moves a bit of code around, so that all execv() consumers
> > > benefit from the same optimisation (namely, no extra shell when not 
> > > needed).
> > > 
> > > The only visible change should be that now, VAR!=cmd *will* display
> > > some relevant information if exec fails.
> > > 
> > > This is a fairly trivial change, I don't expect any fallout.
> > > 
> > > (still need to run it through tests)
> > 
> > Better with the patch (thx miod@)
> > 
> > Index: cmd_exec.c
> 
> [...]
> 
> Please drop this:
> 
> > +static void retry_with_temp_file(bool, char **);
> 
> Otherwise ok once you're happy with your testing.
> 
> 
Oh, I thought I had removed the code along with the call.

Yeah, I don't intend to commit that part with the patch anyhow.

Gonna run a build or two just to be sure.