date:20220727

Re: [PATCH v6 1/5] target/riscv: Add smstateen support

2022-07-27 Thread Mayuresh Chitale

On Mon, 2022-07-25 at 15:11 +0800, Weiwei Li wrote:
> On 2022/7/24 at 11:39 PM, Mayuresh Chitale wrote:
> > On Fri, 2022-07-22 at 08:31 +0800, Weiwei Li wrote:
> > > On 2022/7/21 at 11:31 PM, Mayuresh Chitale wrote:
> > > > Smstateen extension specifies a mechanism to close
> > > > the potential covert channels that could cause security issues.
> > > > 
> > > > This patch adds the CSRs defined in the specification and
> > > > the corresponding predicates and read/write functions.
> > > > 
> > > > Signed-off-by: Mayuresh Chitale 
> > > > ---
> > > >target/riscv/cpu.h  |   4 +
> > > >target/riscv/cpu_bits.h |  37 
> > > >target/riscv/csr.c  | 370
> > > > 
> > > >target/riscv/machine.c  |  21 +++
> > > >4 files changed, 432 insertions(+)
> > > > 
> > > > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > > > index ffb1a18873..7f8e5b0014 100644
> > > > --- a/target/riscv/cpu.h
> > > > +++ b/target/riscv/cpu.h
> > > > @@ -354,6 +354,9 @@ struct CPUArchState {
> > > >
> > > >/* CSRs for execution enviornment configuration */
> > > >uint64_t menvcfg;
> > > > +uint64_t mstateen[SMSTATEEN_MAX_COUNT];
> > > > +uint64_t hstateen[SMSTATEEN_MAX_COUNT];
> > > > +uint64_t sstateen[SMSTATEEN_MAX_COUNT];
> > > >target_ulong senvcfg;
> > > >uint64_t henvcfg;
> > > >#endif
> > > > @@ -426,6 +429,7 @@ struct RISCVCPUConfig {
> > > >bool ext_zkt;
> > > >bool ext_ifencei;
> > > >bool ext_icsr;
> > > > +bool ext_smstateen;
> > > >bool ext_svinval;
> > > >bool ext_svnapot;
> > > >bool ext_svpbmt;
> > > > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > > > index 6be5a9e9f0..56b7c5bed6 100644
> > > > --- a/target/riscv/cpu_bits.h
> > > > +++ b/target/riscv/cpu_bits.h
> > > > @@ -199,6 +199,12 @@
> > > >/* Supervisor Configuration CSRs */
> > > >#define CSR_SENVCFG 0x10A
> > > >
> > > > +/* Supervisor state CSRs */
> > > > +#define CSR_SSTATEEN0   0x10C
> > > > +#define CSR_SSTATEEN1   0x10D
> > > > +#define CSR_SSTATEEN2   0x10E
> > > > +#define CSR_SSTATEEN3   0x10F
> > > > +
> > > >/* Supervisor Trap Handling */
> > > >#define CSR_SSCRATCH0x140
> > > >#define CSR_SEPC0x141
> > > > @@ -242,6 +248,16 @@
> > > >#define CSR_HENVCFG 0x60A
> > > >#define CSR_HENVCFGH0x61A
> > > >
> > > > +/* Hypervisor state CSRs */
> > > > +#define CSR_HSTATEEN0   0x60C
> > > > +#define CSR_HSTATEEN0H  0x61C
> > > > +#define CSR_HSTATEEN1   0x60D
> > > > +#define CSR_HSTATEEN1H  0x61D
> > > > +#define CSR_HSTATEEN2   0x60E
> > > > +#define CSR_HSTATEEN2H  0x61E
> > > > +#define CSR_HSTATEEN3   0x60F
> > > > +#define CSR_HSTATEEN3H  0x61F
> > > > +
> > > >/* Virtual CSRs */
> > > >#define CSR_VSSTATUS0x200
> > > >#define CSR_VSIE0x204
> > > > @@ -283,6 +299,27 @@
> > > >#define CSR_MENVCFG 0x30A
> > > >#define CSR_MENVCFGH0x31A
> > > >
> > > > +/* Machine state CSRs */
> > > > +#define CSR_MSTATEEN0   0x30C
> > > > +#define CSR_MSTATEEN0H  0x31C
> > > > +#define CSR_MSTATEEN1   0x30D
> > > > +#define CSR_MSTATEEN1H  0x31D
> > > > +#define CSR_MSTATEEN2   0x30E
> > > > +#define CSR_MSTATEEN2H  0x31E
> > > > +#define CSR_MSTATEEN3   0x30F
> > > > +#define CSR_MSTATEEN3H  0x31F
> > > > +
> > > > +/* Common defines for all smstateen */
> > > > +#define SMSTATEEN_MAX_COUNT 4
> > > > +#define SMSTATEEN0_CS   (1ULL << 0)
> > > > +#define SMSTATEEN0_FCSR (1ULL << 1)
> > > > +#define SMSTATEEN0_HSCONTXT (1ULL << 57)
> > > > +#define SMSTATEEN0_IMSIC(1ULL << 58)
> > > > +#define SMSTATEEN0_AIA  (1ULL << 59)
> > > > +#define SMSTATEEN0_SVSLCT   (1ULL << 60)
> > > > +#define SMSTATEEN0_HSENVCFG (1ULL << 62)
> > > > +#define SMSTATEEN_STATEN(1ULL << 63)
> > > Maybe SMSTATEEN_STATEEN would be better.
> > OK. Will update in the next version.
> > > > +
> > > >/* Enhanced Physical Memory Protection (ePMP) */
> > > >#define CSR_MSECCFG 0x747
> > > >#define CSR_MSECCFGH0x757
> > > > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > > > index 235f2a011e..27032a416c 100644
> > > > --- a/target/riscv/csr.c
> > > > +++ b/target/riscv/csr.c
> > > > @@ -339,6 +339,68 @@ static RISCVException hmode32(CPURISCVState *env, int csrno)
> > > >
> > > >}
> > > >
> > > > +static RISCVException mstateen(CPURISCVState *env, int csrno)
> > > > +{
> > > > +CPUState *cs = env_cpu(env);
> > > > +RISCVCPU *cpu = RISCV_CPU(cs);
> > > > +
> > > > +if (!cpu->cfg.ext_smstateen) {
> > > > +return RISCV_EXCP_ILLEGAL_INST;
> > > > +}
> > > > +
> > > > +return any(env, csrno);
> > > > +}
> > > > +
> > > > +static RISCVException hstateen_pred(CPURISCVState *env, int csrno, int base)
> >
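
As a reading aid for the defines quoted above, here is a minimal sketch of how a 64-bit stateen value relates to the paired CSRs on RV32, where the plain CSR (e.g. CSR_MSTATEEN0) exposes the low 32 bits and the "H" CSR (e.g. CSR_MSTATEEN0H) the high 32 bits. The helper names are hypothetical and are not taken from the patch; only the two bit definitions come from the quoted diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit definitions copied from the quoted cpu_bits.h hunk. */
#define SMSTATEEN0_FCSR   (1ULL << 1)
#define SMSTATEEN_STATEN  (1ULL << 63)

/* Hypothetical helpers (not from the patch): split a 64-bit stateen value. */
static inline uint32_t stateen_low(uint64_t v)
{
    return (uint32_t)v;             /* what the plain CSR would show on RV32 */
}

static inline uint32_t stateen_high(uint64_t v)
{
    return (uint32_t)(v >> 32);     /* what the "H" CSR would show on RV32 */
}

/* Hypothetical helper: test whether a given enable bit is set. */
static inline bool stateen_bit_set(uint64_t v, uint64_t bit)
{
    return (v & bit) != 0;
}
```

With this layout, bit 63 (SMSTATEEN_STATEN) appears as bit 31 of the high half, while SMSTATEEN0_FCSR stays in the low half.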

Re: VIRTIO_NET_F_MTU not negotiated

2022-07-27 Thread Jason Wang

On Thu, Jul 28, 2022 at 1:39 PM Eli Cohen  wrote:
>
> > From: Jason Wang 
> > Sent: Thursday, July 28, 2022 5:09 AM
> > To: Eli Cohen 
> > Cc: Eugenio Perez Martin ; qemu-devel@nongnu.org; 
> > Michael S. Tsirkin ;
> > virtualizat...@lists.linux-foundation.org
> > Subject: Re: VIRTIO_NET_F_MTU not negotiated
> >
> > On Wed, Jul 27, 2022 at 2:52 PM Eli Cohen  wrote:
> > >
> > > I found out that the reason I could not enforce the MTU is that I
> > > did not configure a max MTU for the net device (e.g. through libvirt).
> > > Libvirt does not allow this configuration for vdpa devices, probably
> > > for a reason. The vdpa backend driver has the freedom to do it using
> > > its copy of virtio_net_config.
> > >
> > > The QEMU code that decides whether the device MTU restriction is
> > > taken into account is here:
> > >
> > > static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> > > {
> > >     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > >     VirtIONet *n = VIRTIO_NET(dev);
> > >     NetClientState *nc;
> > >     int i;
> > >
> > >     if (n->net_conf.mtu) {
> > >         n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
> > >     }
> > >
> > > The above code can be interpreted as follows:
> > > if the QEMU command line arguments indicate that the MTU should be
> > > limited, then we read this MTU limitation from the device (the actual
> > > value provided is ignored).
> > >
> > > I worked around this limitation by unconditionally setting
> > > VIRTIO_NET_F_MTU in the host features. As said, it only indicates that
> > > we should read the actual limitation from the device.
> > >
> > > If this makes sense I can send a patch to fix this.
> >
> > I wonder whether it's worth the bother:
> >
> > 1) mgmt (above libvirt) should have the knowledge to prepare the correct XML
> > 2) it's not specific to MTU; other features work the same way, for
> > example multiqueue
> >
>
>
> Currently libvirt does not recognize setting the MTU through XML for a vdpa
> device. So you mean the fix should go into libvirt?

Probably.

> Furthermore, even if libvirt supported MTU configuration for a vdpa device,
> the actual value provided would be ignored and the limitation would be taken
> from what the vdpa device published in its virtio_net_config structure. That
> makes the XML configuration binary.

Yes, we suffer from a similar issue for "queues=". I think we should
fix QEMU by failing the initialization if the value provided on the
command line doesn't match what is read from config space.

E.g. when mtu=9000 was set on the command line but the actual MTU is 1500.
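
The check proposed above could be sketched roughly as follows. This is only an illustration: mtu_matches_device() and its parameters are hypothetical names, not existing QEMU functions, and the sketch assumes that a command-line MTU of 0 means "not specified".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not QEMU code: compare the MTU requested on the
 * command line with the MTU the device reports in its config space.
 * Assumption: cli_mtu == 0 means no MTU was requested, so nothing is checked.
 * On mismatch, a message is written to err and false is returned so the
 * caller can fail initialization.
 */
static bool mtu_matches_device(uint16_t cli_mtu, uint16_t device_mtu,
                               char *err, size_t err_len)
{
    if (cli_mtu == 0 || cli_mtu == device_mtu) {
        return true;
    }
    snprintf(err, err_len, "mtu=%u requested but device reports %u",
             (unsigned)cli_mtu, (unsigned)device_mtu);
    return false;
}
```

For the example above, mtu_matches_device(9000, 1500, ...) returns false and the caller would abort realize with the message, while an unspecified MTU passes through unchanged. The same pattern would apply to "queues=".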

Thanks

>
> > Thanks
>

RE: VIRTIO_NET_F_MTU not negotiated

2022-07-27 Thread Eli Cohen

> From: Jason Wang 
> Sent: Thursday, July 28, 2022 5:09 AM
> To: Eli Cohen 
> Cc: Eugenio Perez Martin ; qemu-devel@nongnu.org; 
> Michael S. Tsirkin ;
> virtualizat...@lists.linux-foundation.org
> Subject: Re: VIRTIO_NET_F_MTU not negotiated
> 
> On Wed, Jul 27, 2022 at 2:52 PM Eli Cohen  wrote:
> >
> > I found out that the reason I could not enforce the MTU is that I
> > did not configure a max MTU for the net device (e.g. through libvirt).
> > Libvirt does not allow this configuration for vdpa devices, probably
> > for a reason. The vdpa backend driver has the freedom to do it using
> > its copy of virtio_net_config.
> >
> > The QEMU code that decides whether the device MTU restriction is
> > taken into account is here:
> >
> > static void virtio_net_device_realize(DeviceState *dev, Error **errp)
> > {
> >     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >     VirtIONet *n = VIRTIO_NET(dev);
> >     NetClientState *nc;
> >     int i;
> >
> >     if (n->net_conf.mtu) {
> >         n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
> >     }
> >
> > The above code can be interpreted as follows:
> > if the QEMU command line arguments indicate that the MTU should be
> > limited, then we read this MTU limitation from the device (the actual
> > value provided is ignored).
> >
> > I worked around this limitation by unconditionally setting
> > VIRTIO_NET_F_MTU in the host features. As said, it only indicates that
> > we should read the actual limitation from the device.
> >
> > If this makes sense I can send a patch to fix this.
> 
> I wonder whether it's worth the bother:
> 
> 1) mgmt (above libvirt) should have the knowledge to prepare the correct XML
> 2) it's not specific to MTU; other features work the same way, for
> example multiqueue
> 


Currently libvirt does not recognize setting the MTU through XML for a vdpa
device. So you mean the fix should go into libvirt?
Furthermore, even if libvirt supported MTU configuration for a vdpa device, the
actual value provided would be ignored and the limitation would be taken from
what the vdpa device published in its virtio_net_config structure. That makes
the XML configuration binary.

> Thanks

Re: [PATCH v3] target/ppc: Implement new wait variants

2022-07-27 Thread Joel Stanley

On Wed, 27 Jul 2022 at 13:49, Daniel Henrique Barboza
 wrote:
>
>
>
> On 7/20/22 10:33, Nicholas Piggin wrote:
> > ISA v2.06 adds new variations of wait, specified by the WC field. These
> > are not all compatible with the prior wait implementation, because they
> > add additional conditions that cause the processor to resume, which can
> > cause software to hang or run very slowly.
>
> So I suppose this is not a new feature, but a bug fix to remediate these hangs
> caused by the incompatibility of the WC field with other ISA versions. Is
> that right?

That's the case. Nick has some kernel patches that make Linux use the
new opcode:

 https://lore.kernel.org/all/20220720132132.903462-1-npig...@gmail.com/

With these applied the kernel hangs during boot if more than one CPU
is present. I was able to reproduce with ppc64le_defconfig and this
command line:

 qemu-system-ppc64 -M pseries,x-vof=on -cpu POWER10 -smp 2 -nographic \
   -kernel zImage.pseries -no-reboot

QEMU will exit (as there's no filesystem) if the test "passes", or
hang during boot if it hits the bug.

There's a kernel here with the patches applied in case someone else
wants to test:

https://ozlabs.org/~joel/zImage.pseries-v5.19-rc8-wait-v3

Tested-by: Joel Stanley 

Because of the hang it would be best if we merged the patch as a fix
sooner rather than later.

Cheers,

Joel

> I'm explicitly asking for it because if it's a bug fix it's ok to pick it
> during the freeze. Especially here, given that what you're doing is mostly
> adding no-ops for conditions that we're not covering.
>
> >
> > ISA v3.0 changed the wait opcode and removed the new variants (retaining
> > the WC field but making non-zero values reserved).
> >
> > ISA v3.1 added new WC values to the new wait opcode, and added a PL
> > field.
> >
> > This implements the new wait encoding and supports WC variants with
> > no-op implementations, which provides basic correctness as explained in
> > comments.
> >
> > Signed-off-by: Nicholas Piggin 
> > ---
> > v3:
> > - Add EXTRACT_HELPERs
> > - Reserved fields should be ignored, not trap.
> > - v3.1 defines special case of reserved PL values being treated as
> >a no-op when WC=2.
> > - Change code organization to (hopefully) be easier to follow each
> >ISA / variation.
> > - Tested old wait variant with Linux e6500 boot and verify that
> >gen_wait is called and takes the expected path.
> >
> > Thanks,
> > Nick
> >
> >   target/ppc/internal.h  |  3 ++
> >   target/ppc/translate.c | 96 ++
> >   2 files changed, 91 insertions(+), 8 deletions(-)
> >
> > diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> > index 2add128cd1..57c0a42a6b 100644
> > --- a/target/ppc/internal.h
> > +++ b/target/ppc/internal.h
> > @@ -168,6 +168,9 @@ EXTRACT_HELPER_SPLIT_3(DX, 10, 6, 6, 5, 16, 1, 1, 0, 0)
> >   /* darn */
> >   EXTRACT_HELPER(L, 16, 2);
> >   #endif
> > +/* wait */
> > +EXTRACT_HELPER(WC, 21, 2);
> > +EXTRACT_HELPER(PL, 16, 2);
> >
> >   /***Jump target decoding  
> >  ***/
> >   /* Immediate address */
> > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > index 1d6daa4608..e0a835ac90 100644
> > --- a/target/ppc/translate.c
> > +++ b/target/ppc/translate.c
> > @@ -4066,12 +4066,91 @@ static void gen_sync(DisasContext *ctx)
> >   /* wait */
> >   static void gen_wait(DisasContext *ctx)
> >   {
> > -TCGv_i32 t0 = tcg_const_i32(1);
> > -tcg_gen_st_i32(t0, cpu_env,
> > -   -offsetof(PowerPCCPU, env) + offsetof(CPUState, halted));
> > -tcg_temp_free_i32(t0);
> > -/* Stop translation, as the CPU is supposed to sleep from now */
> > -gen_exception_nip(ctx, EXCP_HLT, ctx->base.pc_next);
> > +uint32_t wc;
> > +
> > +if (ctx->insns_flags & PPC_WAIT) {
> > +/* v2.03-v2.07 define an older incompatible 'wait' encoding. */
> > +
> > +if (ctx->insns_flags2 & PPC2_PM_ISA206) {
> > +/* v2.06 introduced the WC field. WC > 0 may be treated as no-op. */
> > +wc = WC(ctx->opcode);
> > +} else {
> > +wc = 0;
> > +}
> > +
> > +} else if (ctx->insns_flags2 & PPC2_ISA300) {
> > +/* v3.0 defines a new 'wait' encoding. */
> > +wc = WC(ctx->opcode);
>
>
> The ISA seems to indicate that WC=3 is always reserved in both ISA300 and
> ISA310. I believe you can check for WC=3 and gen_invalid() return right
> off the bat at this point.
>
>
> Thanks,
>
>
> Daniel
>
>
>
> > +if (ctx->insns_flags2 & PPC2_ISA310) {
> > +uint32_t pl = PL(ctx->opcode);
> > +
> > +/* WC 1,2 may be treated as no-op. WC 3 is reserved. */
> > +if (wc == 3) {
> > +gen_invalid(ctx);
> > +return;
> > +}
> > +
> > +/* PL 1-3 are reserved. If WC=2 then the insn is treated as noop. */
> > +if (pl > 0 && wc != 2) {
> > +
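
For readers following the field decoding in the quoted patch, here is a minimal standalone sketch of what the two new helpers compute. It assumes that EXTRACT_HELPER(name, shift, nb) yields nb bits of the opcode starting at bit position shift, counted from the least significant bit. The function names below are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract len bits starting at bit shift (bit 0 is the least significant). */
static inline uint32_t extract_field(uint32_t opcode, unsigned shift,
                                     unsigned len)
{
    return (opcode >> shift) & ((1u << len) - 1u);
}

/* Assumed equivalent of EXTRACT_HELPER(WC, 21, 2). */
static inline uint32_t wait_wc(uint32_t opcode)
{
    return extract_field(opcode, 21, 2);
}

/* Assumed equivalent of EXTRACT_HELPER(PL, 16, 2). */
static inline uint32_t wait_pl(uint32_t opcode)
{
    return extract_field(opcode, 16, 2);
}

/* WC=3 is the value the discussion above describes as reserved. */
static inline bool wait_wc_is_reserved(uint32_t wc)
{
    return wc == 3;
}
```

An opcode with bits 22:21 set to 2 and bits 17:16 set to 1 therefore decodes to WC=2 and PL=1, the combination the quoted comment says is treated as a no-op.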

Re: [RFC patch 0/1] block: vhost-blk backend

2022-07-27 Thread Andrey Zhadchenko

On 7/27/22 16:06, Stefano Garzarella wrote:

On Tue, Jul 26, 2022 at 04:15:48PM +0200, Denis V. Lunev wrote:

On 26.07.2022 15:51, Michael S. Tsirkin wrote:

On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:

Although QEMU virtio-blk is quite fast, there is still some room for
improvement. Disk latency can be reduced if we handle virtio-blk
requests in the host kernel, so we avoid a lot of syscalls and context
switches.

The biggest disadvantage of this vhost-blk flavor is that it only supports
the raw format. Luckily Kirill Thai proposed a device mapper driver for the
QCOW2 format to attach files as block devices:
https://www.spinics.net/lists/kernel/msg4292965.html

That one seems stalled. Do you plan to work on that too?

We have to. The difference in numbers, as you can see below, is too
large. We have waited for this patch to be sent to keep pushing.

It should be noted that a talk at OSS this year might also help push
this a bit.

Cool, the results are similar to what I saw when I compared vhost-blk
and io_uring passthrough with NVMe (Slide 7 here: [1]).

About QEMU block layer support, we recently started to work on libblkio
[2]. Stefan also sent an RFC [3] to implement the QEMU BlockDriver.

Currently it supports virtio-blk devices using vhost-vdpa and vhost-user.
We could add support for vhost (kernel) as well, though we were also
thinking of leveraging vDPA to implement an in-kernel software device.

That way we could reuse a lot of the code to support both hardware and
software accelerators.

In the talk [1] I describe the idea a little bit, and a few months ago I
did a PoC (unsubmitted RFC) to see if it was feasible and the numbers
were in line with vhost-blk.

Do you think we could join forces and just have an in-kernel vdpa-blk
software device?

This seems worth trying. Why duplicate the effort to do the same thing? Still,
I would like to play a bit with your vdpa-blk PoC beforehand. Can you send
it to me with some instructions on how to run it?

Of course we could have both vhost-blk and vdpa-blk, but with vDPA
perhaps we can have one software stack to maintain for both HW and
software accelerators.
