Re: [PATCH] vhost-vdpa: fix memory leak in error path

2020-09-09 Thread Jason Wang


On 2020/9/9 下午11:41, Li Qiang wrote:

Free the 'page_list' when the 'npages' is zero.

Signed-off-by: Li Qiang 
---
  drivers/vhost/vdpa.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..6a9fcaf1831d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -609,8 +609,10 @@ static int vhost_vdpa_process_iotlb_update(struct 
vhost_vdpa *v,
gup_flags |= FOLL_WRITE;
  
  	npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT;

-   if (!npages)
-   return -EINVAL;
+   if (!npages) {
+   ret = -EINVAL;
+   goto free_page;
+   }
  
  	mmap_read_lock(dev->mm);
  
@@ -666,6 +668,8 @@ static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v,

atomic64_sub(npages, >mm->pinned_vm);
}
mmap_read_unlock(dev->mm);
+
+free_page:
free_page((unsigned long)page_list);
return ret;
  }



Cc: sta...@vger.kernel.org

Acked-by: Jason Wang 



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] vhost: new vhost_vdpa SET/GET_BACKEND_FEATURES handlers

2020-09-09 Thread Dan Carpenter
Hi Zhu,

url:
https://github.com/0day-ci/linux/commits/Zhu-Lingshan/vhost-new-vhost_vdpa-SET-GET_BACKEND_FEATURES-handlers/20200909-115726
base:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git linux-next
config: parisc-randconfig-m031-20200909 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

New smatch warnings:
drivers/vhost/vdpa.c:356 vhost_vdpa_get_backend_features() warn: maybe return 
-EFAULT instead of the bytes remaining?

# 
https://github.com/0day-ci/linux/commit/f2da8fc35b4ef003de7d559a8c7730fedf9926f8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Zhu-Lingshan/vhost-new-vhost_vdpa-SET-GET_BACKEND_FEATURES-handlers/20200909-115726
git checkout f2da8fc35b4ef003de7d559a8c7730fedf9926f8
vim +356 drivers/vhost/vdpa.c

f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  348  static long 
vhost_vdpa_get_backend_features(void __user *argp)
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  349  {
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  350 u64 features = 
VHOST_VDPA_BACKEND_FEATURES;
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  351 u64 __user *featurep = argp;
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  352 long r;
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  353  
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  354 r = copy_to_user(featurep, 
, sizeof(features));
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  355  
f2da8fc35b4ef0 Zhu Lingshan 2020-09-09 @356 return r;

copy_to_user() returns the number of bytes remaining.  I haven't looked
at how this is called but it should probably return negative error codes
on error:

if (copy_to_user(featurep, , sizeof(features)))
return -EFAULT;

return 0;

f2da8fc35b4ef0 Zhu Lingshan 2020-09-09  357  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active

2020-09-09 Thread Tom Lendacky

On 9/9/20 7:44 AM, Laszlo Ersek wrote:

On 09/09/20 10:27, Ard Biesheuvel wrote:

(adding Laszlo and Brijesh)

On Tue, 8 Sep 2020 at 20:46, Borislav Petkov  wrote:


+ Ard so that he can ack the efi bits.

On Mon, Sep 07, 2020 at 03:16:12PM +0200, Joerg Roedel wrote:

From: Tom Lendacky 

Calling down to EFI runtime services can result in the firmware
performing VMGEXIT calls. The firmware is likely to use the GHCB of
the OS (e.g., for setting EFI variables),


I've had to stare at this for a while.

Because, normally a VMGEXIT is supposed to occur like this:

- guest does something privileged
- resultant non-automatic exit (NAE) injects a #VC exception
- exception handler figures out what that privileged thing was
- exception handler submits request to hypervisor via GHCB contents plus
   VMGEXIT instruction

Point being, the agent that "owns" the exception handler is supposed to
pre-allocate or otherwise provide the GHCB too, for information passing.

So... what is the particular NAE that occurs during the execution of
UEFI runtime services (at OS runtime)?

And assuming it occurs, I'm unsure if the exception handler (IDT) at
that point is owned (temporarily?) by the firmware.

- If the #VC handler comes from the firmware, then I don't know why it
would use the OS's GHCB.

- If the #VC handler comes from the OS, then I don't understand why the
commit message says "firmware performing VMGEXIT", given that in this
case it would be the OS's #VC handler executing VMGEXIT.

So, I think the above commit message implies a VMGEXIT *without* a NAE /
#VC context. (Because, I fail to interpret the commit message in a NAE /
#VC context in any way; see above.)


Correct.



OK, so let's see where the firmware performs a VMGEXIT *outside* of an
exception handler, *while* at OS runtime. There seems to be one, in file
"OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c":


Again, correct. Basically this is what is invoked when setting UEFI variables.




VOID
QemuFlashPtrWrite (
   INvolatile UINT8*Ptr,
   INUINT8 Value
   )
{
   if (MemEncryptSevEsIsEnabled ()) {
 MSR_SEV_ES_GHCB_REGISTER  Msr;
 GHCB  *Ghcb;

 Msr.GhcbPhysicalAddress = AsmReadMsr64 (MSR_SEV_ES_GHCB);
 Ghcb = Msr.Ghcb;

 //
 // Writing to flash is emulated by the hypervisor through the use of write
 // protection. This won't work for an SEV-ES guest because the write won't
 // be recognized as a true MMIO write, which would result in the required
 // #VC exception. Instead, use the the VMGEXIT MMIO write support directly
 // to perform the update.
 //
 VmgInit (Ghcb);
 Ghcb->SharedBuffer[0] = Value;
 Ghcb->SaveArea.SwScratch = (UINT64) (UINTN) Ghcb->SharedBuffer;
 VmgExit (Ghcb, SVM_EXIT_MMIO_WRITE, (UINT64) (UINTN) Ptr, 1);
 VmgDone (Ghcb);
   } else {
 *Ptr = Value;
   }
}


This function *does* run at OS runtime (as a part of non-volatile UEFI
variable writes).

And note that, wherever MSR_SEV_ES_GHCB points to at the moment, is used
as GHCB.

If the guest kernel allocates its own GHCB and writes the allocation
address to MSR_SEV_ES_GHCB, then indeed the firmware will use the GHCB
of the OS.

I reviewed edk2 commit 437eb3f7a8db
("OvmfPkg/QemuFlashFvbServicesRuntimeDxe: Bypass flash detection with
SEV-ES", 2020-08-17), but I admit I never thought of the guest OS
changing MSR_SEV_ES_GHCB. I'm sorry about that.

As long as this driver is running before OS runtime (i.e., during the
DXE and BDS phases), MSR_SEV_ES_GHCB is supposed to carry the value we
set in "OvmfPkg/PlatformPei/AmdSev.c":


STATIC
VOID
AmdSevEsInitialize (
   VOID
   )
{
   VOID  *GhcbBase;
   PHYSICAL_ADDRESS  GhcbBasePa;
   UINTN GhcbPageCount, PageCount;
   RETURN_STATUS PcdStatus, DecryptStatus;
   IA32_DESCRIPTOR   Gdtr;
   VOID  *Gdt;

   if (!MemEncryptSevEsIsEnabled ()) {
 return;
   }

   PcdStatus = PcdSetBoolS (PcdSevEsIsEnabled, TRUE);
   ASSERT_RETURN_ERROR (PcdStatus);

   //
   // Allocate GHCB and per-CPU variable pages.
   //   Since the pages must survive across the UEFI to OS transition
   //   make them reserved.
   //
   GhcbPageCount = mMaxCpuCount * 2;
   GhcbBase = AllocateReservedPages (GhcbPageCount);
   ASSERT (GhcbBase != NULL);

   GhcbBasePa = (PHYSICAL_ADDRESS)(UINTN) GhcbBase;

   //
   // Each vCPU gets two consecutive pages, the first is the GHCB and the
   // second is the per-CPU variable page. Loop through the allocation and
   // only clear the encryption mask for the GHCB pages.
   //
   for (PageCount = 0; PageCount < GhcbPageCount; PageCount += 2) {
 DecryptStatus = MemEncryptSevClearPageEncMask (
   0,
   GhcbBasePa + EFI_PAGES_TO_SIZE (PageCount),
   1,
   TRUE
   );
 ASSERT_RETURN_ERROR (DecryptStatus);
   }

   ZeroMem (GhcbBase, EFI_PAGES_TO_SIZE (GhcbPageCount));

   PcdStatus = PcdSet64S (PcdGhcbBase, GhcbBasePa);
   ASSERT_RETURN_ERROR 

Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active

2020-09-09 Thread Laszlo Ersek
On 09/09/20 14:44, Laszlo Ersek wrote:

> To summarize: for QemuFlashFvbServicesRuntimeDxe to allocate UEFI
> Runtime Services Data type memory, for its own runtime GHCB, two
> permissions are necessary (together), at OS runtime:
> 
> - QemuFlashFvbServicesRuntimeDxe must be allowed to swap MSR_SEV_ES_GHCB
>   temporarily (before executing VMGEXIT),
> 
> - QemuFlashFvbServicesRuntimeDxe must be allowed to change the OS-owned
>   PTE temporarily (for remapping the GHCB as plaintext, before writing
>   to it).

Condition#2 gets worse:

If the firmware-allocated runtime GHCB happens to be virt-mapped by the
OS using a 2MB page (covering other UEFI runtime data areas, perhaps?),
then simply flipping the encryption bit in
QemuFlashFvbServicesRuntimeDxe would mark more runtime memory than just
the GHCB as "plaintext". (2MB-4KB specifically.)

This could result in:
- firmware read accesses outside of the GHCB returning garbage
- firmware write accesses ouside of the GHCB leaking information to the
hypervisor, and reading back later as garbage

In order to prevent those symptoms, the firmware would have to split the
2MB page to 4KB pages, and decrypt just the one (GHCB) page.

But page splitting requires additional memory (for the finer granularity
page table), and fw memory cannot be allocated at OS runtime. So that
extra memory too would have to be pre-allocated by the firmware. Nasty.

I'd recommend sticking with this kernel patch.

Thanks,
Laszlo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v7 71/72] x86/efi: Add GHCB mappings when SEV-ES is active

2020-09-09 Thread Laszlo Ersek
On 09/09/20 10:27, Ard Biesheuvel wrote:
> (adding Laszlo and Brijesh)
>
> On Tue, 8 Sep 2020 at 20:46, Borislav Petkov  wrote:
>>
>> + Ard so that he can ack the efi bits.
>>
>> On Mon, Sep 07, 2020 at 03:16:12PM +0200, Joerg Roedel wrote:
>>> From: Tom Lendacky 
>>>
>>> Calling down to EFI runtime services can result in the firmware
>>> performing VMGEXIT calls. The firmware is likely to use the GHCB of
>>> the OS (e.g., for setting EFI variables),

I've had to stare at this for a while.

Because, normally a VMGEXIT is supposed to occur like this:

- guest does something privileged
- resultant non-automatic exit (NAE) injects a #VC exception
- exception handler figures out what that privileged thing was
- exception handler submits request to hypervisor via GHCB contents plus
  VMGEXIT instruction

Point being, the agent that "owns" the exception handler is supposed to
pre-allocate or otherwise provide the GHCB too, for information passing.

So... what is the particular NAE that occurs during the execution of
UEFI runtime services (at OS runtime)?

And assuming it occurs, I'm unsure if the exception handler (IDT) at
that point is owned (temporarily?) by the firmware.

- If the #VC handler comes from the firmware, then I don't know why it
would use the OS's GHCB.

- If the #VC handler comes from the OS, then I don't understand why the
commit message says "firmware performing VMGEXIT", given that in this
case it would be the OS's #VC handler executing VMGEXIT.

So, I think the above commit message implies a VMGEXIT *without* a NAE /
#VC context. (Because, I fail to interpret the commit message in a NAE /
#VC context in any way; see above.)

OK, so let's see where the firmware performs a VMGEXIT *outside* of an
exception handler, *while* at OS runtime. There seems to be one, in file
"OvmfPkg/QemuFlashFvbServicesRuntimeDxe/QemuFlashDxe.c":

> VOID
> QemuFlashPtrWrite (
>   INvolatile UINT8*Ptr,
>   INUINT8 Value
>   )
> {
>   if (MemEncryptSevEsIsEnabled ()) {
> MSR_SEV_ES_GHCB_REGISTER  Msr;
> GHCB  *Ghcb;
>
> Msr.GhcbPhysicalAddress = AsmReadMsr64 (MSR_SEV_ES_GHCB);
> Ghcb = Msr.Ghcb;
>
> //
> // Writing to flash is emulated by the hypervisor through the use of write
> // protection. This won't work for an SEV-ES guest because the write won't
> // be recognized as a true MMIO write, which would result in the required
> // #VC exception. Instead, use the the VMGEXIT MMIO write support directly
> // to perform the update.
> //
> VmgInit (Ghcb);
> Ghcb->SharedBuffer[0] = Value;
> Ghcb->SaveArea.SwScratch = (UINT64) (UINTN) Ghcb->SharedBuffer;
> VmgExit (Ghcb, SVM_EXIT_MMIO_WRITE, (UINT64) (UINTN) Ptr, 1);
> VmgDone (Ghcb);
>   } else {
> *Ptr = Value;
>   }
> }

This function *does* run at OS runtime (as a part of non-volatile UEFI
variable writes).

And note that, wherever MSR_SEV_ES_GHCB points to at the moment, is used
as GHCB.

If the guest kernel allocates its own GHCB and writes the allocation
address to MSR_SEV_ES_GHCB, then indeed the firmware will use the GHCB
of the OS.

I reviewed edk2 commit 437eb3f7a8db
("OvmfPkg/QemuFlashFvbServicesRuntimeDxe: Bypass flash detection with
SEV-ES", 2020-08-17), but I admit I never thought of the guest OS
changing MSR_SEV_ES_GHCB. I'm sorry about that.

As long as this driver is running before OS runtime (i.e., during the
DXE and BDS phases), MSR_SEV_ES_GHCB is supposed to carry the value we
set in "OvmfPkg/PlatformPei/AmdSev.c":

> STATIC
> VOID
> AmdSevEsInitialize (
>   VOID
>   )
> {
>   VOID  *GhcbBase;
>   PHYSICAL_ADDRESS  GhcbBasePa;
>   UINTN GhcbPageCount, PageCount;
>   RETURN_STATUS PcdStatus, DecryptStatus;
>   IA32_DESCRIPTOR   Gdtr;
>   VOID  *Gdt;
>
>   if (!MemEncryptSevEsIsEnabled ()) {
> return;
>   }
>
>   PcdStatus = PcdSetBoolS (PcdSevEsIsEnabled, TRUE);
>   ASSERT_RETURN_ERROR (PcdStatus);
>
>   //
>   // Allocate GHCB and per-CPU variable pages.
>   //   Since the pages must survive across the UEFI to OS transition
>   //   make them reserved.
>   //
>   GhcbPageCount = mMaxCpuCount * 2;
>   GhcbBase = AllocateReservedPages (GhcbPageCount);
>   ASSERT (GhcbBase != NULL);
>
>   GhcbBasePa = (PHYSICAL_ADDRESS)(UINTN) GhcbBase;
>
>   //
>   // Each vCPU gets two consecutive pages, the first is the GHCB and the
>   // second is the per-CPU variable page. Loop through the allocation and
>   // only clear the encryption mask for the GHCB pages.
>   //
>   for (PageCount = 0; PageCount < GhcbPageCount; PageCount += 2) {
> DecryptStatus = MemEncryptSevClearPageEncMask (
>   0,
>   GhcbBasePa + EFI_PAGES_TO_SIZE (PageCount),
>   1,
>   TRUE
>   );
> ASSERT_RETURN_ERROR (DecryptStatus);
>   }
>
>   ZeroMem (GhcbBase, EFI_PAGES_TO_SIZE (GhcbPageCount));
>
>   PcdStatus = PcdSet64S (PcdGhcbBase, GhcbBasePa);
>   ASSERT_RETURN_ERROR (PcdStatus);
>   PcdStatus = 

Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-09 Thread David Hildenbrand
On 09.09.20 13:37, David Hildenbrand wrote:
> On 09.09.20 13:24, Michael Ellerman wrote:
>> David Hildenbrand  writes:
>>> On 09.09.20 09:17, Greg Kroah-Hartman wrote:
 On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote:
> We soon want to pass flags, e.g., to mark added System RAM resources.
> mergeable. Prepare for that.

 What are these random "flags", and how do we know what should be passed
 to them?

 Why not make this an enumerated type so that we know it all works
 properly, like the GPF_* flags are?  Passing around a random unsigned
 long feels very odd/broken...
>>>
>>> Agreed, an enum (mhp_flags) seems to give a better hint what can
>>> actually be passed. Thanks!
>>
>> You probably know this but ...
>>
>> Just using a C enum doesn't get you any type safety.
>>
>> You can get some checking via sparse by using __bitwise, which is what
>> gfp_t does. You don't actually have to use an enum for that, it works
>> with #defines also.
> 
> Yeah, we seem to be using different approaches. And there is always a
> way to mess things up :)
> 
> gfp_t is one (extreme) example, enum memblock_flags is another example.
> I tend to prefer an enum in this particular case, because it's simple
> and at least tells the user which values are expected.
> 

Gave it another try, looks like mhp_t (like gfp_t) is actually nicer.

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-09 Thread David Hildenbrand
On 09.09.20 13:24, Michael Ellerman wrote:
> David Hildenbrand  writes:
>> On 09.09.20 09:17, Greg Kroah-Hartman wrote:
>>> On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote:
 We soon want to pass flags, e.g., to mark added System RAM resources.
 mergeable. Prepare for that.
>>>
>>> What are these random "flags", and how do we know what should be passed
>>> to them?
>>>
>>> Why not make this an enumerated type so that we know it all works
>>> properly, like the GPF_* flags are?  Passing around a random unsigned
>>> long feels very odd/broken...
>>
>> Agreed, an enum (mhp_flags) seems to give a better hint what can
>> actually be passed. Thanks!
> 
> You probably know this but ...
> 
> Just using a C enum doesn't get you any type safety.
> 
> You can get some checking via sparse by using __bitwise, which is what
> gfp_t does. You don't actually have to use an enum for that, it works
> with #defines also.

Yeah, we seem to be using different approaches. And there is always a
way to mess things up :)

gfp_t is one (extreme) example, enum memblock_flags is another example.
I tend to prefer an enum in this particular case, because it's simple
and at least tells the user which values are expected.

Thoughts?

> 
> Or you can wrap the flag in a struct, the way atomic_t does, and then
> the compiler will prevent passing plain integers in place of your custom
> type.



-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-09 Thread Michael Ellerman
David Hildenbrand  writes:
> On 09.09.20 09:17, Greg Kroah-Hartman wrote:
>> On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote:
>>> We soon want to pass flags, e.g., to mark added System RAM resources.
>>> mergeable. Prepare for that.
>> 
>> What are these random "flags", and how do we know what should be passed
>> to them?
>> 
>> Why not make this an enumerated type so that we know it all works
>> properly, like the GPF_* flags are?  Passing around a random unsigned
>> long feels very odd/broken...
>
> Agreed, an enum (mhp_flags) seems to give a better hint what can
> actually be passed. Thanks!

You probably know this but ...

Just using a C enum doesn't get you any type safety.

You can get some checking via sparse by using __bitwise, which is what
gfp_t does. You don't actually have to use an enum for that, it works
with #defines also.

Or you can wrap the flag in a struct, the way atomic_t does, and then
the compiler will prevent passing plain integers in place of your custom
type.

cheers
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call

2020-09-09 Thread Jason Wang


On 2020/9/9 下午2:52, Zhu Lingshan wrote:

This commit removed unnecessary spin_locks in vhost_vring_call
and related operations. Because we manipulate irq offloading
contents in vhost_vdpa ioctl code path which is already
protected by dev mutex and vq mutex.

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 



---
  drivers/vhost/vdpa.c  | 8 +---
  drivers/vhost/vhost.c | 3 ---
  drivers/vhost/vhost.h | 1 -
  3 files changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 3fab94f88894..bc679d0b7b87 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -97,26 +97,20 @@ static void vhost_vdpa_setup_vq_irq(struct vhost_vdpa *v, 
u16 qid)
return;
  
  	irq = ops->get_vq_irq(vdpa, qid);

-   spin_lock(>call_ctx.ctx_lock);
irq_bypass_unregister_producer(>call_ctx.producer);
-   if (!vq->call_ctx.ctx || irq < 0) {
-   spin_unlock(>call_ctx.ctx_lock);
+   if (!vq->call_ctx.ctx || irq < 0)
return;
-   }
  
  	vq->call_ctx.producer.token = vq->call_ctx.ctx;

vq->call_ctx.producer.irq = irq;
ret = irq_bypass_register_producer(>call_ctx.producer);
-   spin_unlock(>call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)

  {
struct vhost_virtqueue *vq = >vqs[qid];
  
-	spin_lock(>call_ctx.ctx_lock);

irq_bypass_unregister_producer(>call_ctx.producer);
-   spin_unlock(>call_ctx.ctx_lock);
  }
  
  static void vhost_vdpa_reset(struct vhost_vdpa *v)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index b45519ca66a7..99f27ce982da 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -302,7 +302,6 @@ static void vhost_vring_call_reset(struct vhost_vring_call 
*call_ctx)
  {
call_ctx->ctx = NULL;
memset(_ctx->producer, 0x0, sizeof(struct irq_bypass_producer));
-   spin_lock_init(_ctx->ctx_lock);
  }
  
  static void vhost_vq_reset(struct vhost_dev *dev,

@@ -1637,9 +1636,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int 
ioctl, void __user *arg
break;
}
  
-		spin_lock(>call_ctx.ctx_lock);

swap(ctx, vq->call_ctx.ctx);
-   spin_unlock(>call_ctx.ctx_lock);
break;
case VHOST_SET_VRING_ERR:
if (copy_from_user(, argp, sizeof f)) {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 9032d3c2a9f4..486dcf371e06 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -64,7 +64,6 @@ enum vhost_uaddr_type {
  struct vhost_vring_call {
struct eventfd_ctx *ctx;
struct irq_bypass_producer producer;
-   spinlock_t ctx_lock;
  };
  
  /* The virtqueue structure describes a queue attached to a device. */


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/3 下午1:34, Jie Deng wrote:

--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -6,6 +6,9 @@
  # ACPI drivers
  obj-$(CONFIG_I2C_SCMI)+= i2c-scmi.o
  
+# VIRTIO I2C host controller driver

+obj-$(CONFIG_I2C_VIRTIO)   += i2c-virtio.o
+
  # PC SMBus host controller drivers
  obj-$(CONFIG_I2C_ALI1535) += i2c-ali1535.o
  obj-$(CONFIG_I2C_ALI1563) += i2c-ali1563.o
diff --git a/drivers/i2c/busses/i2c-virtio.c b/drivers/i2c/busses/i2c-virtio.c
new file mode 100644
index 000..47f9fd1
--- /dev/null
+++ b/drivers/i2c/busses/i2c-virtio.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Virtio I2C Bus Driver
+ *
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define VIRTIO_I2C_MSG_OK  0
+#define VIRTIO_I2C_MSG_ERR 1
+
+/**
+ * struct virtio_i2c_hdr - the virtio I2C message header structure
+ * @addr: i2c_msg addr, the slave address
+ * @flags: i2c_msg flags
+ * @len: i2c_msg len
+ */
+struct virtio_i2c_hdr {
+   __virtio16 addr;
+   __virtio16 flags;
+   __virtio16 len;
+} __packed;



Btw, this part should belong to uAPI, and you need to define the status 
in uAPI.


Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH] i2c: virtio: add a virtio i2c frontend driver

2020-09-09 Thread Jason Wang


On 2020/9/8 上午9:40, Jie Deng wrote:



On 2020/9/7 13:40, Jason Wang wrote:









+struct virtio_i2c_msg {
+    struct virtio_i2c_hdr hdr;
+    char *buf;
+    u8 status;



Any reason for separating status out of virtio_i2c_hdr?

The status is not from i2c_msg. 



You meant ic2_hdr? You embed status in virtio_i2c_msg anyway.



The "i2c_msg" structure defined in i2c.h.


So I put it out of virtio_i2c_hdr.



Something like status or response is pretty common in virtio request 
(e.g net or scsi), if no special reason, it's better to keep it in 
the hdr.



Mainly based on IN or OUT.

The addr, flags and len are from "i2c_msg". They are put in one 
structure as an OUT**scatterlist.

The buf can be an OUT**or an IN scatterlist depending on write or read.
The status is a result from the backend  which is defined as an IN 
scatterlis. 



Ok. I get this.

Thanks


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 00/22] Enhance VHOST to enable SoC-to-SoC communication

2020-09-09 Thread Jason Wang


On 2020/9/9 上午12:37, Cornelia Huck wrote:

Then you need something that is functional equivalent to virtio PCI
which is actually the concept of vDPA (e.g vDPA provides alternatives if
the queue_sel is hard in the EP implementation).

It seems I really need to read up on vDPA more... do you have a pointer
for diving into this alternatives aspect?



See vpda_config_ops in include/linux/vdpa.h

Especially this part:

    int (*set_vq_address)(struct vdpa_device *vdev,
              u16 idx, u64 desc_area, u64 driver_area,
              u64 device_area);

This means for the devices (e.g endpoint device) that is hard to 
implement virtio-pci layout, it can use any other register layout or 
vendor specific way to configure the virtqueue.






"Virtio Over NTB" should anyways be a new transport.

Does that make any sense?

yeah, in the approach I used the initial features are hard-coded in
vhost-rpmsg (inherent to the rpmsg) but when we have to use adapter
layer (vhost only for accessing virtio ring and use virtio drivers on
both front end and backend), based on the functionality (e.g, rpmsg),
the vhost should be configured with features (to be presented to the
virtio) and that's why additional layer or APIs will be required.

A question here, if we go with vhost bus approach, does it mean the
virtio device can only be implemented in EP's userspace?

Can we maybe implement an alternative bus as well that would allow us
to support different virtio device implementations (in addition to the
vhost bus + userspace combination)?



That should be fine, but I'm not quite sure that implementing the device 
in kerne (kthread) is the good approach.


Thanks






___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-09 Thread David Hildenbrand
On 09.09.20 09:16, Greg Kroah-Hartman wrote:
> On Tue, Sep 08, 2020 at 10:10:07PM +0200, David Hildenbrand wrote:
>> IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
>> always set to 0 by hardware. This is far from beautiful (and confusing),
>> and the bit only applies to SYSRAM. So let's move it out of the
>> bus-specific (PnP) defined bits.
>>
>> We'll add another SYSRAM specific bit soon. If we ever need more bits for
>> other purposes, we can steal some from "desc", or reshuffle/regroup what we
>> have.
>>
>> Cc: Andrew Morton 
>> Cc: Michal Hocko 
>> Cc: Dan Williams 
>> Cc: Jason Gunthorpe 
>> Cc: Kees Cook 
>> Cc: Ard Biesheuvel 
>> Cc: Pankaj Gupta 
>> Cc: Baoquan He 
>> Cc: Wei Yang 
>> Cc: Eric Biederman 
>> Cc: Thomas Gleixner 
>> Cc: Greg Kroah-Hartman 
>> Cc: ke...@lists.infradead.org
>> Signed-off-by: David Hildenbrand 
>> ---
>>  include/linux/ioport.h | 4 +++-
>>  kernel/kexec_file.c| 2 +-
>>  mm/memory_hotplug.c| 4 ++--
>>  3 files changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
>> index 52a91f5fa1a36..d7620d7c941a0 100644
>> --- a/include/linux/ioport.h
>> +++ b/include/linux/ioport.h
>> @@ -58,6 +58,9 @@ struct resource {
>>  #define IORESOURCE_EXT_TYPE_BITS 0x0100 /* Resource extended types */
>>  #define IORESOURCE_SYSRAM   0x0100  /* System RAM (modifier) */
>>  
>> +/* IORESOURCE_SYSRAM specific bits. */
>> +#define IORESOURCE_SYSRAM_DRIVER_MANAGED0x0200 /* Always detected 
>> via a driver. */
>> +
> 
> Can't you use BIT() here?

I could, but this will make it look different to all other IORESOURCE_*
definitions?

If so, we should change all existing definitions - however the ones
spanning multiple bits might turn out rather ugly, like

#define IORESOURCE_TYPE_BITS0x1f00

Thoughts? Thanks

> 
> thanks,
> 
> greg k-h
> 


-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-09 Thread David Hildenbrand
On 09.09.20 09:17, Greg Kroah-Hartman wrote:
> On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote:
>> We soon want to pass flags, e.g., to mark added System RAM resources.
>> mergeable. Prepare for that.
> 
> What are these random "flags", and how do we know what should be passed
> to them?
> 
> Why not make this an enumerated type so that we know it all works
> properly, like the GPF_* flags are?  Passing around a random unsigned
> long feels very odd/broken...

Agreed, an enum (mhp_flags) seems to give a better hint what can
actually be passed. Thanks!

-- 
Thanks,

David / dhildenb

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 3/7] mm/memory_hotplug: prepare passing flags to add_memory() and friends

2020-09-09 Thread Greg Kroah-Hartman
On Tue, Sep 08, 2020 at 10:10:08PM +0200, David Hildenbrand wrote:
> We soon want to pass flags, e.g., to mark added System RAM resources.
> mergeable. Prepare for that.

What are these random "flags", and how do we know what should be passed
to them?

Why not make this an enumerated type so that we know it all works
properly, like the GPF_* flags are?  Passing around a random unsigned
long feels very odd/broken...

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 2/7] kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED

2020-09-09 Thread Greg Kroah-Hartman
On Tue, Sep 08, 2020 at 10:10:07PM +0200, David Hildenbrand wrote:
> IORESOURCE_MEM_DRIVER_MANAGED currently uses an unused PnP bit, which is
> always set to 0 by hardware. This is far from beautiful (and confusing),
> and the bit only applies to SYSRAM. So let's move it out of the
> bus-specific (PnP) defined bits.
> 
> We'll add another SYSRAM specific bit soon. If we ever need more bits for
> other purposes, we can steal some from "desc", or reshuffle/regroup what we
> have.
> 
> Cc: Andrew Morton 
> Cc: Michal Hocko 
> Cc: Dan Williams 
> Cc: Jason Gunthorpe 
> Cc: Kees Cook 
> Cc: Ard Biesheuvel 
> Cc: Pankaj Gupta 
> Cc: Baoquan He 
> Cc: Wei Yang 
> Cc: Eric Biederman 
> Cc: Thomas Gleixner 
> Cc: Greg Kroah-Hartman 
> Cc: ke...@lists.infradead.org
> Signed-off-by: David Hildenbrand 
> ---
>  include/linux/ioport.h | 4 +++-
>  kernel/kexec_file.c| 2 +-
>  mm/memory_hotplug.c| 4 ++--
>  3 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 52a91f5fa1a36..d7620d7c941a0 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -58,6 +58,9 @@ struct resource {
>  #define IORESOURCE_EXT_TYPE_BITS 0x0100  /* Resource extended types */
>  #define IORESOURCE_SYSRAM0x0100  /* System RAM (modifier) */
>  
> +/* IORESOURCE_SYSRAM specific bits. */
> +#define IORESOURCE_SYSRAM_DRIVER_MANAGED 0x0200 /* Always detected 
> via a driver. */
> +

Can't you use BIT() here?

thanks,

greg k-h
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization