from:"Michael Kelley \(EOSG\)"

RE: [PATCH V3 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

2018-09-26 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Wednesday, September 26, 2018 8:50 PM
> 
> Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb
> with specified ranges. This patch is to add the hypercall support.
> 
> Signed-off-by: Lan Tianyu 
>

Looks good!

Reviewed-by:  Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH -next] x86/hyper-v: Remove unused including

2018-09-23 Thread Michael Kelley (EOSG)

From: YueHaibing   Sent: Sunday, September 23, 2018 1:20 
AM

> Remove including  that don't need it.
> 
> Signed-off-by: YueHaibing 
> ---
>  arch/x86/hyperv/hv_apic.c | 1 -
>  1 file changed, 1 deletion(-)
> 
Reviewed-by:  Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

2018-09-20 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Thursday, September 20, 2018 7:30 AM
> On 9/20/2018 12:08 AM, Michael Kelley (EOSG) wrote:
> > From: Tianyu Lan  Sent: Monday, September 17, 2018 8:19 PM
> >> +
> >> +  if (ret && kvm_available_flush_tlb_with_range()) {
> >> +  kvm_flush_remote_tlbs_with_address(kvm,
> >> +  gfn_start,
> >> +  gfn_end - gfn_start);
> >
> > Does the above need to be gfn_end - gfn_start + 1?
> 
> The flush range depends on the input parameter frame start and frame end
> of for_each_slot_rmap_range().
> 
>  for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
>   PT_MAX_HUGEPAGE_LEVEL,
>   gfn_start, gfn_end - 1,
>   )
>  ret |= handler(kvm, iterator.rmap, memslot,
> iterator.gfn, iterator.level, data);
> 
> 
> The start is "gfn_start" and the end is "gfn_end - 1". The flush size is
> (gfn_end - 1) - gfn_start + 1 = gfn_end - gfn_start.
> 

Got it.  I agree.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

2018-09-19 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Monday, September 17, 2018 8:19 PM
>
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 

Hopefully asm/kvm_host.h does not need to be #included, given
the new code structure.

> 
>  #include 
> 
> +/*
> + *  MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited
> + *  by the bitwidth of "additional_pages" in union hv_gpa_page_range.
> + */
> +#define MAX_FLUSH_PAGES (2048)
> +
> +/*
> + * All input flush parameters are in single page. The max flush count
> + * is equal with how many entries of union hv_gpa_page_range can be
> + * populated in the input parameter page. MAX_FLUSH_REP_COUNT
> + * = (4096 - 16) / 8. (“Page Size” - "Address Space" - "Flags") /
> + * "GPA Range".
> + */
> +#define MAX_FLUSH_REP_COUNT (510)
> +

I would recommend putting the above two definitions in
hyperv-tlfs.h.  They are directly tied to the data structures defined
by Hyper-V in the TLFS.  Put MAX_FLUSH_PAGES immediately after
the definition for hv_gpa_page_range so that the dependency is
obvious.

For MAX_FLUSH_REP_COUNT, can you do the calculation in
the #define rather than just in the comment?  Alternatively, define
the gpa_list[] array to be of MAX_FLUSH_REP_COUNT size, and then
add a compile time assert that the size of struct
hv_guest_mapping_flush_list is exactly one page in size.   It's just
a good way to use the compiler to help check for mistakes.

Also prefix them both with HV_ since they will be more
globally visible as part of hyperv-tlfs.h.

>  int hyperv_flush_guest_mapping(u64 as)
>  {
>   struct hv_guest_mapping_flush **flush_pcpu;
> @@ -54,3 +71,89 @@ int hyperv_flush_guest_mapping(u64 as)
>   return ret;
>  }
>  EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
> +
> +static int fill_flush_list(union hv_gpa_page_range gpa_list[],
> + int offset, u64 start_gfn, u64 pages)
> +{
> + int gpa_n = offset;
> + u64 cur = start_gfn;
> + u64 additional_pages;
> +
> + do {
> + if (gpa_n >= MAX_FLUSH_REP_COUNT) {
> + pr_warn("Request exceeds HvFlushGuestList max flush 
> count.");
> + return -ENOSPC;

I wonder if the warning is really needed.  When the error is returned up
through the higher levels of code, won't the higher levels just fallback to
the non-enlightened flush code?  So nothing is actually goes wrong; it's just
that a slower code path gets taken.  A comment about such expectations
might be helpful.

> + }
> +
> + if (pages > MAX_FLUSH_PAGES) {
> + additional_pages = MAX_FLUSH_PAGES - 1;
> + pages -= MAX_FLUSH_PAGES;
> + } else {
> + additional_pages = pages - 1;
> + pages = 0;
> + }

The above code is really doing:

additional_pages = min(pages, MAX_FLUSH_PAGES) - 1;
pages -= additional_pages + 1;

And you might want to move the decrement of 'pages' down to the
bottom of the loop where you update the other loop variables.

> +
> + gpa_list[gpa_n].page.additional_pages = additional_pages;
> + gpa_list[gpa_n].page.largepage = false;
> + gpa_list[gpa_n].page.basepfn = cur;
> +
> + cur += additional_pages + 1;
> + gpa_n++;
> + } while (pages > 0);
> +
> + return gpa_n;
> +}
> +
> +int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range)
> +{
> + struct hv_guest_mapping_flush_list **flush_pcpu;
> + struct hv_guest_mapping_flush_list *flush;
> + u64 status = 0;
> + unsigned long flags;
> + int ret = -ENOTSUPP;
> + int gpa_n = 0;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush_list **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + if (!range->flush_list)
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + range->start_gfn, range->pages);
> + else if (range->parse_flush_list_func)
> + gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n,
> + range->flush_list, fill_flush_list);
> + else
> + gpa_n = -1;
> +
> + if (gpa_n < 0) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST,
> +  gpa_n, 0, flush, NULL);
> +
> + local_irq_restore(flags);
> +
> + if (!(status & HV_HYPERCALL_RESULT_MASK))
> + ret = 0;
> + else
> + ret = status;
> +fault:
> + return ret;
> +}
>

RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()

2018-09-19 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Monday, September 17, 2018 8:19 PM
> +
> + if (ret && kvm_available_flush_tlb_with_range()) {
> + kvm_flush_remote_tlbs_with_address(kvm,
> + gfn_start,
> + gfn_end - gfn_start);

Does the above need to be gfn_end - gfn_start + 1?

> + ret = 0;
> + }

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 2/13] KVM/MMU: Add tlb flush with range helper function

2018-09-19 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Monday, September 17, 2018 8:18 PM
>
> +static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm,
> + struct kvm_tlb_range *range)
> +{
> + int ret = -ENOTSUPP;
> +
> + if (range && kvm_x86_ops->tlb_remote_flush_with_range) {
> + /*
> +  * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in
> +  * kvm_make_all_cpus_request.
> +  */
> + long dirty_count = smp_load_acquire(>tlbs_dirty);
> +
> + ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range);
> + cmpxchg(>tlbs_dirty, dirty_count, 0);
> + }

The comment and the code that manipulates kvm->tlbs_dirty appears
to have been copied from kvm_flush_remote_tlbs().  But the above
code doesn't call kvm_make_all_cpus_request().   I haven't traced
all the details, but it seems like the comment should be updated,
or the code isn't needed.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] x86/hyperv: suppress "PCI: Fatal: No config space access function found"

2018-09-19 Thread Michael Kelley (EOSG)

From: Dexuan Cui   Sent: Tuesday, September 18, 2018 3:30 PM
> 
> A Generatin-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and
> users always see the scary warning, which is actually harmless. The patch
> is made to suppress the warning.
> 
> Signed-off-by: Dexuan Cui 
> Cc: K. Y. Srinivasan 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> ---
>  arch/x86/hyperv/hv_init.c | 19 +++
>  1 file changed, 19 insertions(+)
> 

Reviewed-by:  Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] Drivers: hv: vmbus: include header for get_irq_regs()

2018-09-15 Thread Michael Kelley (EOSG)

From Sebastian Andrzej Siewior   Sent: Thursday, August 30, 2018 12:55 AM
> 
> On !RT the header file get_irq_regs() gets pulled in via other header files. 
> On
> RT it does not and the build fails:
> 
> drivers/hv/vmbus_drv.c:975 implicit declaration of function 
> ‘get_irq_regs’ [-
> Werror=implicit-function-declaration]
> drivers/hv/hv.c:115 implicit declaration of function ‘get_irq_regs’ 
> [-Werror=implicit-
> function-declaration]
> 
> Add the header file for get_irq_regs() in a common header so it used by
> vmbus_drv.c by hv.c for their get_irq_regs() usage.
> 

get_irq_regs() is not used explicitly in either vmbus_drv.c or in hv.c.  And I
couldn't make the line numbers in the errors above line up with anything
in the source code that might be implicitly using get_irq_regs().  Is it the
calls to add_interrupt_randomness()?   Did you figure out
exactly what line of code is causing the compile error?

I'm wondering whether adding the #include of irq.h into hyperv_vmbus.h
is really the right solution.  More correct might be to have the file where
get_irq_regs() is actually used to #include irq_regs.h.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support

2018-09-11 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Sent: Monday, September 10, 2018 1:39 AM
> +
> +int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range)

I'm really concerned about defining the Hyper-V function to flush
guest mappings in terms of a KVM struct definition.  Your patch puts
this function in arch/x86/hyperv/nested.c.  I haven't investigated all
the details, but on its face this approach seems like it would cause
trouble in the long run, and it doesn't support the case of a
hypervisor other than KVM running at L1.

I know that KVM code has taken a dependency on Hyper-V types and
code, but that's because KVM is emulating a lot of Hyper-V functionality
and it's taking advantage of Hyper-V enlightenments.  Is there a top
level reason I haven't thought of for Hyper-V code to take a
dependency on KVM definitions?  I would think we want Hyper-V
code to be generic, using Hyper-V data structure definitions.  Then in
keeping with what's already been done, KVM code would use those
definitions where it needs to make calls to Hyper-V code.

> +{
> + struct kvm_mmu_page *sp;
> + struct hv_guest_mapping_flush_list **flush_pcpu;
> + struct hv_guest_mapping_flush_list *flush;
> + u64 status = 0;
> + unsigned long flags;
> + int ret = -ENOTSUPP;
> + int gpa_n = 0;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush_list **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + if (!range->flush_list) {
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + range->start_gfn, range->end_gfn);
> + } else {
> + list_for_each_entry(sp, range->flush_list,
> + flush_link) {
> + u64 end_gfn = sp->gfn +
> + KVM_PAGES_PER_HPAGE(sp->role.level) - 1;
> + gpa_n = fill_flush_list(flush->gpa_list, gpa_n,
> + sp->gfn, end_gfn);
> + }

Per the previous comment, if this loop really needs to walk a KVM
data structure, look for a different way to organize things so that
the handling of KVM-specific data structures is in code that’s part
of KVM, rather than in Hyper-V code.

Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v2 2/4] arm64: hyperv: Add support for Hyper-V as a hypervisor

2018-08-31 Thread Michael Kelley (EOSG)

From: KY Srinivasan  Sent: Thursday, August 30, 2018 11:51 AM

> > +   /* Allocate percpu VP index */
> > +   hv_vp_index = kmalloc_array(num_possible_cpus(),
> > sizeof(*hv_vp_index),
> > +   GFP_KERNEL);
> > +   if (!hv_vp_index)
> > +   return 1;
> > +
> We should perhaps set the array so the contents are invalid so we can 
> correctly
> handle enlightenments for TL shootdown and IPI.

Agreed.  Will add the initialization in v3 of the patch.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v2 1/4] arm64: hyperv: Add core Hyper-V include files

2018-08-31 Thread Michael Kelley (EOSG)

From: KY Srinivasan Sent: Thursday, August 30, 2018 11:23 AM

> > +/*
> > + * This file contains definitions from the Hyper-V Hypervisor Top-Level
> > + * Functional Specification (TLFS):
> > + * 
> > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs
> > +
> A lot of TLFS definitions are ISA independent and we are duplicating these
> definitions both for X86_64 and ARM_64.  Perhaps we should look at splitting
> this file into a common and ISA specific header file.

I agree that we want to end up with x86_64 and ARM64 ISA dependent files
that include an ISA independent file.  My thinking was to not make that
separation now, for a couple of reasons:

1) We don't have a Hyper-V TLFS that is explicit about what should be
considered ISA independent and ISA dependent.  I can make some
reasonable guesses, but it will be subject to change as the Hyper-V team
firms up the interface and decides what they want to commit to.

2) Some of the things defined in the TLFS have names that are
x86-specific (TSC, MSR, etc.).  For the ISA independent parts, those names
should be more generic, which is another dependency on the Hyper-V
team defining the ISA independent parts of the TLFS.

My judgment was that we'll end up with less perturbation overall to go
with this cloned version of hyperv-tlfs.h for now, and then come back
and do the separation once we have a definitive TLFS to base it on.  But
it's a judgment call, and if the sense is that we should do the separation
now, I can give it a try.

> > +#define HvRegisterHypervisorVersion0x0100 /*CPUID
> > 0x4002 */
> > +#defineHvRegisterPrivilegesAndFeaturesInfo0x0200 /*CPUID
> > 0x4003 */
> > +#defineHvRegisterFeaturesInfo0x0201
> > /*CPUID 0x4004 */
> > +#defineHvRegisterImplementationLimitsInfo0x0202 /*CPUID
> > 0x4005 */
> > +#define HvARM64RegisterInterfaceVersion0x00090006 /*CPUID
> > 0x4001 */
> 
> Can we avoid the mixed case names.

Agreed. I'll fix this throughout to use all uppercase, with underscore
as the word separator.

> > + * Linux-specific definitions for managing interactions with Microsoft's
> > + * Hyper-V hypervisor. Definitions that are specified in the Hyper-V
> > + * Top Level Functional Spec (TLFS) should not go in this file, but
> > + * should instead go in hyperv-tlfs.h.
> 
> Would it make sense to breakup this header file into ISA independent and 
> dependent files?

Yes, as above I agree the separation make sense.  And since this file is tied
To Linux and not to the Hyper-V TLFS, the separation isn't affected by the
TLFS issues mentioned above.  I'll give it a try and see if any issues arise.

> > +/*
> > + * Define the IRQ numbers/vectors used by Hyper-V VMbus interrupts
> > + * and by STIMER0 Direct Mode interrupts. Hyper-V should be supplying
> > + * these values through ACPI, but there are no other interrupting
> > + * devices in a Hyper-V VM on ARM64, so it's OK to hard code for now.
> > + * The "CALLBACK_VECTOR" terminology is a left-over from the x86/x64
> > + * world that is used in architecture independent Hyper-V code.
> > + */
> When we have direct device assignment for ARM-64 guests, can we still 
> hardcode.

Yes, we can still hardcode.  These values are in the Per-Processor Interrupt
(PPI) range of 16 to 31.  Any IRQ numbers assigned to a Discrete Device
Assignment (DDA) device will be in the Shared Peripheral Interrupt (SPI)
range of 32-1019 or the Locality-specific Peripheral Interrupt (LPI) range
of greater than 8192.   The handling of DDA interrupts is still under
discussion with the Hyper-V team, but there won't be any conflicts with
the PPI values that are hardcoded here.

> > +/*
> > + * The guest OS needs to register the guest ID with the hypervisor.
> > + * The guest ID is a 64 bit entity and the structure of this ID is
> > + * specified in the Hyper-V specification:
> > + *
> > + * msdn.microsoft.com/en-
> > us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
> > + *
> > + * While the current guideline does not specify how Linux guest ID(s)
> > + * need to be generated, our plan is to publish the guidelines for
> > + * Linux and other guest operating systems that currently are hosted
> > + * on Hyper-V. The implementation here conforms to this yet
> > + * unpublished guidelines.
> > + *
> > + *
> > + * Bit(s)
> > + * 63 - Indicates if the OS is Open Source or not; 1 is Open Source
> > + * 62:56 - Os Type; Linux is 0x100
> > + * 55:48 - Distro specific identification
> > + * 47:16 - Linux kernel version number
> > + * 15:0  - Distro specific identification
> > + *
> > + * Generate the guest ID based on the guideline described above.
> > + */
> 
> No need to repeat the above block comment (already included in the TLFS 
> header).

Agreed.  Will make the change in v3 of the patch.

> > +/* Free the message slot and signal end-of-message if required */
> > +static inline void vmbus_signal_eom(struct hv_message *msg, u32
> > old_msg_type)
> > +{
> > +/*
> > +

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Make synic_initialized flag per-cpu

2018-08-28 Thread Michael Kelley (EOSG)

From: Vitaly Kuznetsov   Sent: Wednesday, August 1, 2018 
2:26 AM

> > I was trying to decide if there are any arguments in favor of one
> > approach vs. the other:  a per-cpu flag in memory or checking
> > the synic_control "enable" bit.   Seems like a wash to me, in which
> > case I have a slight preference for the per-cpu flag in memory vs.
> > creating another function to return sctrl.enable.  But I'm completely
> > open to reasons why checking sctrl.enable is better.
> 
> Just a few thoughts: reading MSR is definitely slower but we avoid
> 'shadowing' the state, the reading is always correct. In case there's a
> chance the SynIC will get disabled from host side we can only find this
> out by doing MSR read. This is a purely theoretical possibility, I
> believe, we can go ahead with this patch.

Vitaly -- just to confirm:  you are OK with the patch as is?  (I'll
check, but I may need to rebase on the latest code.)

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/5] vmbus: add driver_override support

2018-08-14 Thread Michael Kelley (EOSG)

From: Stephen Hemminger   Sent: Tuesday, August 14, 
2018 9:35 AM

> On Mon, 13 Aug 2018 19:30:50 +
> "Michael Kelley (EOSG)"  wrote:
> 
> > > +/*
> > > + * Return a matching hv_vmbus_device_id pointer.
> > > + * If there is no match, return NULL.
> > > + */
> > > +static const struct hv_vmbus_device_id *hv_vmbus_get_id(struct hv_driver 
> > > *drv,
> > > + struct hv_device *dev)
> > > +{
> > > + const uuid_le *guid = >dev_type;
> > > + const struct hv_vmbus_device_id *id;
> > >
> > > - return NULL;
> > > + /* When driver_override is set, only bind to the matching driver */
> > > + if (dev->driver_override && strcmp(dev->driver_override, drv->name))
> > > + return NULL;
> >
> > This function needs to be covered by the device lock, so that
> > dev->driver_override can't be set to NULL and the memory freed
> > during the above 'if' statement.  When called from vmbus_probe(),
> > the device lock is held, so it's good. But when called from
> > vmbus_match(), the device lock may not be held: consider the path
> > __driver_attach() -> driver_match_device() -> vmbus_match().
> 
> The function hv_vmbus_get_id is called from that path.
> i.e. __device_attach -> driver-match_device -> vmbus_match.
> and __device_attach always does:
>   device_lock(dev);

Agreed.  The __device_attach() path holds the device lock and all is good.
But the __driver_attach() path does not hold the device lock when the
match function is called, leaving the code open to a potential race.  Same
problem could happen in the pci subsystem, so the issue is more generic
and probably should be evaluated and dealt with separately.

Michael

> 
> The code in driver _override_store uses the same device_lock
> when storing the new value.
> 
> This is same locking as is done in pci-sysfs.c

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/5] vmbus: add driver_override support

2018-08-13 Thread Michael Kelley (EOSG)

From: k...@linuxonhyperv.com   Sent: Friday, August 10, 
2018 4:06 PM

> From: Stephen Hemminger 
> 
> Add support for overriding the default driver for a VMBus device
> in the same way that it can be done for PCI devices. This patch
> adds the /sys/bus/vmbus/devices/.../driver_override file
> and the logic for matching.
> 
> This is used by driverctl tool to do driver override.
> https://gitlab.com/driverctl/driverctl
> 
> Signed-off-by: Stephen Hemminger 
> Signed-off-by: K. Y. Srinivasan 
> ---
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index b1b548a21f91..e6d8fdac6d8b 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -498,6 +498,54 @@ static ssize_t device_show(struct device *dev,
>  }
>  static DEVICE_ATTR_RO(device);
> 
> +static ssize_t driver_override_store(struct device *dev,
> +  struct device_attribute *attr,
> +  const char *buf, size_t count)
> +{
> + struct hv_device *hv_dev = device_to_hv_device(dev);
> + char *driver_override, *old, *cp;
> +
> + /* We need to keep extra room for a newline */
> + if (count >= (PAGE_SIZE - 1))
> + return -EINVAL;

Does 'count' actually have a relationship to PAGE_SIZE, or
is PAGE_SIZE just used as an arbitrary size limit?  I'm
wondering what happens on ARM64 with a 64K page size,
for example.  If it's just arbitrary, coding such a constant
would be better.

> +/*
> + * Return a matching hv_vmbus_device_id pointer.
> + * If there is no match, return NULL.
> + */
> +static const struct hv_vmbus_device_id *hv_vmbus_get_id(struct hv_driver 
> *drv,
> + struct hv_device *dev)
> +{
> + const uuid_le *guid = >dev_type;
> + const struct hv_vmbus_device_id *id;
> 
> - return NULL;
> + /* When driver_override is set, only bind to the matching driver */
> + if (dev->driver_override && strcmp(dev->driver_override, drv->name))
> + return NULL;

This function needs to be covered by the device lock, so that
dev->driver_override can't be set to NULL and the memory freed
during the above 'if' statement.  When called from vmbus_probe(),
the device lock is held, so it's good. But when called from
vmbus_match(), the device lock may not be held: consider the path
__driver_attach() -> driver_match_device() -> vmbus_match().

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/5] Tools: hv: Fix a bug in the key delete code

2018-08-13 Thread Michael Kelley (EOSG)

From: k...@linuxonhyperv.com  Sent: Friday, August 10, 
2018 4:06 PM
> 
> Fix a bug in the key delete code - the num_records range
> from 0 to num_records-1.
> 
> Signed-off-by: K. Y. Srinivasan 
> Reported-by: David Binderman 
> Cc: 
> ---

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Make synic_initialized flag per-cpu

2018-07-31 Thread Michael Kelley (EOSG)

From: Vitaly Kuznetsov  Sent: Tuesday, July 31, 2018 4:20 
AM
> 
> Reviewed-by: Vitaly Kuznetsov 

Thanks for the review

> 
> Alternatively, we can get rid of synic_initialized flag altogether:
> hv_synic_init() never fails in the first place but we can always
> implement something like:
> 
> int hv_synic_is_initialized(void) {
>   union hv_synic_scontrol sctrl;
> 
>   hv_get_synic_state(sctrl.as_uint64);
> 
>   return sctrl.enable;
> }
> 
> as it doesn't seem that we need to check synic state on _other_ CPUs.
> 
> --
>   Vitaly

I was trying to decide if there are any arguments in favor of one
approach vs. the other:  a per-cpu flag in memory or checking
the synic_control "enable" bit.   Seems like a wash to me, in which
case I have a slight preference for the per-cpu flag in memory vs.
creating another function to return sctrl.enable.  But I'm completely
open to reasons why checking sctrl.enable is better.

Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()

2018-07-13 Thread Michael Kelley (EOSG)

From: Dexuan Cui  Sent: Thursday, July 12, 2018 10:53 PM
> 
> Before setting channel->rescind in vmbus_rescind_cleanup(), we should make
> sure the channel callback won't run any more, otherwise a high-level
> driver like pci_hyperv, which may be infinitely waiting for the host VSP's
> response and notices the channel has been rescinded, can't safely give
> up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's
> unsafe to exit from wait_for_response() and proceed with the on-stack
> variable "comp_pkt" popped. The issue was originally spotted by
> Michael Kelley .
> 
> In vmbus_close_internal(), the patch also minimizes the range protected by
> disabling/enabling channel->callback_event: we don't really need that for
> the whole function.
> 
> Signed-off-by: Dexuan Cui 
> Cc: sta...@vger.kernel.org
> Cc: K. Y. Srinivasan 
> Cc: Stephen Hemminger 
> Cc: Michael Kelley 

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/1] Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic

2018-07-11 Thread Michael Kelley (EOSG)

From: Sunil Muthuswamy Sent: Wednesday, July 11, 2018 9:59 AM
> Thanks, Michael. In which branch should I fix these now that the changes have 
> been
> merged with the char-misc-next branch?

If the original code is already in char-misc-next, you should probably submit a
completely new patch for char-misc-next that just makes the fixes to the 
original code.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/1] Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic

2018-07-10 Thread Michael Kelley (EOSG)

>From k...@linuxonhyperv.com   Sent: Saturday, July 7, 
>2018 7:57 PM
> 
> From: Sunil Muthuswamy 
> 
> In the VM mode on Hyper-V, currently, when the kernel panics, an error
> code and few register values are populated in an MSR and the Hypervisor
> notified. This information is collected on the host. The amount of
> information currently collected is found to be limited and not very
> actionable. To gather more actionable data, such as stack trace, the
> proposal is to write one page worth of kmsg data on an allocated page
> and the Hypervisor notified of the page address through the MSR.
> 
> - Sysctl option to control the behavior, with ON by default.
> 
> Cc: K. Y. Srinivasan 
> Cc: Stephen Hemminger 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: K. Y. Srinivasan 
> ---

> + /*
> +  * Write dump contents to the page. No need to synchronize; panic should
> +  * be single-threaded.
> +  */
> + if (!kmsg_dump_get_buffer(dumper, true, hv_panic_page,
> +   PAGE_SIZE, _written)) {
> + pr_err("Hyper-V: Unable to get kmsg data for panic\n");
> + return;

>From what I can see, the return value from kmsg_dump_get_buffer()
is not an indication of success or failure -- it's an indication of whether
there is more data available.   There's no reason to output an error
message.

> @@ -1065,6 +1136,32 @@ static int vmbus_bus_init(void)
>* Only register if the crash MSRs are available
>*/
>   if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) {
> + u64 hyperv_crash_ctl;
> + /*
> +  * Sysctl registration is not fatal, since by default
> +  * reporting is enabled.
> +  */
> + hv_ctl_table_hdr = register_sysctl_table(hv_root_table);
> + if (!hv_ctl_table_hdr)
> + pr_err("Hyper-V: sysctl table register error");
> +
> + /*
> +  * Register for panic kmsg callback only if the right
> +  * capability is supported by the hypervisor.
> +  */
> + rdmsrl(HV_X64_MSR_CRASH_CTL, hyperv_crash_ctl);
> + if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG) {

vmbus_drv.c is architecture independent code, and should not be referencing
x86/x64 MSRs.   Reading the MSR (and maybe the test as well?) should go
in a separate function in an x86-specific source file.

And just to confirm, is this the right way to test for the feature?  Usually,
feature determination is based on one of the feature registers.  The
NOTIFY_MSG flag seems to have a dual meaning -- on read it indicates
the feature is present.  On write in hyperv_report_panic_msg(), it evidently
means that the guest is sending a full page of data to Hyper-V.

> @@ -1081,6 +1178,11 @@ static int vmbus_bus_init(void)
>   bus_unregister(_bus);
> + free_page((unsigned long)hv_panic_page);
> + if (!hv_ctl_table_hdr) {

The above test is backwards.  Remove the bang.

> @@ -1785,10 +1887,18 @@ static void __exit vmbus_exit(void)
> + free_page((unsigned long)hv_panic_page);
> + if (!hv_ctl_table_hdr) {

Same here.  Test is backwards.

Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 1/5] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support

2018-07-10 Thread Michael Kelley (EOSG)

From: Tianyu Lan  Monday, July 9, 2018 2:03 AM
> Hyper-V supports a pv hypercall HvFlushGuestPhysicalAddressSpace to
> flush nested VM address space mapping in l1 hypervisor and it's to
> reduce overhead of flushing ept tlb among vcpus. This patch is to
> implement it.
> 
> Signed-off-by: Lan Tianyu 
> ---
>  arch/x86/hyperv/Makefile   |  2 +-
>  arch/x86/hyperv/nested.c   | 64 
> ++
>  arch/x86/include/asm/hyperv-tlfs.h |  8 +
>  arch/x86/include/asm/mshyperv.h|  2 ++
>  4 files changed, 75 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/nested.c
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +int hyperv_flush_guest_mapping(u64 as)
> +{
> + struct hv_guest_mapping_flush **flush_pcpu;
> + struct hv_guest_mapping_flush *flush;
> + u64 status;
> + unsigned long flags;
> + int ret = -EFAULT;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> +
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
> +  flush, NULL);

Did you consider using a "fast" hypercall?  Unless there's some reason I'm
not aware of, a "fast" hypercall would be perfect here as there are 16 bytes
of input and no output. Vitaly recently added hv_do_fast_hypercall16()
in the linux-next tree. See __send_ipi_mask() in hv_apic.c in linux-next
for an example of usage.  With a fast hypercall, you don't need the code for
getting the per-cpu input arg or the code for local irq save/restore, so the
code that is left is a lot faster and simpler.

Michael

> + local_irq_restore(flags);
> +
> + if (!(status & HV_HYPERCALL_RESULT_MASK))
> + ret = 0;
> +
> +fault:
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/2] x86/hyper-v: check for VP_INVAL in hyperv_flush_tlb_others()

2018-07-10 Thread Michael Kelley (EOSG)

From: Vitaly Kuznetsov   Monday, July 9, 2018 10:40 AM
> Commit 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI
>  enlightenment") pre-filled hv_vp_index with VP_INVAL so it is now
> (theoretically) possible to observe hv_cpu_number_to_vp_number()
> returning VP_INVAL. We need to check for that in hyperv_flush_tlb_others().
> 
> Not checking for VP_INVAL on the first call site where we do
> 
>  if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64)
> goto do_ex_hypercall;
> 
> is OK, in case we're eligible for non-ex hypercall we'll catch the
> issue later in for_each_cpu() cycle and in case we'll be doing ex-
> hypercall cpumask_to_vpset() will fail.
> 
> It would be nice to change hv_cpu_number_to_vp_number() return
> value's type to 'u32' but this will likely be a bigger change as
> all call sites need to be checked first.
> 
> Fixes: 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI 
> enlightenment")
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  arch/x86/hyperv/mmu.c | 5 +
>  1 file changed, 5 insertions(+)

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/2] x86/hyper-v: check cpumask_to_vpset() return value in hyperv_flush_tlb_others_ex()

2018-07-10 Thread Michael Kelley (EOSG)

From: Vitaly Kuznetsov  Monday, July 9, 2018 10:40 AM
> Commit 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI
> enlightenment") made cpumask_to_vpset() return '-1' when there is a CPU
> with unknown VP index in the supplied set. This needs to be checked before
> we pass 'nr_bank' to hypercall.
> 
> Fixes: 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI 
> enlightenment")
> Signed-off-by: Vitaly Kuznetsov 
> ---
>  arch/x86/hyperv/mmu.c | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/1] X86/Hyper-V:: Fix the circular dependency in IPI enlightenment.

2018-07-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Thomas Gleixner 
> Sent: Friday, July 6, 2018 3:42 AM
> To: KY Srinivasan 
> Cc: Ingo Molnar ; x...@kernel.org; 
> gre...@linuxfoundation.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com; h...@zytor.com; Stephen Hemminger 
> ;
> Michael Kelley (EOSG) ; vkuzn...@redhat.com
> Subject: RE: [PATCH 1/1] X86/Hyper-V:: Fix the circular dependency in IPI 
> enlightenment.
> 
> On Fri, 6 Jul 2018, Thomas Gleixner wrote:
> > On Fri, 6 Jul 2018, KY Srinivasan wrote:
> > > >
> > > > The problem is that the wreckage is in Linus tree and needs to be fixed
> > > > there, i.e. via x86/urgent.
> > > >
> > > > Now we have the new bits queued in x86/hyperv already which collide. So
> > > > we
> > > > need to merge x86/urgent into x86/hyperv after applying the fix and mop 
> > > > up
> > > > the merge wreckage in x86/hyperv.
> > > >
> > > > I'll have a look tomorrow morning unless you beat me to it.
> > >
> > > I can rebase this patch against the latest tip and resend (tomorrow).
> >
> > That doesn't help as we need to push the original fix to Linus ...
> 
> Applied it to x86/urgent and fixed up the '-1' sloppy hack as pointed out
> by Vitaly and merged x86/urgent into x86/hyperv.
> 
> Please check both branches for correctness.

Changes look good to me.  Some pre-existing signed/unsigned type sloppiness is
still there in that hv_cpu_number_to_vp_number() should be 'u32' instead
of 'int', along with related local variables, but that's probably best to fix in
linux-next on top of Vitaly's other changes.  This code will work.

Michael

> 
> Thanks,
> 
>   tglx

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 3/4] x86/hyper-v: use cheaper HVCALL_SEND_IPI hypercall when possible

2018-06-25 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Vitaly Kuznetsov 
> Sent: Friday, June 22, 2018 10:06 AM
> To: x...@kernel.org
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan
> ; Haiyang Zhang ; Stephen 
> Hemminger
> ; Thomas Gleixner ; Ingo Molnar
> ; H. Peter Anvin ; Tianyu Lan
> ; Michael Kelley (EOSG) 
> 
> Subject: [PATCH 3/4] x86/hyper-v: use cheaper HVCALL_SEND_IPI hypercall when 
> possible
> 
> When there is no need to send an IPI to a CPU with VP number > 64
> we can do the job with fast HVCALL_SEND_IPI hypercall.
> 
> Signed-off-by: Vitaly Kuznetsov 

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/4] x86/hyper-v: use 'fast' hypercall for HVCALL_SEND_IPI

2018-06-25 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Vitaly Kuznetsov 
> Sent: Friday, June 22, 2018 10:06 AM
> To: x...@kernel.org
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan
> ; Haiyang Zhang ; Stephen 
> Hemminger
> ; Thomas Gleixner ; Ingo Molnar
> ; H. Peter Anvin ; Tianyu Lan
> ; Michael Kelley (EOSG) 
> 
> Subject: [PATCH 2/4] x86/hyper-v: use 'fast' hypercall for HVCALL_SEND_IPI
> 
> Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi
> hypercall can't be 'fast' (passing parameters through registers) but
> apparently this is not true, Windows always uses 'fast' version. We can
> do the same in Linux too.
> 
> Signed-off-by: Vitaly Kuznetsov 

Reviewed-by: Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] x86/hyper-v: use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible

2018-06-19 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Vitaly Kuznetsov
> Sent: Friday, June 15, 2018 9:30 AM
> To: x...@kernel.org
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan
> ; Haiyang Zhang ; Stephen 
> Hemminger
> ; Thomas Gleixner ; Ingo Molnar
> ; H. Peter Anvin ; Tianyu Lan
> 
> Subject: [PATCH] x86/hyper-v: use cheaper 
> HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE}
> hypercalls when possible
> 
> While working on Hyper-V style PV TLB flush support in KVM I noticed that
> real Windows guests use TLB flush hypercall in a somewhat smarter way: when
> the flush needs to be performed on a subset of first 64 vCPUs or on all
> present vCPUs Windows avoids more expensive hypercalls which support
> sparse CPU sets and uses their 'cheap' counterparts. This means that
> HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED name is actually a misnomer: EX
> hypercalls (which support sparse CPU sets) are "available", not
> "recommended". This makes sense as they are actually harder to parse.
> 
> Nothing stops us from being equally 'smart' in Linux too. Switch to
> doing cheaper hypercalls whenever possible.
> 
> Signed-off-by: Vitaly Kuznetsov 
> ---

This is a good idea.  We should probably do the same with the hypercalls for 
sending
IPIs -- try the simpler version first and move to the more complex _EX version 
only
if necessary.

A complication:  We've recently found a problem with the code for doing IPI
hypercalls, and the bug affects the TLB flush code as well.  As secondary CPUs
are started, there's a window of time where the hv_vp_index entry for a
secondary CPU is uninitialized.  We are seeing IPIs happening in that window, 
and
the IPI hypercall code uses the uninitialized hv_vp_index entry.   Same thing 
could
happen with the TLB flush hypercall code.  I didn't actually see any 
occurrences of
the TLB case in my tracing, but we should fix it anyway in case a TLB flush gets
added at some point in the future.

KY has a patch coming.  In the patch, hv_cpu_number_to_vp_number()
and cpumask_to_vpset() can both return U32_MAX if they encounter an
uninitialized hv_vp_index entry, and the code needs to be able to bail out to
the native functions for that particular IPI or TLB flush operation.  Once the
initialization of secondary CPUs is complete, the uninitialized situation won't
happen again, and the hypercall path will always be used.

We'll need to coordinate on these patches.  Be aware that the IPI flavor of the
bug is currently causing random failures when booting 4.18 RC1 on Hyper-V VMs
with large vCPU counts.

Reviewed-by:  Michael Kelley 
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [RFC Patch 1/3] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support

2018-06-05 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Tianyu Lan
> Sent: Monday, June 4, 2018 2:08 AM
> Cc: Tianyu Lan ; KY Srinivasan 
> ; Haiyang
> Zhang ; Stephen Hemminger ;
> t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org;
> pbonz...@redhat.com; rkrc...@redhat.com; de...@linuxdriverproject.org; linux-
> ker...@vger.kernel.org; k...@vger.kernel.org; vkuzn...@redhat.com
> Subject: [RFC Patch 1/3] X86/Hyper-V: Add flush 
> HvFlushGuestPhysicalAddressSpace hypercall
> support
> 
> Hyper-V provides a pv hypercall HvFlushGuestPhysicalAddressSpace to flush
> nested VM address space mapping in l1 hypervisor and it's to reduce overhead
> of flushing ept tlb among vcpus. This patch is to implement it.
> 
> Signed-off-by: Lan Tianyu 
> ---
> diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c
> new file mode 100644
> index ..17f7c288eccc
> --- /dev/null
> +++ b/arch/x86/hyperv/nested.c
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +int hyperv_flush_guest_mapping(u64 as)
> +{
> + struct hv_guest_mapping_flush **flush_pcpu;
> + struct hv_guest_mapping_flush *flush;
> + u64 status = U64_MAX;

Initializing status to U64_MAX doesn't seem necessary.

> + unsigned long flags;
> + int ret = -EFAULT;
> +
> + if (!hv_hypercall_pg)
> + goto fault;
> +
> + local_irq_save(flags);
> +
> + flush_pcpu = (struct hv_guest_mapping_flush **)
> + this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + flush = *flush_pcpu;
> +
> + if (unlikely(!flush)) {
> + local_irq_restore(flags);
> + goto fault;
> + }
> +
> + flush->address_space = as;
> + flush->flags = 0;
> +
> + status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE,
> +  flush, NULL);
> + local_irq_restore(flags);
> +
> + if (!(status & HV_HYPERCALL_RESULT_MASK))
> + ret = 0;
> +
> +fault:
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping);
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index b8c89265baf0..53bbeb08faea 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -309,6 +309,7 @@ struct ms_hyperv_tsc_page {
>  #define HV_X64_MSR_REENLIGHTENMENT_CONTROL   0x4106
> 
>  /* Nested features (CPUID 0x400A) EAX */
> +#define HV_X64_NESTED_GUSET_MAPPING_FLUSHBIT(18)

The #define name is misspelled.  "_GUSET_" should be "_GUEST_".
And the matching usage in patch 3/3 will need to be updated as well.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared

2018-05-31 Thread Michael Kelley (EOSG)

> From: Dexuan Cui
> Sent: Tuesday, May 29, 2018 12:58 PM
> Subject: RE: [PATCH] PCI: hv: Do not wait forever on a device that has 
> disappeared
> 
> > From: Michael Kelley (EOSG)
> > Sent: Monday, May 28, 2018 17:19
> >
> > While this patch solves the immediate problem of getting hung waiting
> > for a response from Hyper-V that will never come, there's another scenario
> > to look at that I think introduces a race.  Suppose the guest VM issues a
> > vmbus_sendpacket() request in one of the cases covered by this patch,
> > and suppose that Hyper-V queues a response to the request, and then
> > immediately follows with a rescind request.   Processing the response will
> > get queued to a tasklet associated with the channel, while processing the
> > rescind will get queued to a tasklet associated with the top-level vmbus
> > connection.   From what I can see, the code doesn't impose any ordering
> > on processing the two.  If the rescind is processed first, the new
> > wait_for_response() function may wake up, notice the rescind flag, and
> > return an error.  Its caller will return an error, and in doing so pop the
> > completion packet off the stack.   When the response is processed later,
> > it will try to signal completion via a completion packet that no longer
> > exists, and memory corruption will likely occur.
> >
> > Am I missing anything that would prevent this scenario from happening?
> > It is admittedly low probability, and a solution seems non-trivial.  I 
> > haven't
> > looked specifically, but a similar scenario is probably possible with the
> > drivers for other VMbus devices.  We should work on a generic solution.
> >
> > Michael
> 
> Thanks for spotting the race!
> 
> IMO we can disable the per-channel tasklet to exclude the race:
> 
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -565,6 +565,7 @@ static int wait_for_response(struct hv_device *hdev,
>  {
> while (true) {
> if (hdev->channel->rescind) {
> +   tasklet_disable(>channel->callback_event);
> dev_warn_once(>device, "The device is gone.\n");
> return -ENODEV;
> }
> 
> This way, when we exit the loop, we're sure hv_pci_onchannelcallback() can not
> run anymore. What do you think of this?

I've stared at this and the tasklet code over a couple of days now.  Adding the
call to tasklet_disable() solves the immediate problem of preventing 
hv_pci_onchannelcallback() from calling complete() against a completion packet
that has been popped off the stack.  But in doing so, it simply pushes the core
problem further down the road and leaves it unresolved.

tasklet_disable() does not prevent the tasklet from being scheduled.  So if 
there
is a response from Hyper-V to the original message, the tasklet still gets
scheduled.  Because it is disabled, it will sit in the tasklet queue and be 
skipped
over each time the queue is processed.  Later,  when the channel is eventually
deleted in free_channel(), tasklet_kill() is called.  Unfortunately, 
tasklet_kill()
will get stuck in an infinite loop, waiting for the tasklet to run.   There 
aren't
any tasklet interfaces to dequeue an already scheduled tasklet.

Please double-check my reasoning.  To solve this problem, I think the VMbus
driver code will need some additional synchronization between rescind
notifications and a response, which may or may not have been sent, and
which could be processed after the rescind.  I haven't yet thought about
what this synchronization might need to look like.

Michael

> 
> 
> It looks the list of the other vmbus devices that can be hot-removed is:
> 
> the hv_utils devices
> hv_sock devices
> storvsc device
> netvsc device
> 
> As I checked, the first 3 types of devices don't have this "send a request to 
> the
> host and wait for the response forever" pattern. NetVSC should be fixed as it 
> has
> the same pattern.
> 
> -- Dexuan
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared

2018-05-28 Thread Michael Kelley (EOSG)

> 
> Before the guest finishes the device initialization, the device can be
> removed anytime by the host, and after that the host won't respond to
> the guest's request, so the guest should be prepared to handle this
> case.
> 
> Signed-off-by: Dexuan Cui 
> Cc: Stephen Hemminger 
> Cc: K. Y. Srinivasan 
> ---
>  drivers/pci/host/pci-hyperv.c | 46 
> ---
>  1 file changed, 34 insertions(+), 12 deletions(-)
> 

While this patch solves the immediate problem of getting hung waiting
for a response from Hyper-V that will never come, there's another scenario
to look at that I think introduces a race.  Suppose the guest VM issues a
vmbus_sendpacket() request in one of the cases covered by this patch,
and suppose that Hyper-V queues a response to the request, and then
immediately follows with a rescind request.   Processing the response will
get queued to a tasklet associated with the channel, while processing the
rescind will get queued to a tasklet associated with the top-level vmbus
connection.   From what I can see, the code doesn't impose any ordering
on processing the two.  If the rescind is processed first, the new
wait_for_response() function may wake up, notice the rescind flag, and
return an error.  Its caller will return an error, and in doing so pop the
completion packet off the stack.   When the response is processed later,
it will try to signal completion via a completion packet that no longer
exists, and memory corruption will likely occur.

Am I missing anything that would prevent this scenario from happening?
It is admittedly low probability, and a solution seems non-trivial.  I haven't
looked specifically, but a similar scenario is probably possible with the
drivers for other VMbus devices.  We should work on a generic solution.

Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH char-misc 1/2] Drivers: hv: vmbus: Remove x86 MSR refs in arch independent code

2018-05-12 Thread Michael Kelley (EOSG)

> > -#define hv_init_timer(timer, tick) wrmsrl(timer, tick)
> > -#define hv_init_timer_config(config, val) wrmsrl(config, val)
> > +#define hv_init_timer(timer, tick) \
> > +   wrmsrl(HV_X64_MSR_STIMER0_COUNT + (2*timer), tick)
> > +#define hv_init_timer_config(timer, val) \
> > +   wrmsrl(HV_X64_MSR_STIMER0_CONFIG + (2*timer), val)
>
> Why are we stepping in units of 2?
> 
> K. Y

The register addresses for  the *_CONFIG and *_COUNT registers
are interleaved.  From hyperv-tlfs.h:

#define HV_X64_MSR_STIMER0_CONFIG   0x40B0
#define HV_X64_MSR_STIMER0_COUNT0x40B1
#define HV_X64_MSR_STIMER1_CONFIG   0x40B2
#define HV_X64_MSR_STIMER1_COUNT0x40B3
#define HV_X64_MSR_STIMER2_CONFIG   0x40B4
#define HV_X64_MSR_STIMER2_COUNT0x40B5
#define HV_X64_MSR_STIMER3_CONFIG   0x40B6
#define HV_X64_MSR_STIMER3_COUNT0x40B7

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 5/5] X86: Hyper-V: Consolidate the allocation of the hypercall input page

2018-05-09 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Thursday, May 3, 2018 11:08 PM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH V2 5/5] X86: Hyper-V: Consolidate the allocation of the 
> hypercall input page
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Consolidate the allocation of the hypercall input page.
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 4/5] X86: Hyper-V: Consolidate code for converting cpumask to vpset

2018-05-09 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Thursday, May 3, 2018 11:08 PM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH V2 4/5] X86: Hyper-V: Consolidate code for converting cpumask 
> to vpset
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Consolidate code for converting cpumask to vpset.
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 3/5] X86: Hyper-V: Enhanced IPI enlightenment

2018-05-09 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Thursday, May 3, 2018 11:08 PM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH V2 3/5] X86: Hyper-V: Enhanced IPI enlightenment
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Support enhanced IPI enlightenments (to target more than 64 CPUs).
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 2/5] X86: Hyper-V: Enable IPI enlightenments

2018-05-09 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Thursday, May 3, 2018 11:08 PM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH V2 2/5] X86: Hyper-V: Enable IPI enlightenments
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Hyper-V supports hypercalls to implement IPI; use them.
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH V2 1/5] X86: Hyper-V: Enlighten APIC access

2018-05-09 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Thursday, May 3, 2018 11:08 PM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH V2 1/5] X86: Hyper-V: Enlighten APIC access
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Hyper-V supports MSR based APIC access; implement
> the enlightenment.
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 3/5] X86: Hyper-V: Enhanced IPI enlightenment

2018-04-26 Thread Michael Kelley (EOSG)

On Wed, 25 Apr 2018, KY Srinivasan  wrote:
> 
> +struct ipi_arg_ex {
> + u32 vector;
> + u32  reserved;
> + struct hv_vpset vp_set;
> +};

Again, suggest moving to hyperv-tlfs.h.  And the 5.0b version
of the TLFS has:

u32 vector;
u8  targetvtl;
u8  reserved[3];
struct hv_vpset vp_set;


 
> 
> +struct hv_vpset {
> + u64 format;
> + u64 valid_bank_mask;
> + u64 bank_contents[];
> +};

And this as well.

Michael
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/5] X86: Hyper-V: Enlighten APIC access

2018-04-26 Thread Michael Kelley (EOSG)

> -Original Message-
> From: k...@linuxonhyperv.com <k...@linuxonhyperv.com>
> Sent: Wednesday, April 25, 2018 11:13 AM
> To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> t...@linutronix.de; h...@zytor.com; Stephen Hemminger 
> <sthem...@microsoft.com>;
> Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; vkuzn...@redhat.com
> Cc: KY Srinivasan <k...@microsoft.com>
> Subject: [PATCH 1/5] X86: Hyper-V: Enlighten APIC access
> 
> From: "K. Y. Srinivasan" <k...@microsoft.com>
> 
> Hyper-V supports MSR based APIC access; implement
> the enlightenment.
> 
> Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
> ---
>  arch/x86/hyperv/Makefile|  2 +-
>  arch/x86/hyperv/hv_apic.c   | 98 
> +
>  arch/x86/hyperv/hv_init.c   |  5 ++-
>  arch/x86/include/asm/mshyperv.h |  6 ++-
>  4 files changed, 107 insertions(+), 4 deletions(-)
>  create mode 100644 arch/x86/hyperv/hv_apic.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 367a8203cfcf..00ce4df01a09 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1 +1 @@
> -obj-y:= hv_init.o mmu.o
> +obj-y:= hv_init.o mmu.o hv_apic.o
> diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c
> new file mode 100644
> index ..e0a5b36208fc
> --- /dev/null
> +++ b/arch/x86/hyperv/hv_apic.c
> @@ -0,0 +1,98 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Hyper-V specific APIC code.
> + *
> + * Copyright (C) 2018, Microsoft, Inc.
> + *
> + * Author : K. Y. Srinivasan <k...@microsoft.com>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static u64 hv_apic_icr_read(void)
> +{
> + u64 reg_val;
> +
> + rdmsrl(HV_X64_MSR_ICR, reg_val);
> + return reg_val;
> +}
> +
> +static void hv_apic_icr_write(u32 low, u32 id)
> +{
> + u64 reg_val;
> +
> + reg_val = SET_APIC_DEST_FIELD(id);
> + reg_val = reg_val << 32;
> + reg_val |= low;
> +
> + wrmsrl(HV_X64_MSR_ICR, reg_val);
> +}
> +
> +static u32 hv_apic_read(u32 reg)
> +{
> + u32 reg_val, hi;
> +
> + switch (reg) {
> + case APIC_EOI:
> + rdmsr(HV_X64_MSR_EOI, reg_val, hi);
> + return reg_val;
> + case APIC_TASKPRI:
> + rdmsr(HV_X64_MSR_TPR, reg_val, hi);
> + return reg_val;
> +
> + default:
> + return native_apic_mem_read(reg);
> + }
> +}
> +
> +static void hv_apic_write(u32 reg, u32 val)
> +{
> + switch (reg) {
> + case APIC_EOI:
> + wrmsr(HV_X64_MSR_EOI, val, 0);
> + break;
> + case APIC_TASKPRI:
> + wrmsr(HV_X64_MSR_TPR, val, 0);
> + break;
> + default:
> + native_apic_mem_write(reg, val);
> + }
> +}
> +
> +static void hv_apic_eoi_write(u32 reg, u32 val)
> +{
> + wrmsr(HV_X64_MSR_EOI, val, 0);
> +}
> +
> +void __init hv_apic_init(void)
> +{
> + if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) {
> + pr_info("Hyper-V: Using MSR ased APIC access\n");

Typo here.  "ased" should be "based".

> + apic_set_eoi_write(hv_apic_eoi_write);
> + apic->read  = hv_apic_read;
> + apic->write = hv_apic_write;
> + apic->icr_write = hv_apic_icr_write;
> + apic->icr_read  = hv_apic_icr_read;
> + }
> +}
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index cfecc2272f2d..71e50fc2b7ef 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -242,8 +242,9 @@ static int hv_cpu_die(unsigned int cpu)
>   *
>   * 1. Setup the hypercall page.
>   * 2. Register Hyper-V specific clocksource.
> +

RE: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write

2018-04-22 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Long Li
> Sent: Thursday, April 19, 2018 2:54 PM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; James E . J . Bottomley 
> ;
> Martin K . Petersen ; 
> de...@linuxdriverproject.org; linux-
> s...@vger.kernel.org; linux-ker...@vger.kernel.org
> Cc: Long Li 
> Subject: [Patch v2] Storvsc: Select channel based on available percentage of 
> ring buffer to
> write
> 
> From: Long Li 
> 
> This is a best effort for estimating on how busy the ring buffer is for
> that channel, based on available buffer to write in percentage. It is still
> possible that at the time of actual ring buffer write, the space may not be
> available due to other processes may be writing at the time.
> 
> Selecting a channel based on how full it is can reduce the possibility that
> a ring buffer write will fail, and avoid the situation a channel is over
> busy.
> 
> Now it's possible that storvsc can use a smaller ring buffer size
> (e.g. 40k bytes) to take advantage of cache locality.
> 
> Changes.
> v2: Pre-allocate struct cpumask on the heap.
> Struct cpumask is a big structure (1k bytes) when CONFIG_NR_CPUS=8192 (default
> value when CONFIG_MAXSMP=y). Don't use kernel stack for it by pre-allocating
> them using kmalloc when channels are first initialized.
> 
> Signed-off-by: Long Li 
> ---
>  drivers/scsi/storvsc_drv.c | 90 
> --
>  1 file changed, 72 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index a2ec0bc9e9fa..2a9fff94dd1a 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer 
> size
> (bytes)");
> 
>  module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
>  MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to 
> subchannels");
> +
> +static int ring_avail_percent_lowater = 10;
> +module_param(ring_avail_percent_lowater, int, S_IRUGO);
> +MODULE_PARM_DESC(ring_avail_percent_lowater,
> + "Select a channel if available ring size > this in percent");
> +
>  /*
>   * Timeout in seconds for all devices managed by this driver.
>   */
> @@ -468,6 +474,13 @@ struct storvsc_device {
>* Mask of CPUs bound to subchannels.
>*/
>   struct cpumask alloced_cpus;
> + /*
> +  * Pre-allocated struct cpumask for each hardware queue.
> +  * struct cpumask is used by selecting out-going channels. It is a
> +  * big structure, default to 1024k bytes when CONFIG_MAXSMP=y.

I think you mean "1024 bytes" or "1k bytes" in the above comment.

> +  * Pre-allocate it to avoid allocation on the kernel stack.
> +  */
> + struct cpumask *cpumask_chns;
>   /* Used for vsc/vsp channel reset process */
>   struct storvsc_cmd_request init_request;
>   struct storvsc_cmd_request reset_request;
> @@ -872,6 +885,13 @@ static int storvsc_channel_init(struct hv_device 
> *device, bool is_fc)
>   if (stor_device->stor_chns == NULL)
>   return -ENOMEM;
> 
> + stor_device->cpumask_chns = kcalloc(num_possible_cpus(),
> + sizeof(struct cpumask), GFP_KERNEL);

Note that num_possible_cpus() is 240 for a Hyper-V 2016 guest unless 
overridden on the kernel boot line, so this is going to allocate 240 Kbytes for 
each
synthetic SCSI controller.  On an Azure VM, which has two IDE and two SCSI
controllers, this is nearly 1 Mbyte.  It's unfortunate to have to allocate this 
much
memory for a what is essentially a temporary variable.   Further down in these
comments, I've proposed an alternate implementation of the code that avoids
the need for the temporary variable, and hence avoids the need for this
allocation.

> + if (stor_device->cpumask_chns == NULL) {
> + kfree(stor_device->stor_chns);
> + return -ENOMEM;
> + }
> +
>   stor_device->stor_chns[device->channel->target_cpu] = device->channel;
>   cpumask_set_cpu(device->channel->target_cpu,
>   _device->alloced_cpus);
> @@ -1232,6 +1252,7 @@ static int storvsc_dev_remove(struct hv_device *device)
>   vmbus_close(device->channel);
> 
>   kfree(stor_device->stor_chns);
> + kfree(stor_device->cpumask_chns);
>   kfree(stor_device);
>   return 0;
>  }
> @@ -1241,7 +1262,7 @@ static struct vmbus_channel *get_og_chn(struct 
> storvsc_device
> *stor_device,
>  {1G/
>   u16 slot = 0;
>   u16 hash_qnum;
> - struct cpumask alloced_mask;
> + struct cpumask *alloced_mask = _device->cpumask_chns[q_num];
>   int num_channels, tgt_cpu;
> 
>   if (stor_device->num_sc == 0)
> @@ -1257,10

RE: [Resend Patch 2/3] Netvsc: Use the vmbus functiton to calculate ring buffer percentage

2018-04-13 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Long Li
> Sent: Tuesday, March 27, 2018 5:49 PM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; James E . J . Bottomley 
> ;
> Martin K . Petersen ; 
> de...@linuxdriverproject.org; linux-
> s...@vger.kernel.org; linux-ker...@vger.kernel.org; net...@vger.kernel.org
> Cc: Long Li 
> Subject: [Resend Patch 2/3] Netvsc: Use the vmbus functiton to calculate ring 
> buffer
> percentage
> 
> From: Long Li 
> 
> In Vmbus, we have defined a function to calculate available ring buffer
> percentage to write.
> 
> Use that function and remove netvsc's private version.
> 
> Signed-off-by: Long Li 

Reviewed-by:  Michael Kelley 

> ---
>  drivers/net/hyperv/hyperv_net.h |  1 -
>  drivers/net/hyperv/netvsc.c | 17 +++--
>  drivers/net/hyperv/netvsc_drv.c |  3 ---
>  3 files changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
> index cd538d5a7986..a0199ab13d67 100644
> --- a/drivers/net/hyperv/hyperv_net.h
> +++ b/drivers/net/hyperv/hyperv_net.h
> @@ -189,7 +189,6 @@ struct netvsc_device;
>  struct net_device_context;
> 
>  extern u32 netvsc_ring_bytes;
> -extern struct reciprocal_value netvsc_ring_reciprocal;
> 
>  struct netvsc_device *netvsc_device_add(struct hv_device *device,
>   const struct netvsc_device_info *info);
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index 0265d703eb03..8af0069e4d8c 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -31,7 +31,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
> 
>  #include 
> 
> @@ -590,17 +589,6 @@ void netvsc_device_remove(struct hv_device *device)
>  #define RING_AVAIL_PERCENT_HIWATER 20
>  #define RING_AVAIL_PERCENT_LOWATER 10
> 
> -/*
> - * Get the percentage of available bytes to write in the ring.
> - * The return value is in range from 0 to 100.
> - */
> -static u32 hv_ringbuf_avail_percent(const struct hv_ring_buffer_info 
> *ring_info)
> -{
> - u32 avail_write = hv_get_bytes_to_write(ring_info);
> -
> - return reciprocal_divide(avail_write  * 100, netvsc_ring_reciprocal);
> -}
> -
>  static inline void netvsc_free_send_slot(struct netvsc_device *net_device,
>u32 index)
>  {
> @@ -649,7 +637,8 @@ static void netvsc_send_tx_complete(struct netvsc_device
> *net_device,
>   wake_up(_device->wait_drain);
> 
>   if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) &&
> - (hv_ringbuf_avail_percent(>outbound) >
> RING_AVAIL_PERCENT_HIWATER ||
> + (hv_get_avail_to_write_percent(>outbound) >
> +  RING_AVAIL_PERCENT_HIWATER ||
>queue_sends < 1)) {
>   netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx));
>   ndev_ctx->eth_stats.wake_queue++;
> @@ -757,7 +746,7 @@ static inline int netvsc_send_pkt(
>   struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
>   u64 req_id;
>   int ret;
> - u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
> + u32 ring_avail = hv_get_avail_to_write_percent(_channel->outbound);
> 
>   nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
>   if (skb)
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index faea0be18924..b0b1c2fd2b7b 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -35,7 +35,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
> 
>  #include 
>  #include 
> @@ -55,7 +54,6 @@ static unsigned int ring_size __ro_after_init = 128;
>  module_param(ring_size, uint, S_IRUGO);
>  MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)");
>  unsigned int netvsc_ring_bytes __ro_after_init;
> -struct reciprocal_value netvsc_ring_reciprocal __ro_after_init;
> 
>  static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE |
>   NETIF_MSG_LINK | NETIF_MSG_IFUP |
> @@ -2186,7 +2184,6 @@ static int __init netvsc_drv_init(void)
>   ring_size);
>   }
>   netvsc_ring_bytes = ring_size * PAGE_SIZE;
> - netvsc_ring_reciprocal = reciprocal_value(netvsc_ring_bytes);
> 
>   ret = vmbus_driver_register(_drv);
>   if (ret)
> --
> 2.14.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring buffer to write

2018-04-13 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Long Li
> Sent: Tuesday, March 27, 2018 5:49 PM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; James E . J . Bottomley 
> ;
> Martin K . Petersen ; 
> de...@linuxdriverproject.org; linux-
> s...@vger.kernel.org; linux-ker...@vger.kernel.org; net...@vger.kernel.org
> Cc: Long Li 
> Subject: [Resend Patch 3/3] Storvsc: Select channel based on available 
> percentage of ring
> buffer to write
> 
> From: Long Li 
> 
> This is a best effort for estimating on how busy the ring buffer is for
> that channel, based on available buffer to write in percentage. It is still
> possible that at the time of actual ring buffer write, the space may not be
> available due to other processes may be writing at the time.
> 
> Selecting a channel based on how full it is can reduce the possibility that
> a ring buffer write will fail, and avoid the situation a channel is over
> busy.
> 
> Now it's possible that storvsc can use a smaller ring buffer size
> (e.g. 40k bytes) to take advantage of cache locality.
> 
> Signed-off-by: Long Li 
> ---
>  drivers/scsi/storvsc_drv.c | 62 
> +-
>  1 file changed, 50 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index a2ec0bc9e9fa..b1a87072b3ab 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer 
> size
> (bytes)");
> 
>  module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO);
>  MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to 
> subchannels");
> +
> +static int ring_avail_percent_lowater = 10;

Reserving 10% of each ring buffer by default seems like more than is needed
in the storvsc driver.  That would be about 4Kbytes for the 40K ring buffer
you suggest, and even more for a ring buffer of 128K.  Each outgoing record
is only about 344 bytes (I'd have to check exactly).  With the new channel
selection algorithm below, the only time we use a channel that is already
below the low water mark is when no channel could be found that is above
the low water mark.   There could be a case of two or more threads deciding
that a channel is above the low water mark at the same time and both
choosing it, but that's likely to be rare.  So it seems like we could set the
default low water mark to 5 percent or even 3 percent, which will let more
of the ring buffer be used, and let a channel be assigned according to the
algorithm, rather than falling through to the default because all channels
appear to be "full".

> +module_param(ring_avail_percent_lowater, int, S_IRUGO);
> +MODULE_PARM_DESC(ring_avail_percent_lowater,
> + "Select a channel if available ring size > this in percent");
> +
>  /*
>   * Timeout in seconds for all devices managed by this driver.
>   */
> @@ -1285,9 +1291,9 @@ static int storvsc_do_io(struct hv_device *device,
>  {
>   struct storvsc_device *stor_device;
>   struct vstor_packet *vstor_packet;
> - struct vmbus_channel *outgoing_channel;
> + struct vmbus_channel *outgoing_channel, *channel;
>   int ret = 0;
> - struct cpumask alloced_mask;
> + struct cpumask alloced_mask, other_numa_mask;
>   int tgt_cpu;
> 
>   vstor_packet = >vstor_packet;
> @@ -1301,22 +1307,53 @@ static int storvsc_do_io(struct hv_device *device,
>   /*
>* Select an an appropriate channel to send the request out.
>*/
> -
>   if (stor_device->stor_chns[q_num] != NULL) {
>   outgoing_channel = stor_device->stor_chns[q_num];
> - if (outgoing_channel->target_cpu == smp_processor_id()) {
> + if (outgoing_channel->target_cpu == q_num) {
>   /*
>* Ideally, we want to pick a different channel if
>* available on the same NUMA node.
>*/
>   cpumask_and(_mask, _device->alloced_cpus,
>   cpumask_of_node(cpu_to_node(q_num)));
> - for_each_cpu_wrap(tgt_cpu, _mask,
> - outgoing_channel->target_cpu + 1) {
> - if (tgt_cpu != outgoing_channel->target_cpu) {
> - outgoing_channel =
> - stor_device->stor_chns[tgt_cpu];
> - break;
> +
> + for_each_cpu_wrap(tgt_cpu, _mask, q_num + 1) {
> + if (tgt_cpu == q_num)
> + continue;
> + channel =

RE: [PATCH v2 char-misc 1/1] x86/hyperv: Add interrupt handler annotations

2018-04-04 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Greg KH <gre...@linuxfoundation.org>
> Sent: Wednesday, April 4, 2018 1:16 AM
> To: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
> o...@aepfle.de;
> a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com;
> leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger
> <sthem...@microsoft.com>; KY Srinivasan <k...@microsoft.com>
> Subject: Re: [PATCH v2 char-misc 1/1] x86/hyperv: Add interrupt handler 
> annotations
> 
> On Tue, Apr 03, 2018 at 01:59:08PM -0700, mhkelle...@gmail.com wrote:
> > From: Michael Kelley <mikel...@microsoft.com>
> >
> > Add standard interrupt handler annotations to
> > hyperv_vector_handler().
> >
> > Signed-off-by: Michael Kelley <mikel...@microsoft.com>
> > ---
> > Changes in v2:
> > * Fixed From: line
> > ---
> >  arch/x86/kernel/cpu/mshyperv.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> > index 4488cf0..20f6849 100644
> > --- a/arch/x86/kernel/cpu/mshyperv.c
> > +++ b/arch/x86/kernel/cpu/mshyperv.c
> > @@ -41,7 +41,7 @@ static void (*hv_stimer0_handler)(void);
> >  static void (*hv_kexec_handler)(void);
> >  static void (*hv_crash_handler)(struct pt_regs *regs);
> >
> > -void hyperv_vector_handler(struct pt_regs *regs)
> > +__visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs)
> 
> What bug does this solve?  What is wrong with the existing markings?
> What does __visible and __irq_entry give us that we don't already have
> and we need?
> 
> Are you really using LTO that requires this marking to prevent the code
> from being removed?

Thomas Gleixner commented on Vitaly Kuznetsov's Hyper-V reenlightenment patch
that the interrupt handler should have these annotations: see
https://lkml.org/lkml/2018/1/14/145

I put the same annotations on the interrupt handler for stimer0 Direct Mode,
So this change makes the hyperv_vector_handler() consistent with
hv_stimer0_vector_handler() in the same source file.  It does not fix any
immediate bug -- it's for consistency and alignment with what is apparently
standard practice.

Not sure what LTO is ...

Michael

> 
> thanks,
> 
> greg k-h
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v2] storvsc: Set up correct queue depth values for IDE devices

2018-03-24 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Long Li
> Sent: Thursday, March 22, 2018 2:47 PM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; James E . J . Bottomley 
> ;
> Martin K . Petersen ; 
> de...@linuxdriverproject.org; linux-
> s...@vger.kernel.org; linux-ker...@vger.kernel.org
> Cc: Long Li 
> Subject: [PATCH v2] storvsc: Set up correct queue depth values for IDE devices
> 
> From: Long Li 
> 
> Unlike SCSI and FC, we don't use multiple channels for IDE.
> Also fix the calculation for sub-channels.
> 
> Change log:
> v2: Addressed comment on incorrect number of sub-channels.
> (Michael Kelley )
> 
> Signed-off-by: Long Li 

Reviewed-by: Michael Kelley 

> ---
>  drivers/scsi/storvsc_drv.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 8c51d628b52e..a2ec0bc9e9fa 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1722,11 +1722,14 @@ static int storvsc_probe(struct hv_device *device,
>   max_targets = STORVSC_MAX_TARGETS;
>   max_channels = STORVSC_MAX_CHANNELS;
>   /*
> -  * On Windows8 and above, we support sub-channels for storage.
> +  * On Windows8 and above, we support sub-channels for storage
> +  * on SCSI and FC controllers.
>* The number of sub-channels offerred is based on the number of
>* VCPUs in the guest.
>*/
> - max_sub_channels = (num_cpus / storvsc_vcpus_per_sub_channel);
> + if (!dev_is_ide)
> + max_sub_channels =
> + (num_cpus - 1) / storvsc_vcpus_per_sub_channel;
>   }
> 
>   scsi_driver.can_queue = (max_outstanding_req_per_channel *
> --
> 2.14.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH net-next,1/2] hv_netvsc: Fix the return status in RX path

2018-03-24 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Haiyang Zhang
> Sent: Thursday, March 22, 2018 12:01 PM
> To: da...@davemloft.net; net...@vger.kernel.org
> Cc: Haiyang Zhang ; KY Srinivasan 
> ; Stephen
> Hemminger ; o...@aepfle.de; vkuzn...@redhat.com;
> de...@linuxdriverproject.org; linux-ker...@vger.kernel.org
> Subject: [PATCH net-next,1/2] hv_netvsc: Fix the return status in RX path
> 
> From: Haiyang Zhang 
> 
> As defined in hyperv_net.h, the NVSP_STAT_SUCCESS is one not zero.
> Some functions returns 0 when it actually means NVSP_STAT_SUCCESS.
> This patch fixes them.
> 
> In netvsc_receive(), it puts the last RNDIS packet's receive status
> for all packets in a vmxferpage which may contain multiple RNDIS
> packets.
> This patch puts NVSP_STAT_FAIL in the receive completion if one of
> the packets in a vmxferpage fails.

This patch changes the status field that is being reported back to
the Hyper-V host in the receive completion message in
enq_receive_complete().   The current code reports 0 on success,
and with the patch, it will report 1 on success.  So does this change
affect anything on the Hyper-V side?  Or is Hyper-V just ignoring
the value?   If this change doesn't have any impact on the
interactions with Hyper-V, perhaps it would be good to explain
why in the commit message.

Michael

> 
> Signed-off-by: Haiyang Zhang 
> ---
>  drivers/net/hyperv/netvsc.c   | 8 ++--
>  drivers/net/hyperv/netvsc_drv.c   | 2 +-
>  drivers/net/hyperv/rndis_filter.c | 4 ++--
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
> index aa95e81af6e5..1ddb2c39b6e4 100644
> --- a/drivers/net/hyperv/netvsc.c
> +++ b/drivers/net/hyperv/netvsc.c
> @@ -1098,12 +1098,16 @@ static int netvsc_receive(struct net_device *ndev,
>   void *data = recv_buf
>   + vmxferpage_packet->ranges[i].byte_offset;
>   u32 buflen = vmxferpage_packet->ranges[i].byte_count;
> + int ret;
> 
>   trace_rndis_recv(ndev, q_idx, data);
> 
>   /* Pass it to the upper layer */
> - status = rndis_filter_receive(ndev, net_device,
> -   channel, data, buflen);
> + ret = rndis_filter_receive(ndev, net_device,
> +channel, data, buflen);
> +
> + if (unlikely(ret != NVSP_STAT_SUCCESS))
> + status = NVSP_STAT_FAIL;
>   }
> 
>   enq_receive_complete(ndev, net_device, q_idx,
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index cdb78eefab67..33607995be62 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -818,7 +818,7 @@ int netvsc_recv_callback(struct net_device *net,
>   u64_stats_update_end(_stats->syncp);
> 
>   napi_gro_receive(>napi, skb);
> - return 0;
> + return NVSP_STAT_SUCCESS;
>  }
> 
>  static void netvsc_get_drvinfo(struct net_device *net,
> diff --git a/drivers/net/hyperv/rndis_filter.c 
> b/drivers/net/hyperv/rndis_filter.c
> index 2dc00f714482..591fb8080f11 100644
> --- a/drivers/net/hyperv/rndis_filter.c
> +++ b/drivers/net/hyperv/rndis_filter.c
> @@ -443,10 +443,10 @@ int rndis_filter_receive(struct net_device *ndev,
>   "unhandled rndis message (type %u len %u)\n",
>  rndis_msg->ndis_msg_type,
>  rndis_msg->msg_len);
> - break;
> + return NVSP_STAT_FAIL;
>   }
> 
> - return 0;
> + return NVSP_STAT_SUCCESS;
>  }
> 
>  static int rndis_filter_query_device(struct rndis_device *dev,
> --
> 2.15.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] vmbus: use put_device() if device_register fail

2018-03-18 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of KY Srinivasan
> Sent: Sunday, March 18, 2018 8:02 PM
> To: Arvind Yadav ; Stephen Hemminger
> ; Haiyang Zhang 
> Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH] vmbus: use put_device() if device_register fail
> 
> > -Original Message-
> > From: devel  On Behalf
> > Of Arvind Yadav
> > Sent: Saturday, March 17, 2018 11:48 PM
> > To: Stephen Hemminger ; Haiyang Zhang
> > 
> > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org
> > Subject: [PATCH] vmbus: use put_device() if device_register fail
> >
> > if device_register() returned an error. Always use put_device()
> > to give up the reference initialized.
> >
> > Signed-off-by: Arvind Yadav 
> > ---
> >  drivers/hv/vmbus_drv.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > index bc65c4d..25da2f3 100644
> > --- a/drivers/hv/vmbus_drv.c
> > +++ b/drivers/hv/vmbus_drv.c
> > @@ -1358,6 +1358,7 @@ int vmbus_device_register(struct hv_device
> > *child_device_obj)
> > ret = device_register(_device_obj->device);
> > if (ret) {
> > pr_err("Unable to register child device\n");
> > +   put_device(_device_obj->device);
> 
> If the registration failed; we would not have acquired the reference on the 
> device and so
> we should not be dropping the reference in the failure path.

Looking at the code for device_register() and its constituent parts
device_initialize() and device_add(), Arvind is correct.  device_initialize()
creates the object with a reference count of 1.  device_add() does not
decrement that reference count or free the object, even if it fails.  The
comments for device_register() call this out as well.

Michael  

> 
> K. Y
> > return ret;
> > }
> >
> > --
> > 2.7.4
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with GFP_KERNEL in new_pcichild_device

2018-03-18 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Jia-Ju Bai
> Sent: Sunday, March 18, 2018 7:53 AM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; bhelg...@google.com
> Cc: de...@linuxdriverproject.org; linux-...@vger.kernel.org; 
> linux-ker...@vger.kernel.org;
> Jia-Ju Bai 
> Subject: [PATCH 2/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with 
> GFP_KERNEL in
> new_pcichild_device
> 
> new_pcichild_device() is not called in atomic context.
> 
> The call chain ending up at new_pcichild_device() is:
> [1] new_pcichild_device() <- pci_devices_present_work()
> pci_devices_present_work() is only set in INIT_WORK().
> 
> Despite never getting called from atomic context,
> new_pcichild_device() calls kzalloc with GFP_ATOMIC,
> which waits busily for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL
> to avoid busy waiting.
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai 

Reviewed-by: Michael Kelley 

> ---
>  drivers/pci/host/pci-hyperv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 0fe3ea1..289e31d 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -1500,7 +1500,7 @@ static struct hv_pci_dev *new_pcichild_device(struct
> hv_pcibus_device *hbus,
>   unsigned long flags;
>   int ret;
> 
> - hpdev = kzalloc(sizeof(*hpdev), GFP_ATOMIC);
> + hpdev = kzalloc(sizeof(*hpdev), GFP_KERNEL);
>   if (!hpdev)
>   return NULL;
> 
> --
> 1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with GFP_KERNEL in hv_pci_onchannelcallback

2018-03-18 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Jia-Ju Bai
> Sent: Sunday, March 18, 2018 7:53 AM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; bhelg...@google.com
> Cc: de...@linuxdriverproject.org; linux-...@vger.kernel.org; 
> linux-ker...@vger.kernel.org;
> Jia-Ju Bai 
> Subject: [PATCH 1/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with 
> GFP_KERNEL in
> hv_pci_onchannelcallback
> 
> hv_pci_onchannelcallback() is not called in atomic context.
> 
> The call chain ending up at hv_pci_onchannelcallback() is:
> [1] hv_pci_onchannelcallback() <- hv_pci_probe()
> hv_pci_probe() is only set as ".probe" in hv_driver
> structure "hv_pci_drv".
> 
> Despite never getting called from atomic context,

Not true.   hv_pci_probe() registers hv_pci_onchannelcallback() as
a callback function that is invoked from the VMbus interrupt handler.
So GFP_ATOMIC is appropriate.

Michael

> hv_pci_onchannelcallback() calls kmalloc with GFP_ATOMIC,
> which waits busily for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL
> to avoid busy waiting.
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai 
> ---
>  drivers/pci/host/pci-hyperv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 0fe3ea1..c5c8a99 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -1887,7 +1887,7 @@ static void hv_pci_onchannelcallback(void *context)
>   struct pci_dev_incoming *dev_message;
>   struct hv_pci_dev *hpdev;
> 
> - buffer = kmalloc(bufferlen, GFP_ATOMIC);
> + buffer = kmalloc(bufferlen, GFP_KERNEL);
>   if (!buffer)
>   return;
> 
> @@ -1899,7 +1899,7 @@ static void hv_pci_onchannelcallback(void *context)
>   kfree(buffer);
>   /* Handle large packet */
>   bufferlen = bytes_recvd;
> - buffer = kmalloc(bytes_recvd, GFP_ATOMIC);
> + buffer = kmalloc(bytes_recvd, GFP_KERNEL);
>   if (!buffer)
>   return;
>   continue;
> --
> 1.9.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] vmbus: use put_device() if device_register fail

2018-03-18 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Arvind Yadav
> Sent: Saturday, March 17, 2018 11:48 PM
> To: Stephen Hemminger ; Haiyang Zhang
> 
> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org
> Subject: [PATCH] vmbus: use put_device() if device_register fail
> 
> if device_register() returned an error. Always use put_device()
> to give up the reference initialized.
> 
> Signed-off-by: Arvind Yadav 

Reviewed-by: Michael Kelley 

> ---
>  drivers/hv/vmbus_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index bc65c4d..25da2f3 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -1358,6 +1358,7 @@ int vmbus_device_register(struct hv_device 
> *child_device_obj)
>   ret = device_register(_device_obj->device);
>   if (ret) {
>   pr_err("Unable to register child device\n");
> + put_device(_device_obj->device);
>   return ret;
>   }
> 
> --
> 2.7.4

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH] storvsc: Set up correct queue depth values for IDE devices

2018-03-16 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Long Li
> Sent: Thursday, March 15, 2018 4:52 PM
> To: KY Srinivasan ; Haiyang Zhang 
> ; Stephen
> Hemminger ; James E . J . Bottomley 
> ;
> Martin K . Petersen ; 
> de...@linuxdriverproject.org; linux-
> s...@vger.kernel.org; linux-ker...@vger.kernel.org
> Cc: Long Li 
> Subject: [PATCH] storvsc: Set up correct queue depth values for IDE devices
> 
> From: Long Li 
> 
> Unlike SCSI and FC, we don't use multiple channels for IDE. So set queue depth
> correctly for IDE.
> 
> Also set the correct cmd_per_lun for all devices.
> 
> Signed-off-by: Long Li 
> ---
>  drivers/scsi/storvsc_drv.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index 8c51d628b52e..fba170640e9c 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1722,15 +1722,19 @@ static int storvsc_probe(struct hv_device *device,
>   max_targets = STORVSC_MAX_TARGETS;
>   max_channels = STORVSC_MAX_CHANNELS;
>   /*
> -  * On Windows8 and above, we support sub-channels for storage.
> +  * On Windows8 and above, we support sub-channels for storage
> +  * on SCSI and FC controllers.
>* The number of sub-channels offerred is based on the number of
>* VCPUs in the guest.
>*/
> - max_sub_channels = (num_cpus / storvsc_vcpus_per_sub_channel);
> + if (!dev_is_ide)
> + max_sub_channels =
> + num_cpus / storvsc_vcpus_per_sub_channel;

This calculation of the # of sub-channels doesn't get the right answer (and it
didn't before this patch either).  storvsc_vcpus_per_sub_channel defaults to 4.
If num_cpus is 8, max_sub_channels will be 2, but it should be 1.  The
sub-channel count should not include the main channel since we add 1 to the
sub-channel count below when calculating can_queue.

Furthermore, this is calculation is just a guess, in the sense that we're
replicating the algorithm we think Hyper-V is using to determine the number
of sub-channels to offer.   It turns out Hyper-V is changing that algorithm for
particular devices in an upcoming new Azure VM size.  But the only use of
max_sub_channels is in the calculation of can_queue below, so the impact
is minimal.

>   }
> 
>   scsi_driver.can_queue = (max_outstanding_req_per_channel *
>(max_sub_channels + 1));
> + scsi_driver.cmd_per_lun = scsi_driver.can_queue;

can_queue is defined as "int", while cmd_per_lun is defined as "short".
The calculated value of can_queue could easily be over 32,767 with
15 sub-channels and max_outstanding_req_per_channel being 3036
for the default 1 Mbyte ring buffer.  So the assignment to cmd_per_lun
could produce truncation and even a negative number.

More broadly, since max_outstanding_req_per_channel is based on
the ring buffer size, these calculations imply that Hyper-V storvsp's
queuing capacity is based on the ring buffer size.  I don't think that's
the case.  From conversations with the storvsp folks, I think Hyper-V
always removes entries from the guest->host ring buffer and then
lets storvsp queue them separately.   So we don't want to be linking
cmd_per_lun (or even can_queue, for that matter) to the ring buffer
size.  The current default ring buffer size of 1 Mbyte is probably 10x
bigger than needed, and we want to be able to adjust that without
ending up with can_queue and cmd_per_lun values that are too
small.

We would probably do better to set can_queue to a constant, and
leave cmd_per_lun at its current value of 2048.   The can_queue
value is already capped at 10240 in the blk-mq layer, so maybe
that's a reasonable constant to use.

Thoughts?

> 
>   host = scsi_host_alloc(_driver,
>  sizeof(struct hv_host_device));
> --
> 2.14.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v2]PCI: hv: fix PCI-BUS domainID corruption

2018-03-14 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Sridhar Pitchai
> Sent: Wednesday, March 14, 2018 11:08 AM
> To: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>; Michael Kelley (EOSG)
> <michael.h.kel...@microsoft.com>
> Cc: Bjorn Helgaas <bhelg...@google.com>; Jake Oshins <ja...@microsoft.com>; 
> Haiyang
> Zhang <haiya...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>; 
> Dexuan
> Cui <de...@microsoft.com>; KY Srinivasan <k...@microsoft.com>;
> de...@linuxdriverproject.org; linux-...@vger.kernel.org; 
> linux-ker...@vger.kernel.org
> Subject: [PATCH v2]PCI: hv: fix PCI-BUS domainID corruption
> 
> Whenever PCI bus is added, HyperV guarantees the BUS id is unique. Even 
> with
> that when a first device is added to the bus, it overrides bus domain ID 
> with
> the device serial number. Sometime this can result in BUS ID not being 
> unique.
> In this case, when PCI_BUS and a device added to the bus, even before the 
> PCI
> BUS is added to kernel, the first device tends to overwrite the domain ID 
> with
> 0. Since there exists a PCI bus with domain ID 0 already the PCI bus 
> addition
> fails. This patch make sure when a device is added to a bus, it never 
> updated
> the bus domain ID. Since we have the transparent SRIOV mode now, the 
> short VF
> device name is no longer needed.
> 
> Fixes: 4a9b0933bdfc("PCI:hv:Use device serial number as PCI domain")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Sridhar Pitchai <srpit...@microsoft.com>
> ---

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 3/6] PCI: hv: serialize the present/eject work items

2018-03-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>; Jack
> Morgenstein <ja...@mellanox.com>; sta...@vger.kernel.org
> Subject: [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> 
> When we hot-remove the device, we first receive a PCI_EJECT message and
> then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
> 
> The first message is offloaded to hv_eject_device_work(), and the second
> is offloaded to pci_devices_present_work(). Both the paths can be running
> list_del(>list_entry), causing general protection fault, because
> system_wq can run them concurrently.
> 
> The patch eliminates the race condition.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Tested-by: Adrian Suhov <v-ads...@microsoft.com>
> Tested-by: Chris Valean <v-chv...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 2/6] PCI: hv: hv_eject_device_work(): remove the bogus test

2018-03-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>; Jack
> Morgenstein <ja...@mellanox.com>; sta...@vger.kernel.org
> Subject: [PATCH v3 2/6] PCI: hv: hv_eject_device_work(): remove the bogus test
> 
> When we're in the function, hpdev->state must be hv_pcichild_ejecting:
> see hv_pci_eject_device().
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 1233300f41c6..04edb24c92ee 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -1796,10 +1796,7 @@ static void hv_eject_device_work(struct work_struct 
> *work)
> 
>   hpdev = container_of(work, struct hv_pci_dev, wrk);
> 
> - if (hpdev->state != hv_pcichild_ejecting) {
> - put_pcichild(hpdev, hv_pcidev_ref_pnp);
> - return;
> - }
> + WARN_ON(hpdev->state != hv_pcichild_ejecting);
> 
>   /*
>* Ejection can come before or after the PCI bus has been set up, so
> --
> 2.7.4

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()

2018-03-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>;
> sta...@vger.kernel.org; Jack Morgenstein <ja...@mellanox.com>
> Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> 
> 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg()
> by request_irq()  -> request_threaded_irq() -> __setup_irq()->irq_startup()
>  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> disabled in __setup_irq().
> 
> Fix this by polling the channel.
> 
> 2. If the host is ejecting the VF device before we reach
> hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg()
> forever, because at this time the host doesn't respond to the
> CREATE_INTERRUPT request. This issue also happens to old kernels like
> v4.14, v4.13, etc.
> 
> Fix this by polling the channel for the PCI_EJECT message and
> hpdev->state, and by checking the PCI vendor ID.
> 
> Note: actually the above issues also happen to a SMP VM, if
> "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Tested-by: Adrian Suhov <v-ads...@microsoft.com>
> Tested-by: Chris Valean <v-chv...@microsoft.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 58 
> ++-
>  1 file changed, 57 insertions(+), 1 deletion(-)
> 

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary

2018-03-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>; Jack
> Morgenstein <ja...@mellanox.com>; sta...@vger.kernel.org
> Subject: [PATCH v3 5/6] PCI: hv: hv_pci_devices_present(): only queue a new 
> work when
> necessary
> 
> If there is a pending work, we just need to add the new dr into
> the dr_list.
> 
> This is suggested by Michael Kelley.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 4/6] PCI: hv: remove hbus->enum_sem

2018-03-06 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Tuesday, March 6, 2018 10:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>; Jack
> Morgenstein <ja...@mellanox.com>; sta...@vger.kernel.org
> Subject: [PATCH v3 4/6] PCI: hv: remove hbus->enum_sem
> 
> Since we serialize the present/eject work items now, we don't need the
> semaphore any more.
> 
> This is suggested by Michael Kelley.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 17 ++---
>  1 file changed, 2 insertions(+), 15 deletions(-)
> 

Reviewed-by: Michael Kelley <mikel...@microsoft.com>
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v2 5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary

2018-03-05 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dexuan Cui
> Sent: Monday, March 5, 2018 11:22 AM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> <k...@microsoft.com>;
> Stephen Hemminger <sthem...@microsoft.com>; o...@aepfle.de; 
> a...@canonical.com;
> jasow...@redhat.com
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; marcelo.ce...@canonical.com; 
> Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Dexuan Cui 
> <de...@microsoft.com>; Jack
> Morgenstein <ja...@mellanox.com>; sta...@vger.kernel.org
> Subject: [PATCH v2 5/6] PCI: hv: hv_pci_devices_present(): only queue a new 
> work when
> necessary
> 
> If there is a pending work, we just need to add the new dr into
> the dr_list.
> 
> This is suggested by Michael Kelley.
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: Jack Morgenstein <ja...@mellanox.com>
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger <sthem...@microsoft.com>
> Cc: K. Y. Srinivasan <k...@microsoft.com>
> Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> ---
>  drivers/pci/host/pci-hyperv.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 3a385212f666..d3aa6736a9bb 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -1733,6 +1733,7 @@ static void hv_pci_devices_present(struct 
> hv_pcibus_device *hbus,
>   struct hv_dr_state *dr;
>   struct hv_dr_work *dr_wrk;
>   unsigned long flags;
> + bool pending_dr;
> 
>   dr_wrk = kzalloc(sizeof(*dr_wrk), GFP_NOWAIT);
>   if (!dr_wrk)
> @@ -1756,11 +1757,23 @@ static void hv_pci_devices_present(struct 
> hv_pcibus_device
> *hbus,
>   }
> 
>   spin_lock_irqsave(>device_list_lock, flags);
> +
> + /*
> +  * If pending_dr is true, we have already queued a work,
> +  * which will see the new dr. Otherwise, we need to
> +  * queue a new work.
> +  */
> + pending_dr = !list_empty(>dr_list);
>   list_add_tail(>list_entry, >dr_list);
> - spin_unlock_irqrestore(>device_list_lock, flags);

A minor point:  The spin_unlock_irqrestore() call can
stay here.   Once we have the list status in a local variable
and the new entry is added to the list, nothing bad can
happen if we drop the spin lock.   At worst, and very unlikely,
we'll queue work when some other thread has already queued
work to process the list entry, but that's no big deal.   I'd argue
for keeping the code covered by a spin lock as small as possible.

Michael

> 
> - get_hvpcibus(hbus);
> - queue_work(hbus->wq, _wrk->wrk);
> + if (pending_dr) {
> + kfree(dr_wrk);
> + } else {
> + get_hvpcibus(hbus);
> + queue_work(hbus->wq, _wrk->wrk);
> + }
> +
> + spin_unlock_irqrestore(>device_list_lock, flags);
>  }
> 
>  /**
> --
> 2.7.4
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 2/3] PCI: hv: serialize the present/eject work items

2018-03-03 Thread Michael Kelley (EOSG)

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  
> On Behalf
> Of Dexuan Cui
> Sent: Friday, March 2, 2018 4:21 PM
> To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan 
> ;
> Stephen Hemminger 
> Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; 
> Haiyang Zhang
> ; o...@aepfle.de; a...@canonical.com; 
> jasow...@redhat.com;
> vkuzn...@redhat.com; marcelo.ce...@canonical.com; Dexuan Cui 
> ;
> Jack Morgenstein ; sta...@vger.kernel.org
> Subject: [PATCH 2/3] PCI: hv: serialize the present/eject work items
> 
> When we hot-remove the device, we first receive a PCI_EJECT message and
> then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
> 
> The first message is offloaded to hv_eject_device_work(), and the second
> is offloaded to pci_devices_present_work(). Both the paths can be running
> list_del(>list_entry), causing general protection fault, because
> system_wq can run them concurrently.
> 
> The patch eliminates the race condition.

With this patch, the enum_sem field in struct hv_pcibus_device
is no longer needed.  The semaphore serializes execution in
hv_pci_devices_present_work(), and that serialization is now done
with the ordered workqueue.  Also, the last paragraph of the top level
comment for hv_pci_devices_present_work() should be updated to
reflect the new ordering assumptions.

Separately, an unrelated bug:  At the top of hv_eject_device_work(),
the first test may do a put_pcichild() and return.  This exit path also
needs to do put_hvpcibus() to balance the ref counts, or do a goto
the last two lines at the bottom of the function.

> 
> Signed-off-by: Dexuan Cui 
> Tested-by: Adrian Suhov 
> Tested-by: Chris Valean 
> Cc: Vitaly Kuznetsov 
> Cc: Jack Morgenstein 
> Cc: sta...@vger.kernel.org
> Cc: Stephen Hemminger 
> Cc: K. Y. Srinivasan 
> ---
>  drivers/pci/host/pci-hyperv.c | 19 ---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
> index 1233300f41c6..57b1fb3ebdb9 100644
> --- a/drivers/pci/host/pci-hyperv.c
> +++ b/drivers/pci/host/pci-hyperv.c
> @@ -461,6 +461,8 @@ struct hv_pcibus_device {
>   struct retarget_msi_interrupt retarget_msi_interrupt_params;
> 
>   spinlock_t retarget_msi_interrupt_lock;
> +
> + struct workqueue_struct *wq;
>  };
> 
>  /*
> @@ -1770,7 +1772,7 @@ static void hv_pci_devices_present(struct 
> hv_pcibus_device *hbus,
>   spin_unlock_irqrestore(>device_list_lock, flags);
> 
>   get_hvpcibus(hbus);
> - schedule_work(_wrk->wrk);
> + queue_work(hbus->wq, _wrk->wrk);

This invocation of get_hvpcibus() and queue_work() could be made
conditional on whether the preceding list_add_tail() transitioned
the list from empty to non-empty.  If the list was already non-empty,
a previously queued invocation of hv_pci_devices_present_work()
will find the new entry and process it.   But this is an
optimization in a non-perf sensitive code path, so may not be
worth it.

>  }
> 
>  /**
> @@ -1848,7 +1850,7 @@ static void hv_pci_eject_device(struct hv_pci_dev 
> *hpdev)
>   get_pcichild(hpdev, hv_pcidev_ref_pnp);
>   INIT_WORK(>wrk, hv_eject_device_work);
>   get_hvpcibus(hpdev->hbus);
> - schedule_work(>wrk);
> + queue_work(hpdev->hbus->wq, >wrk);
>  }
> 
>  /**
> @@ -2463,11 +2465,17 @@ static int hv_pci_probe(struct hv_device *hdev,
>   spin_lock_init(>retarget_msi_interrupt_lock);
>   sema_init(>enum_sem, 1);
>   init_completion(>remove_event);
> + hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0,
> +hbus->sysdata.domain);
> + if (!hbus->wq) {
> + ret = -ENOMEM;
> + goto free_bus;
> + }
> 
>   ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0,
>hv_pci_onchannelcallback, hbus);
>   if (ret)
> - goto free_bus;
> + goto destroy_wq;
> 
>   hv_set_drvdata(hdev, hbus);
> 
> @@ -2536,6 +2544,9 @@ static int hv_pci_probe(struct hv_device *hdev,
>   hv_free_config_window(hbus);
>  close:
>   vmbus_close(hdev->channel);
> +destroy_wq:
> + drain_workqueue(hbus->wq);

The drain_workqueue() call isn't necessary.  destroy_workqueue() calls
drain_workqueue() and there better not be anything in the workqueue
anyway since all the ref counts are zero.

> + destroy_workqueue(hbus->wq);
>  free_bus:
>   free_page((unsigned long)hbus);
>   return ret;
> @@ -2615,6 +2626,8 @@ static int hv_pci_remove(struct hv_device *hdev)
>   irq_domain_free_fwnode(hbus->sysdata.fwnode);
>

RE: [PATCH 08/12] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2018-02-13 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Dan Carpenter <dan.carpen...@oracle.com>
> Sent: Monday, February 12, 2018 12:42 AM
> To: KY Srinivasan <k...@microsoft.com>; Stephen Hemminger
> <step...@networkplumber.org>
> Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; 
> de...@linuxdriverproject.org;
> o...@aepfle.de; a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com;
> leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger
> <sthem...@microsoft.com>; Michael Kelley (EOSG) 
> <michael.h.kel...@microsoft.com>
> Subject: Re: [PATCH 08/12] Drivers: hv: vmbus: Implement Direct Mode for 
> stimer0
> 
> On Sun, Feb 11, 2018 at 05:33:16PM -0700, k...@exchange.microsoft.com wrote:
> > @@ -116,9 +146,29 @@ static int hv_ce_set_oneshot(struct clock_event_device 
> > *evt)
> >  {
> > union hv_timer_config timer_cfg;
> >
> > +   timer_cfg.as_uint64 = 0;
> > timer_cfg.enable = 1;
> > timer_cfg.auto_enable = 1;
> > -   timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> > +   if (direct_mode_enabled)
> > +   /*
> > +* When it expires, the timer will directly interrupt
> > +* on the specified hardware vector/IRQ.
> > +*/
> > +   {
> > +   timer_cfg.direct_mode = 1;
> > +   timer_cfg.apic_vector = stimer0_vector;
> > +   hv_enable_stimer0_percpu_irq(stimer0_irq);
> > +   }
> > +   else
> > +   /*
> > +* When it expires, the timer will generate a VMbus message,
> > +* to be handled by the normal VMbus interrupt handler.
> > +*/
> > +   {
> > +   timer_cfg.direct_mode = 0;
> > +   timer_cfg.sintx = VMBUS_MESSAGE_SINT;
> > +   }
> > +
> 
> This indenting isn't right.  We should probably zero out .apic_vector
> if .direct_mode is zero.  Or maybe it's fine.  I don't know if any
> static analysis tools will complain...

I'll fix the indenting.  Old habits 

The " timer_cfg.as_uint64 = 0" statement already zero's out .apic_vector
along with all the other unused fields in the 64-bit value, as required by
the Hyper-V spec.

> 
> > hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64);
> >
> > return 0;
> > @@ -191,6 +241,10 @@ int hv_synic_alloc(void)
> > INIT_LIST_HEAD(_cpu->chan_list);
> > }
> >
> > +   if (direct_mode_enabled && hv_setup_stimer0_irq(
> > +   _irq, _vector, hv_stimer0_isr))
> > +   goto err;
> 
> 
> Can you indent it like this:
> 
>   if (direct_mode_enabled &&
>   hv_setup_stimer0_irq(_irq, _vector,
>hv_stimer0_isr))
>   goto err;

OK -- will change.

> 
> 
> [ What follows is a long rant not directed at you ]
> 
> It's annoying because as soon as I see the "goto err;", I know the error
> handling for this function is going to be buggy...
> 
> Some rules of error handling are:
> 
> 1)  Each function should clean up after itself instead returning
> partially allocated structures.
> 2)  Each allocation function should have a matching free function.
> 3)  Give meaningful label names based on what the label location so that
> we can tell what the goto does just by looking at it, such as,
> "goto free_some_variable".  This way we can just keep a mental tally
> of the most recently allocated resource and verify based on the
> "goto free_resource;" statemetn that it frees the correct thing.  We
> don't need to scroll to the bottom of the function.
> 
> Using good names means that we should avoid do-nothing gotos
> because, by definition, the label name for a do-nothing goto is
> going to be vague.
> 
> In this case the label looks like this:
> 
> > +
> > return 0;
> >  err:
> > return -ENOMEM;
> 
> We allocate a bunch of stuff in this function so at first glance this
> looks like we leak everything but, actually, the cleanup is done in
> vmbus_bus_init().  This is a layering violation.
> 
> Later on, we changed hv_synic_alloc() in 37cdd991fac8 ("vmbus: put
> related per-cpu variable together") and we started allocating:
> 
>   hv_cpu->clk_evt = kzalloc(...
> 
> but we forgot to update the error handling because it was in the wrong
> place.  It's a very predictable, avoidable bug if we just use proper
> error handling style.

We'll fix this is a separate patch.  There are a couple other minor things
that should be cleaned up in hv.c as well, and can do them together.

Michael

> 
> Anyway...  Sorry for the long rant.  Summary:  Always distrust vague
> label names.
> 
> regards,
> dan carpenter

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling

2018-02-11 Thread Michael Kelley (EOSG)

> -Original Message-
> From: KY Srinivasan
> Sent: Sunday, February 11, 2018 5:14 PM

--- snip ---

> > if (rbi->ring_buffer->feature_bits.feat_pending_send_sz) {
> > u32 pending_sz = READ_ONCE(rbi->ring_buffer-
> > >pending_send_sz);
> >
> > /*
> > +* Ensure the read of write_index in
> > hv_get_bytes_to_write()
> > +* happens after the read of pending_send_sz.
> > +*/
> > +   virt_rmb();
> We can avoid the read barrier by making the initialization of curr_write_sz 
> conditional
> on pending_send_sz being non-zero. Indeed you can make all the signaling code 
> conditional
> on pending_send_sz being non-zero.
> 
> 

I agree that we can immediately test pending_send_sz for zero, and exit
if zero.  A  zero value will be by far the most common path, and would
save executing the read barrier and hv_get_bytes_to_write().  Can also
move the calculation of "delta" to after the test for zero.

But I believe the read barrier is still needed on the path where
pending_send_sz is non-zero.  Just the testing the value of pending_send_sz
doesn't guarantee that the write_index wouldn't be speculatively read,
and potentially out-of-order.   And we have to consider the out-of-order
behaviors on ARM64 as well.

> 
> > +   curr_write_sz = hv_get_bytes_to_write(rbi);
> > +
> > +   /*
> >  * If there was space before we began iteration,
> >  * then host was not blocked. Also handles case where
> >  * pending_sz is zero then host has nothing pending
> >  * and does not need to be signaled.
> >  */
> > -   if (orig_write_sz > pending_sz)
> > +   if (curr_write_sz - delta > pending_sz)
> > return;
> >
> > /* If pending write will not fit, don't give false hope. */
> > -   if (hv_get_bytes_to_write(rbi) < pending_sz)
> > +   if (curr_write_sz <= pending_sz)
> > return;
> > +
> > +   vmbus_setevent(channel);
> > }
> >
> > -   vmbus_setevent(channel);
> >  }
> >  EXPORT_SYMBOL_GPL(hv_pkt_iter_close);
> > --
> > 1.8.3.1

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-31 Thread Michael Kelley (EOSG)

> From: Long Li
> Sent: Wednesday, January 31, 2018 12:23 PM
> To: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>; KY Srinivasan
> <k...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>;
> martin.peter...@oracle.com; de...@linuxdriverproject.org; 
> linux-ker...@vger.kernel.org;
> linux-s...@vger.kernel.org; James E . J . Bottomley <j...@linux.vnet.ibm.com>
> Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a 
> channel for I/O
> requests
> 
> > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> > channel for I/O requests
> >
> > Updated/corrected two email addresses ...
> >
> > > -Original Message-
> > > From: Michael Kelley (EOSG)
> > > Sent: Wednesday, January 24, 2018 2:14 PM
> > > To: KY Srinivasan <k...@microsoft.com>; Stephen Hemminger
> > > <sthem...@microsoft.com>; martin.peter...@oracle.com;
> > > lo...@microsoft.com; jbottom...@odin.com;
> > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org;
> > > linux-s...@vger.kernel.org
> > > Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a
> > > channel for I/O requests
> > >
> > > Update the algorithm in storvsc_do_io to look for a channel starting
> > > with the current CPU + 1 and wrap around (within the current NUMA
> > > node). This spreads VMbus interrupts more evenly across CPUs. Previous
> > > code always started with first CPU in the current NUMA node, skewing
> > > the interrupt load to that CPU.
> > >
> > > Signed-off-by: Michael Kelley <mikel...@microsoft.com>
> > > ---
> > >  drivers/scsi/storvsc_drv.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> > > index e07907d..f3264c4 100644
> > > --- a/drivers/scsi/storvsc_drv.c
> > > +++ b/drivers/scsi/storvsc_drv.c
> > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device,
> > >*/
> > >   cpumask_and(_mask, _device-
> > >alloced_cpus,
> > >
> > cpumask_of_node(cpu_to_node(q_num)));
> > > - for_each_cpu(tgt_cpu, _mask) {
> > > + for_each_cpu_wrap(tgt_cpu, _mask,
> > > + outgoing_channel->target_cpu + 1) {
> 
> Does it work when target_cpu is the last CPU on the system?
> 
> Otherwise, looking good.

Yes, it works.  for_each_cpu_wrap() correctly wraps in the case where
the 3rd parameter ('start') is one past the end of the mask.  Arguably,
we shouldn't rely on that, and should do the wrap to 0 before calling
for_each_cpu_wrap().

> 
> > >   if (tgt_cpu != outgoing_channel->target_cpu)
> > {
> > >   outgoing_channel =
> > >   stor_device->stor_chns[tgt_cpu];
> > > --
> > > 1.8.3.1
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

[PATCH v4 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2018-01-29 Thread Michael Kelley (EOSG)

From: Michael Kelley 

The 2016 version of Hyper-V offers the option to operate the guest VM
per-vcpu stimer's in Direct Mode, which means the timer interupts on its
own vector rather than queueing a VMbus message. Direct Mode reduces
timer processing overhead in both the hypervisor and the guest, and
avoids having timer interrupts pollute the VMbus interrupt stream for
the synthetic NIC and storage.  This patch enables Direct Mode by
default on stimer0 when running on a version of Hyper-V that supports
it.

In prep for coming support of Hyper-V on ARM64, the arch independent
portion of the code contains calls to routines that will be populated
on ARM64 but are not needed and do nothing on x86.

Signed-off-by: Michael Kelley 
---
Changes since v1:
* Major rework to allocate and use a fixed interrupt vector
* Fixed block comment style
* Removed minor changes unrelated to Direct Mode

Changes since v2:
* Remove module parameter in drivers/hv/hv.c [Greg KH]
* Add declaration for hv_stimer0_vector_handler in mshyperv.h 

Changes since v3:
* Remove vestiges of module parameter in commit message and
  #include list in hv.c [Greg KH]

---
 arch/x86/entry/entry_32.S  |  3 ++
 arch/x86/entry/entry_64.S  |  2 ++
 arch/x86/include/asm/hardirq.h |  3 ++
 arch/x86/include/asm/irq_vectors.h |  7 -
 arch/x86/include/asm/mshyperv.h| 13 
 arch/x86/include/uapi/asm/hyperv.h |  3 ++
 arch/x86/kernel/cpu/mshyperv.c | 41 -
 arch/x86/kernel/irq.c  |  9 ++
 drivers/hv/hv.c| 61 --
 drivers/hv/hyperv_vmbus.h  |  4 ++-
 10 files changed, 141 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ace8f32..1814991 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -882,6 +882,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, 
HYPERVISOR_CALLBACK_VECTOR,
 BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 hyperv_vector_handler)
 
+BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
+hv_stimer0_vector_handler)
+
 #endif /* CONFIG_HYPERV */
 
 ENTRY(page_fault)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f048e38..23afac9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1227,6 +1227,8 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 #if IS_ENABLED(CONFIG_HYPERV)
 apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
hyperv_callback_vector hyperv_vector_handler
+apicinterrupt3 HYPERV_STIMER0_VECTOR \
+   hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
 idtentry debug do_debughas_error_code=0
paranoid=1 shift_ist=DEBUG_STACK
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 51cc979..c788343 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -38,6 +38,9 @@
 #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
unsigned int irq_hv_callback_count;
 #endif
+#if IS_ENABLED(CONFIG_HYPERV)
+   unsigned int hyperv_stimer0_count;
+#endif
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 67421f6..6accf0b 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -103,7 +103,12 @@
 #endif
 
 #define MANAGED_IRQ_SHUTDOWN_VECTOR0xef
-#define LOCAL_TIMER_VECTOR 0xee
+
+#if IS_ENABLED(CONFIG_HYPERV)
+#define HYPERV_STIMER0_VECTOR  0xee
+#endif
+
+#define LOCAL_TIMER_VECTOR 0xed
 
 #define NR_VECTORS  256
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index b623a42..5a170e5 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -171,6 +171,19 @@ static inline void vmbus_signal_eom(struct hv_message 
*msg, u32 old_msg_type)
 void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs));
 void hv_remove_crash_handler(void);
 
+/*
+ * Routines for stimer0 Direct Mode handling.
+ * On x86/x64, there are no percpu actions to take.
+ */
+void hv_stimer0_vector_handler(struct pt_regs *regs);
+void hv_stimer0_callback_vector(void);
+int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void));
+void hv_remove_stimer0_irq(int irq);
+
+static inline void hv_enable_stimer0_percpu_irq(int irq) {}
+static inline void hv_disable_stimer0_percpu_irq(int irq) {}
+
+
 #if IS_ENABLED(CONFIG_HYPERV)
 extern struct clocksource *hyperv_cs;
 extern void *hv_hypercall_pg;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfea..7213cb8 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++

[PATCH v3 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2018-01-28 Thread Michael Kelley (EOSG)

From: Michael Kelley 

The 2016 version of Hyper-V offers the option to operate the guest VM
per-vcpu stimer's in Direct Mode, which means the timer interupts on its
own vector rather than queueing a VMbus message. Direct Mode reduces
timer processing overhead in both the hypervisor and the guest, and
avoids having timer interrupts pollute the VMbus interrupt stream for
the synthetic NIC and storage.  This patch enables Direct Mode by
default on stimer0 when running on a version of Hyper-V that supports it,
with a hv_vmbus module parameter for disabling Direct Mode and reverting
to the old behavior.

In prep for coming support of Hyper-V on ARM64, the arch independent
portion of the code contains calls to routines that will be populated
on ARM64 but are not needed and do nothing on x86.

Signed-off-by: Michael Kelley 
---
Changes since v1:
* Major rework to allocate and use a fixed interrupt vector
* Fixed block comment style
* Removed minor changes unrelated to Direct Mode

Changes since v2:
* Remove module parameter in drivers/hv/hv.c [Greg KH]
* Add declaration for hv_stimer0_vector_handler in mshyperv.h 

---
 arch/x86/entry/entry_32.S  |  3 ++
 arch/x86/entry/entry_64.S  |  2 ++
 arch/x86/include/asm/hardirq.h |  3 ++
 arch/x86/include/asm/irq_vectors.h |  7 -
 arch/x86/include/asm/mshyperv.h| 13 
 arch/x86/include/uapi/asm/hyperv.h |  3 ++
 arch/x86/kernel/cpu/mshyperv.c | 41 -
 arch/x86/kernel/irq.c  |  9 ++
 drivers/hv/hv.c| 62 --
 drivers/hv/hyperv_vmbus.h  |  4 ++-
 10 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ace8f32..1814991 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -882,6 +882,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, 
HYPERVISOR_CALLBACK_VECTOR,
 BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
 hyperv_vector_handler)
 
+BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR,
+hv_stimer0_vector_handler)
+
 #endif /* CONFIG_HYPERV */
 
 ENTRY(page_fault)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index f048e38..23afac9 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1227,6 +1227,8 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 #if IS_ENABLED(CONFIG_HYPERV)
 apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
hyperv_callback_vector hyperv_vector_handler
+apicinterrupt3 HYPERV_STIMER0_VECTOR \
+   hv_stimer0_callback_vector hv_stimer0_vector_handler
 #endif /* CONFIG_HYPERV */
 
 idtentry debug do_debughas_error_code=0
paranoid=1 shift_ist=DEBUG_STACK
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 51cc979..c788343 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -38,6 +38,9 @@
 #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
unsigned int irq_hv_callback_count;
 #endif
+#if IS_ENABLED(CONFIG_HYPERV)
+   unsigned int hyperv_stimer0_count;
+#endif
 } cacheline_aligned irq_cpustat_t;
 
 DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat);
diff --git a/arch/x86/include/asm/irq_vectors.h 
b/arch/x86/include/asm/irq_vectors.h
index 67421f6..6accf0b 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -103,7 +103,12 @@
 #endif
 
 #define MANAGED_IRQ_SHUTDOWN_VECTOR0xef
-#define LOCAL_TIMER_VECTOR 0xee
+
+#if IS_ENABLED(CONFIG_HYPERV)
+#define HYPERV_STIMER0_VECTOR  0xee
+#endif
+
+#define LOCAL_TIMER_VECTOR 0xed
 
 #define NR_VECTORS  256
 
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index b623a42..5a170e5 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -171,6 +171,19 @@ static inline void vmbus_signal_eom(struct hv_message 
*msg, u32 old_msg_type)
 void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs));
 void hv_remove_crash_handler(void);
 
+/*
+ * Routines for stimer0 Direct Mode handling.
+ * On x86/x64, there are no percpu actions to take.
+ */
+void hv_stimer0_vector_handler(struct pt_regs *regs);
+void hv_stimer0_callback_vector(void);
+int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void));
+void hv_remove_stimer0_irq(int irq);
+
+static inline void hv_enable_stimer0_percpu_irq(int irq) {}
+static inline void hv_disable_stimer0_percpu_irq(int irq) {}
+
+
 #if IS_ENABLED(CONFIG_HYPERV)
 extern struct clocksource *hyperv_cs;
 extern void *hv_hypercall_pg;
diff --git a/arch/x86/include/uapi/asm/hyperv.h 
b/arch/x86/include/uapi/asm/hyperv.h
index 1a5bfea..7213cb8 100644
--- a/arch/x86/include/uapi/asm/hyperv.h
+++ b/arch/x86/include/uapi/asm/hyperv.h

RE: [PATCH v2 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2018-01-26 Thread Michael Kelley (EOSG)

> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Thursday, January 25, 2018 2:00 AM
> To: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; 
> o...@aepfle.de;
> a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com;
> leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger
> <sthem...@microsoft.com>; KY Srinivasan <k...@microsoft.com>; x...@kernel.org;
> j...@pun.org
> Subject: Re: [PATCH v2 char-misc 1/1] Drivers: hv: vmbus: Implement Direct 
> Mode for stimer0
> 
> On Mon, Jan 22, 2018 at 03:29:29PM -0700, mikel...@exchange.microsoft.com 
> wrote:
> > +/*
> > + * If false, we're using the old mechanism for stimer0 interrupts
> > + * where it sends a VMbus message when it expires. The old
> > + * mechanism is used if Direct Mode is explicitly disabled
> > + * by the module parameter, or on older versions of Hyper-V
> > + * that don't support Direct Mode. While Hyper-V provides
> > + * four stimer's per CPU, Linux uses only stimer0.
> > + */
> > +static bool direct_mode_enabled = true;
> > +module_param(direct_mode_enabled, bool, 0444);
> > +MODULE_PARM_DESC(direct_mode_enabled,
> > +   "Set to N to disable stimer Direct Mode.");
> 
> Ick ick ick.  How is a distro or user supposed to know if/when to enable
> this and not to enable it?  This isn't ok, can you please make this
> "automatic" to always do the right thing based on what it is running on?

To be clear, this patch already does the "automatic" thing based on the
capabilities of the hypervisor.  But it's often handy to be able to override
the automatic behavior without having to rebuild the kernel.

> 
> Module parameters are not a good idea as they are impossible to maintain
> and document and use over time.  Please never add new ones to the
> kernel.
> 

Is there a better alternative for providing an "override" capability?
Having an override isn't super important in this case, so I'll drop the
parameter, but it would be nice to have something easier than a rebuild.

Michael

> thanks,
> 
> greg k-h
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

[PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices

2018-01-24 Thread Michael Kelley (EOSG)

Increase cmd_per_lun to allow more I/Os in progress per device,
particularly for NVMe's.  The Hyper-V host side can handle the
higher count with no issues.

Signed-off-by: Michael Kelley 
---
 drivers/scsi/storvsc_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index f3264c4..6205107 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1657,7 +1657,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
.eh_timed_out = storvsc_eh_timed_out,
.slave_alloc =  storvsc_device_alloc,
.slave_configure =  storvsc_device_configure,
-   .cmd_per_lun =  255,
+   .cmd_per_lun =  2048,
.this_id =  -1,
.use_clustering =   ENABLE_CLUSTERING,
/* Make sure we dont get a sg segment crosses a page boundary */
-- 
1.8.3.1
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-24 Thread Michael Kelley (EOSG)

Updated/corrected two email addresses ...

> -Original Message-
> From: Michael Kelley (EOSG)
> Sent: Wednesday, January 24, 2018 2:14 PM
> To: KY Srinivasan <k...@microsoft.com>; Stephen Hemminger 
> <sthem...@microsoft.com>;
> martin.peter...@oracle.com; lo...@microsoft.com; jbottom...@odin.com;
> de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; 
> linux-s...@vger.kernel.org
> Cc: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
> Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel 
> for I/O requests
> 
> Update the algorithm in storvsc_do_io to look for a channel
> starting with the current CPU + 1 and wrap around (within the
> current NUMA node). This spreads VMbus interrupts more evenly
> across CPUs. Previous code always started with first CPU in
> the current NUMA node, skewing the interrupt load to that CPU.
> 
> Signed-off-by: Michael Kelley <mikel...@microsoft.com>
> ---
>  drivers/scsi/storvsc_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
> index e07907d..f3264c4 100644
> --- a/drivers/scsi/storvsc_drv.c
> +++ b/drivers/scsi/storvsc_drv.c
> @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device,
>*/
>   cpumask_and(_mask, _device->alloced_cpus,
>   cpumask_of_node(cpu_to_node(q_num)));
> - for_each_cpu(tgt_cpu, _mask) {
> + for_each_cpu_wrap(tgt_cpu, _mask,
> + outgoing_channel->target_cpu + 1) {
>   if (tgt_cpu != outgoing_channel->target_cpu) {
>   outgoing_channel =
>   stor_device->stor_chns[tgt_cpu];
> --
> 1.8.3.1
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

[PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests

2018-01-24 Thread Michael Kelley (EOSG)

Update the algorithm in storvsc_do_io to look for a channel
starting with the current CPU + 1 and wrap around (within the
current NUMA node). This spreads VMbus interrupts more evenly
across CPUs. Previous code always started with first CPU in
the current NUMA node, skewing the interrupt load to that CPU.

Signed-off-by: Michael Kelley 
---
 drivers/scsi/storvsc_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index e07907d..f3264c4 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device,
 */
cpumask_and(_mask, _device->alloced_cpus,
cpumask_of_node(cpu_to_node(q_num)));
-   for_each_cpu(tgt_cpu, _mask) {
+   for_each_cpu_wrap(tgt_cpu, _mask,
+   outgoing_channel->target_cpu + 1) {
if (tgt_cpu != outgoing_channel->target_cpu) {
outgoing_channel =
stor_device->stor_chns[tgt_cpu];
-- 
1.8.3.1
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support

2018-01-23 Thread Michael Kelley (EOSG)

Vitaly Kuznetsov <vkuzn...@redhat.com> writes:
> 
> > "Michael Kelley (EOSG)" <michael.h.kel...@microsoft.com> writes:
> >
> >> On Fri, 19 Jan 2018, Thomas Gleixner wrote:
> >>
> >>>
> >>> You added '#include ' to mshyperv.h which is included 
> >>> in
> >>> vclock_gettime.c and pulls in other stuff which fails to expand
> >>
> >> Is the declaration of hyperv_reenlightenment_intr() even needed in
> >> mshyperv.h?  The '#include ' is there for the 
> >> __irq_entry
> >> annotation on that declaration.   There's a declaration of the parallel 
> >> (and
> >> unannotated) hyperv_vector_handler(), but that looks like a fossil that
> >> isn't needed either.
> >>
> >
> > True,
> >
> > the only need for the declaration in mshyperv.h is to silence "warning:
> > no previous prototype for ‘hyperv_reenlightenment_intr’"; I'm not sure
> > if this actually needs fixing.
> 
> It seems we can get away with dropping '__visible' and '__irq_entry'
> qualifiers from 'hyperv_reenlightenment_intr' declaration in mshyperv.h
> while keeping them in hv_init.c for the implementation. This way we can
> avoid the nasty 'no previous prototype' warning and not have to add
> '#include ' to mshyperv.h breaking vdso.

Where exactly is the 'no previous prototype' warning being generated?
I was thinking the only references to hyperv_reenlightenment_intr()
would be in entry_32.S and entry_64.S, but maybe there's another one
I missed.

Michael

> 
> I'm inclined to go ahead with this in v4 if nobody objects.
> 
> --
>   Vitaly
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support

2018-01-21 Thread Michael Kelley (EOSG)

On Fri, 19 Jan 2018, Thomas Gleixner wrote:

> -Original Message-
> From: Thomas Gleixner [mailto:t...@linutronix.de]
> Sent: Friday, January 19, 2018 11:48 PM
> To: Vitaly Kuznetsov <vkuzn...@redhat.com>
> Cc: kbuild test robot <l...@intel.com>; kbuild-...@01.org; 
> k...@vger.kernel.org;
> x...@kernel.org; Stephen Hemminger <sthem...@microsoft.com>; Radim Krčmář
> <rkrc...@redhat.com>; Haiyang Zhang <haiya...@microsoft.com>; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; Michael Kelley (EOSG)
> <michael.h.kel...@microsoft.com>; Ingo Molnar <mi...@redhat.com>; Roman Kagan
> <rka...@virtuozzo.com>; Andy Lutomirski <l...@kernel.org>; H. Peter Anvin
> <h...@zytor.com>; Paolo Bonzini <pbonz...@redhat.com>; Mohammed Gamal
> <mmo...@redhat.com>
> Subject: Re: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support
> 
> On Fri, 19 Jan 2018, Vitaly Kuznetsov wrote:
> > kbuild test robot <l...@intel.com> writes:
> >
> > > Hi Vitaly,
> > >
> > > Thank you for the patch! Perhaps something to improve:
> > >
> > > [auto build test WARNING on tip/auto-latest]
> > > [also build test WARNING on v4.15-rc8 next-20180118]
> > > [cannot apply to tip/x86/core kvm/linux-next]
> > > [if your patch is applied to the wrong git tree, please drop us a note to 
> > > help improve the
> system]
> > >
> > > url:
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2F0day-
> ci%2Flinux%2Fcommits%2FVitaly-Kuznetsov%2Fx86-kvm-hyperv-stable-clocksorce-for-L2-
> guests-when-running-nested-KVM-on-Hyper-V%2F20180119-
> 160814=02%7C01%7CMichael.H.Kelley%40microsoft.com%7Ce95c66107da6446826830
> 8d55fda2c2b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636520313062777623
> data=kAXl3mLVUdJi%2BsB4Ub0fmUHQfl6NuUDjW%2FAY9%2BFLZE4%3D=0
> > > config: x86_64-allmodconfig (attached as .config)
> > > compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
> > > reproduce:
> > > # save the attached .config to linux build tree
> > > make ARCH=x86_64
> > >
> > > All warnings (new ones prefixed by >>):
> > >
> > >In file included from include/linux/kasan.h:17:0,
> > > from include/linux/slab.h:129,
> > > from include/linux/irq.h:26,
> > > from arch/x86/include/asm/hardirq.h:6,
> > > from include/linux/hardirq.h:9,
> > > from include/linux/interrupt.h:13,
> > > from arch/x86/include/asm/mshyperv.h:8,
> > > from 
> > > arch/x86//entry/vdso/vdso32/../vclock_gettime.c:20,
> > > from arch/x86//entry/vdso/vdso32/vclock_gettime.c:33:
> > >arch/x86/include/asm/pgtable.h: In function 'clone_pgd_range':
> > >arch/x86/include/asm/pgtable.h:1129:9: error: implicit declaration of 
> > > function
> 'kernel_to_user_pgdp'; did you mean 'u64_to_user_ptr'? 
> [-Werror=implicit-function-
> declaration]
> > >  memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src),
> > > ^~~
> >
> > Sorry but I'm failing to see how this (and all the rest) is related to
> > my patch ...
> 
> You added '#include ' to mshyperv.h which is included in
> vclock_gettime.c and pulls in other stuff which fails to expand

Is the declaration of hyperv_reenlightenment_intr() even needed in
mshyperv.h?  The '#include ' is there for the __irq_entry
annotation on that declaration.   There's a declaration of the parallel (and
unannotated) hyperv_vector_handler(), but that looks like a fossil that
isn't needed either.

Michael

> 
> Thanks,
> 
>   tglx

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH 3/6] x86/hyper-v: reenlightenment notifications support

2017-12-12 Thread Michael Kelley (EOSG)

On Fri, Dec 08, 2017 at 11:49:57AM +0100, Vitaly Kuznetsov wrote:

> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Friday, December 8, 2017 2:50 AM
> To: k...@vger.kernel.org; x...@kernel.org
> Cc: Paolo Bonzini <pbonz...@redhat.com>; Radim Krčmář <rkrc...@redhat.com>; 
> Thomas
> Gleixner <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; Stephen Hemminger <sthem...@microsoft.com>; Michael
> Kelley (EOSG) <michael.h.kel...@microsoft.com>; Andy Lutomirski 
> <l...@kernel.org>;
> Mohammed Gamal <mmo...@redhat.com>; Cathy Avery <cav...@redhat.com>; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org
> Subject: [PATCH 3/6] x86/hyper-v: reenlightenment notifications support
> 
> Hyper-V supports Live Migration notification. This is supposed to be used in 
> conjunction with
> TSC emulation: when we are migrated to a host with different TSC frequency 
> for some short
> period host emulates our accesses to TSC and sends us an interrupt to notify 
> about the event.
> When we're done updating everything we can disable TSC emulation and 
> everything will start
> working fast again.
> 
> We didn't need these notifications before as Hyper-V guests are not supposed 
> to use TSC as a
> clocksource: in Linux we even mark it as unstable on boot. Guests normally 
> use 'tsc page'
> clocksouce and host updates its values on migrations automatically.
> 
> Things change when we want to run nested virtualization: even when we pass 
> through PV
> clocksources (kvm-clock or tsc page) to our guests we need to know TSC 
> frequency and when it
> changes.
> 
> Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies
> EAX:BIT(12) of CPUID:0x4009 as the feature identification bit. The right 
> one to check is
> EAX:BIT(13) of CPUID:0x4003. I was assured that the fix in on the way.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>

[snip]

> diff --git a/arch/x86/include/asm/irq_vectors.h
> b/arch/x86/include/asm/irq_vectors.h
> index 67421f649cfa..e71c1120426b 100644
> --- a/arch/x86/include/asm/irq_vectors.h
> +++ b/arch/x86/include/asm/irq_vectors.h
> @@ -103,7 +103,12 @@
>  #endif
> 
>  #define MANAGED_IRQ_SHUTDOWN_VECTOR  0xef
> -#define LOCAL_TIMER_VECTOR   0xee
> +
> +#if IS_ENABLED(CONFIG_HYPERV)
> +#define HYPERV_REENLIGHTENMENT_VECTOR0xee
> +#endif
> +
> +#define LOCAL_TIMER_VECTOR   0xed
> 
>  #define NR_VECTORS256

[snip]

Since you are pre-allocating a new vector, would you want to update the 
irq_cpustat_t
data structure and your interrupt handler to count the occurrences of these 
interrupts,
and update arch_show_interrupts() to show the count?  Then cat /proc/interrupts
will show the count.   The reenlightenment interrupts will presumably be rare, 
but so
are some of the others that are already counted and displayed, and it seems like
consistency should be maintained.

Michael

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2017-11-30 Thread Michael Kelley (EOSG)

Thomas Gleixner  writes:

> On Tue, 31 Oct 2017, mikel...@exchange.microsoft.com wrote:
> > diff --git a/arch/x86/include/uapi/asm/hyperv.h
> > b/arch/x86/include/uapi/asm/hyperv.h
> > index f65d125..408cf3e 100644
> > --- a/arch/x86/include/uapi/asm/hyperv.h
> > +++ b/arch/x86/include/uapi/asm/hyperv.h
> > @@ -112,6 +112,22 @@
> >  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE  (1 << 5)
> >  /* Guest crash data handler available */
> >  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE   (1 << 10)
> > +/* Debug MSRs available */
> > +#define HV_X64_DEBUG_MSR_AVAILABLE (1 << 11)
> > +/* Support for Non-Privileged Instruction Execution Prevention is 
> > available */
> > +#define HV_X64_NPIEP_AVAILABLE (1 << 12)
> > +/* Support for DisableHypervisor is available */
> > +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE(1 << 13)
> > +/* Extended GVA Ranges for Flush Virtual Address list is available */
> > +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE(1 << 14)
> > +/* Return Hypercall output via XMM registers is available */
> > +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE  (1 << 15)
> > +/* SINT polling mode available */
> > +#define HV_X64_SINT_POLLING_MODE_AVAILABLE (1 << 17)
> > +/* Hypercall MSR lock is available */
> > +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE(1 << 18)
> > +/* stimer direct mode is available */
> > +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE(1 << 19)
> 
> How are all these defines (except the last one related to that patch?

Will move to a separate patch.

> 
> > +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ
> > +is a fake
> > + * because stimer's in Direct Mode simply interrupt on the specified
> > +vector,
> > + * without using a particular IOAPIC pin. But we use the IRQ
> > +allocation
> > + * machinery, so we need a hardware IRQ #.  This value is somewhat
> > +arbitrary,
> > + * but it should not be a legacy IRQ (0 to 15), and should fit within
> > +the
> > + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So
> > +any value
> > + * between 16 and 23 should be good.
> > + */
> > +#define HV_STIMER0_IRQNR   18
> 
> Why would you want abuse an IOAPIC interrupt if all you need is a vector?

Allocating a vector up-front like the existing HYPERVISOR_CALLBACK_VECTOR would 
certainly be more straightforward, and in fact, my original internal testing of 
stimer Direct Mode used that approach.   Vectors are a limited resource, so I 
wanted to find a way to allocate one on-the-fly rather than use fixed 
pre-allocation, even under CONFIG_HYPERV.   But I've got no problem with 
allocating a vector up-front and skipping all the irq/APIC manipulation and 
related issues.

Any guidance on which vector to use?

> 
> > +/* Routines to do per-architecture handling of stimer0 when in Direct
> > +Mode */
> > +
> > +void hv_ack_stimer0_interrupt(struct irq_desc *desc) {
> > +   ack_APIC_irq();
> > +}
> > +
> > +static void allonline_vector_allocation_domain(int cpu, struct cpumask 
> > *retmask,
> > +   const struct cpumask *mask)
> > +{
> > +   cpumask_copy(retmask, cpu_online_mask); }
> > +
> > +int hv_allocate_stimer0_irq(int *irq, int *vector) {
> > +   int localirq;
> > +   int result;
> > +   struct irq_data *irq_data;
> > +
> > +   /* The normal APIC vector allocation domain allows allocation of
> > +vectors
> 
> Please fix you comment style. Multi line comments are:
> 
>/*
> * Bla
>   * foo...
>   */
> 

Will do.

> > +* only for the calling CPU.  So we change the allocation domain to one
> > +* that allows vectors to be allocated in all online CPUs.  This
> > +* change is fine in a Hyper-V VM because VMs don't have the usual
> > +* complement of interrupting devices.
> > +*/
> > +   apic->vector_allocation_domain = allonline_vector_allocation_domain;
> 
> This does not work anymore. vector_allocation_domain is gone as of 4.15. 
> Please check the
> vector rework in
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic
> 
> Aside of that what guarantees that all cpus are online at the point where you 
> allocate that
> interrupt? Nothing, so the vector will be not reserved or allocated on 
> offline CPUs. Now guess
> what happens if you bring the offline CPUs online later, it will drown in 
> spurious interrupts.
> Worse, the vector can also be reused for a device interrupt. Great plan.
> 
> > +   localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR,
> > +   ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH);
> > +   if (localirq < 0) {
> > +   pr_err("Cannot register stimer0 gsi. Error %d", localirq);
> > +   return -1;
> > +   }
> > +
> > +   /* We pass in a dummy IRQ handler because architecture independent code
> > +* will later override the IRQ domain

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2017-11-30 Thread Michael Kelley (EOSG)

Vitaly Kuznetsov  writes:

> Vitaly Kuznetsov  writes:
> 
> > mikel...@exchange.microsoft.com writes:
> >
> >> From: Michael Kelley 
> >>
> >> The 2016 version of Hyper-V offers the option to operate the guest VM
> >> per-vcpu stimer's in Direct Mode, which means the timer interupts on
> >> its own vector rather than queueing a VMbus message.  Direct Mode
> >> reduces timer processing overhead in both the hypervisor and the
> >> guest, and avoids having timer interrupts pollute the VMbus interrupt
> >> stream for the synthetic NIC and storage.  This patch enables Direct
> >> Mode by default on stimer0 (which is the only stimer used in Linux on
> >> Hyper-V) when running on a version of Hyper-V that supports it, with
> >> a hv_vmbus module parameter for disabling Direct Mode and reverting
> >> to the old behavior.
> >>
> >> Signed-off-by: Michael Kelley 
> >> ---
> >>  arch/x86/include/asm/mshyperv.h| 14 
> >>  arch/x86/include/uapi/asm/hyperv.h | 26 ++
> >>  arch/x86/kernel/cpu/mshyperv.c | 64 +-
> >>  drivers/hv/hv.c| 71 
> >> --
> >>  drivers/hv/hyperv_vmbus.h  |  4 ++-
> >>  5 files changed, 175 insertions(+), 4 deletions(-)
> >>
> 
> (in addition to my previous comment)
> 
> This patch is also x86-heavy so it probably makes sense to make x86 
> maintainers (Thomas,
> Peter, Ingo) aware no matter which tree it will go through.
> 
> CC: x...@kernel.org
> 
> >> diff --git a/arch/x86/include/asm/mshyperv.h
> >> b/arch/x86/include/asm/mshyperv.h index 740dc97..1bba1d7 100644
> >> --- a/arch/x86/include/asm/mshyperv.h
> >> +++ b/arch/x86/include/asm/mshyperv.h
> >> @@ -4,6 +4,8 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >> +#include 
> >>  #include 
> >>  #include 
> >>
> >> @@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page 
> >> *hv_get_tsc_page(void)
> >>return NULL;
> >>  }
> >>  #endif
> >> +
> >> +/* Per architecture routines for stimer0 Direct Mode handling.  On
> >> +x86/x64
> >> + * there are no percpu actions to take.
> >> + */
> >> +#if IS_ENABLED(CONFIG_HYPERV)
> >> +static inline void hv_enable_stimer0_percpu_irq(int irq) { } static
> >> +inline void hv_disable_stimer0_percpu_irq(int irq) { } extern int
> >> +hv_allocate_stimer0_irq(int *irq, int *vector); extern void
> >> +hv_deallocate_stimer0_irq(int irq); extern void
> >> +hv_ack_stimer0_interrupt(struct irq_desc *desc); #endif
> >> +
> >>  #endif
> >> diff --git a/arch/x86/include/uapi/asm/hyperv.h
> >> b/arch/x86/include/uapi/asm/hyperv.h
> >> index f65d125..408cf3e 100644
> >> --- a/arch/x86/include/uapi/asm/hyperv.h
> >> +++ b/arch/x86/include/uapi/asm/hyperv.h
> >> @@ -112,6 +112,22 @@
> >>  #define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 << 5)
> >>  /* Guest crash data handler available */
> >>  #define HV_X64_GUEST_CRASH_MSR_AVAILABLE  (1 << 10)
> >> +/* Debug MSRs available */
> >> +#define HV_X64_DEBUG_MSR_AVAILABLE(1 << 11)
> >> +/* Support for Non-Privileged Instruction Execution Prevention is 
> >> available */
> >> +#define HV_X64_NPIEP_AVAILABLE(1 << 12)
> >> +/* Support for DisableHypervisor is available */
> >> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE   (1 << 13)
> >> +/* Extended GVA Ranges for Flush Virtual Address list is available */
> >> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE   (1 << 14)
> >> +/* Return Hypercall output via XMM registers is available */
> >> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (1 << 15)
> >> +/* SINT polling mode available */
> >> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE(1 << 17)
> >> +/* Hypercall MSR lock is available */
> >> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE   (1 << 18)
> >> +/* stimer direct mode is available */
> >> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE   (1 << 19)
> >>
> >>  /*
> >>   * Implementation recommendations. Indicates which behaviors the
> >> hypervisor @@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT {
> >>
> >>  #define HV_SYNIC_STIMER_COUNT (4)
> >>
> >> +/* Hardware IRQ number to use for stimer0 in Direct Mode.  This IRQ
> >> +is a fake
> >> + * because stimer's in Direct Mode simply interrupt on the specified
> >> +vector,
> >> + * without using a particular IOAPIC pin. But we use the IRQ
> >> +allocation
> >> + * machinery, so we need a hardware IRQ #.  This value is somewhat
> >> +arbitrary,
> >> + * but it should not be a legacy IRQ (0 to 15), and should fit
> >> +within the
> >> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So
> >> +any value
> >> + * between 16 and 23 should be good.
> >> + */
> >
> > (nitpick): all comments in the patch are in 'net' style:
> >
> >  /* This is a
> >   * comment.
> >   */
> >
> > While

RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0

2017-11-01 Thread Michael Kelley (EOSG)

It's fixed.

Michael

-Original Message-
From: Thomas Gleixner [mailto:t...@linutronix.de] 
Sent: Wednesday, November 1, 2017 9:20 AM
To: Michael Kelley (EOSG) <michael.h.kel...@microsoft.com>
Cc: gre...@linuxfoundation.org; LKML <linux-ker...@vger.kernel.org>; 
de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; 
vkuzn...@redhat.com; jasow...@redhat.com; leann.ogasaw...@canonical.com; 
marcelo.ce...@canonical.com; Stephen Hemminger <sthem...@microsoft.com>; KY 
Srinivasan <k...@microsoft.com>
Subject: Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode 
for stimer0

On Wed, 1 Nov 2017, Thomas Gleixner wrote:

> On Tue, 31 Oct 2017, mikel...@exchange.microsoft.com wrote:

   ^^^

Please fix your mail process. mikel...@exchange.microsoft.com is not your real 
mail address, but shows up as From and obviously the reply bounces:

  mikel...@exchange.microsoft.com
  Remote Server returned '550 5.1.10 RESOLVER.ADR.RecipientNotFound;
  Recipient not found by SMTP address lookup'

Sigh,

tglx
___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

71 matches

Mail list logo