RE: [PATCH V3 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support
From: Tianyu Lan Sent: Wednesday, September 26, 2018 8:50 PM > > Hyper-V provides HvFlushGuestAddressList() hypercall to flush EPT tlb > with specified ranges. This patch is to add the hypercall support. > > Signed-off-by: Lan Tianyu > Looks good! Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH -next] x86/hyper-v: Remove unused including
From: YueHaibing Sent: Sunday, September 23, 2018 1:20 AM > Remove including that don't need it. > > Signed-off-by: YueHaibing > --- > arch/x86/hyperv/hv_apic.c | 1 - > 1 file changed, 1 deletion(-) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()
From: Tianyu Lan Sent: Thursday, September 20, 2018 7:30 AM > On 9/20/2018 12:08 AM, Michael Kelley (EOSG) wrote: > > From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM > >> + > >> + if (ret && kvm_available_flush_tlb_with_range()) { > >> + kvm_flush_remote_tlbs_with_address(kvm, > >> + gfn_start, > >> + gfn_end - gfn_start); > > > > Does the above need to be gfn_end - gfn_start + 1? > > The flush range depends on the input parameter frame start and frame end > of for_each_slot_rmap_range(). > > for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL, > PT_MAX_HUGEPAGE_LEVEL, > gfn_start, gfn_end - 1, > &iterator) > ret |= handler(kvm, iterator.rmap, memslot, > iterator.gfn, iterator.level, data); > > > The start is "gfn_start" and the end is "gfn_end - 1". The flush size is > (gfn_end - 1) - gfn_start + 1 = gfn_end - gfn_start. > Got it. I agree. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support
From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM > > #include > #include > #include > #include > +#include Hopefully asm/kvm_host.h does not need to be #included, given the new code structure. > > #include > > +/* > + * MAX_FLUSH_PAGES = "additional_pages" + 1. It's limited > + * by the bitwidth of "additional_pages" in union hv_gpa_page_range. > + */ > +#define MAX_FLUSH_PAGES (2048) > + > +/* > + * All input flush parameters are in single page. The max flush count > + * is equal with how many entries of union hv_gpa_page_range can be > + * populated in the input parameter page. MAX_FLUSH_REP_COUNT > + * = (4096 - 16) / 8. (“Page Size” - "Address Space" - "Flags") / > + * "GPA Range". > + */ > +#define MAX_FLUSH_REP_COUNT (510) > + I would recommend putting the above two definitions in hyperv-tlfs.h. They are directly tied to the data structures defined by Hyper-V in the TLFS. Put MAX_FLUSH_PAGES immediately after the definition for hv_gpa_page_range so that the dependency is obvious. For MAX_FLUSH_REP_COUNT, can you do the calculation in the #define rather than just in the comment? Alternatively, define the gpa_list[] array to be of MAX_FLUSH_REP_COUNT size, and then add a compile time assert that the size of struct hv_guest_mapping_flush_list is exactly one page in size. It's just a good way to use the compiler to help check for mistakes. Also prefix them both with HV_ since they will be more globally visible as part of hyperv-tlfs.h. > int hyperv_flush_guest_mapping(u64 as) > { > struct hv_guest_mapping_flush **flush_pcpu; > @@ -54,3 +71,89 @@ int hyperv_flush_guest_mapping(u64 as) > return ret; > } > EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping); > + > +static int fill_flush_list(union hv_gpa_page_range gpa_list[], > + int offset, u64 start_gfn, u64 pages) > +{ > + int gpa_n = offset; > + u64 cur = start_gfn; > + u64 additional_pages; > + > + do { > + if (gpa_n >= MAX_FLUSH_REP_COUNT) { > + pr_warn("Request exceeds HvFlushGuestList max flush > count."); > + return -ENOSPC; I wonder if the warning is really needed. When the error is returned up through the higher levels of code, won't the higher levels just fallback to the non-enlightened flush code? So nothing is actually goes wrong; it's just that a slower code path gets taken. A comment about such expectations might be helpful. > + } > + > + if (pages > MAX_FLUSH_PAGES) { > + additional_pages = MAX_FLUSH_PAGES - 1; > + pages -= MAX_FLUSH_PAGES; > + } else { > + additional_pages = pages - 1; > + pages = 0; > + } The above code is really doing: additional_pages = min(pages, MAX_FLUSH_PAGES) - 1; pages -= additional_pages + 1; And you might want to move the decrement of 'pages' down to the bottom of the loop where you update the other loop variables. > + > + gpa_list[gpa_n].page.additional_pages = additional_pages; > + gpa_list[gpa_n].page.largepage = false; > + gpa_list[gpa_n].page.basepfn = cur; > + > + cur += additional_pages + 1; > + gpa_n++; > + } while (pages > 0); > + > + return gpa_n; > +} > + > +int hyperv_flush_guest_mapping_range(u64 as, struct hyperv_tlb_range *range) > +{ > + struct hv_guest_mapping_flush_list **flush_pcpu; > + struct hv_guest_mapping_flush_list *flush; > + u64 status = 0; > + unsigned long flags; > + int ret = -ENOTSUPP; > + int gpa_n = 0; > + > + if (!hv_hypercall_pg) > + goto fault; > + > + local_irq_save(flags); > + > + flush_pcpu = (struct hv_guest_mapping_flush_list **) > + this_cpu_ptr(hyperv_pcpu_input_arg); > + > + flush = *flush_pcpu; > + if (unlikely(!flush)) { > + local_irq_restore(flags); > + goto fault; > + } > + > + flush->address_space = as; > + flush->flags = 0; > + > + if (!range->flush_list) > + gpa_n = fill_flush_list(flush->gpa_list, gpa_n, > + range->start_gfn, range->pages); > + else if (range->parse_flush_list_func) > + gpa_n = range->parse_flush_list_func(flush->gpa_list, gpa_n, > + range->flush_list, fill_flush_list); > + else > + gpa_n = -1; > + > + if (gpa_n < 0) { > + local_irq_restore(flags); > + goto fault; > + } > + > + status = hv_do_rep_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST, > + gpa_n, 0, flush, NULL); > + > + local_irq_restore(flags); > + > + if (!(status & HV_HYPERCALL_RESULT_MASK)) > + ret = 0; > + else > + ret = status; > +fault: > + return ret; > +} > +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping_r
RE: [PATCH V2 4/13] KVM/MMU: Flush tlb directly in the kvm_handle_hva_range()
From: Tianyu Lan Sent: Monday, September 17, 2018 8:19 PM > + > + if (ret && kvm_available_flush_tlb_with_range()) { > + kvm_flush_remote_tlbs_with_address(kvm, > + gfn_start, > + gfn_end - gfn_start); Does the above need to be gfn_end - gfn_start + 1? > + ret = 0; > + } Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 2/13] KVM/MMU: Add tlb flush with range helper function
From: Tianyu Lan Sent: Monday, September 17, 2018 8:18 PM > > +static void kvm_flush_remote_tlbs_with_range(struct kvm *kvm, > + struct kvm_tlb_range *range) > +{ > + int ret = -ENOTSUPP; > + > + if (range && kvm_x86_ops->tlb_remote_flush_with_range) { > + /* > + * Read tlbs_dirty before setting KVM_REQ_TLB_FLUSH in > + * kvm_make_all_cpus_request. > + */ > + long dirty_count = smp_load_acquire(&kvm->tlbs_dirty); > + > + ret = kvm_x86_ops->tlb_remote_flush_with_range(kvm, range); > + cmpxchg(&kvm->tlbs_dirty, dirty_count, 0); > + } The comment and the code that manipulates kvm->tlbs_dirty appears to have been copied from kvm_flush_remote_tlbs(). But the above code doesn't call kvm_make_all_cpus_request(). I haven't traced all the details, but it seems like the comment should be updated, or the code isn't needed. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] x86/hyperv: suppress "PCI: Fatal: No config space access function found"
From: Dexuan Cui Sent: Tuesday, September 18, 2018 3:30 PM > > A Generatin-2 Linux VM on Hyper-V doesn't have the legacy PCI bus, and > users always see the scary warning, which is actually harmless. The patch > is made to suppress the warning. > > Signed-off-by: Dexuan Cui > Cc: K. Y. Srinivasan > Cc: Haiyang Zhang > Cc: Stephen Hemminger > --- > arch/x86/hyperv/hv_init.c | 19 +++ > 1 file changed, 19 insertions(+) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] Drivers: hv: vmbus: include header for get_irq_regs()
From Sebastian Andrzej Siewior Sent: Thursday, August 30, 2018 12:55 AM > > On !RT the header file get_irq_regs() gets pulled in via other header files. > On > RT it does not and the build fails: > > drivers/hv/vmbus_drv.c:975 implicit declaration of function > ‘get_irq_regs’ [- > Werror=implicit-function-declaration] > drivers/hv/hv.c:115 implicit declaration of function ‘get_irq_regs’ > [-Werror=implicit- > function-declaration] > > Add the header file for get_irq_regs() in a common header so it used by > vmbus_drv.c by hv.c for their get_irq_regs() usage. > get_irq_regs() is not used explicitly in either vmbus_drv.c or in hv.c. And I couldn't make the line numbers in the errors above line up with anything in the source code that might be implicitly using get_irq_regs(). Is it the calls to add_interrupt_randomness()? Did you figure out exactly what line of code is causing the compile error? I'm wondering whether adding the #include of irq.h into hyperv_vmbus.h is really the right solution. More correct might be to have the file where get_irq_regs() is actually used to #include irq_regs.h. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 10/13] x86/hyper-v: Add HvFlushGuestAddressList hypercall support
From: Tianyu Lan Sent: Monday, September 10, 2018 1:39 AM > + > +int hyperv_flush_guest_mapping_range(u64 as, struct kvm_tlb_range *range) I'm really concerned about defining the Hyper-V function to flush guest mappings in terms of a KVM struct definition. Your patch puts this function in arch/x86/hyperv/nested.c. I haven't investigated all the details, but on its face this approach seems like it would cause trouble in the long run, and it doesn't support the case of a hypervisor other than KVM running at L1. I know that KVM code has taken a dependency on Hyper-V types and code, but that's because KVM is emulating a lot of Hyper-V functionality and it's taking advantage of Hyper-V enlightenments. Is there a top level reason I haven't thought of for Hyper-V code to take a dependency on KVM definitions? I would think we want Hyper-V code to be generic, using Hyper-V data structure definitions. Then in keeping with what's already been done, KVM code would use those definitions where it needs to make calls to Hyper-V code. > +{ > + struct kvm_mmu_page *sp; > + struct hv_guest_mapping_flush_list **flush_pcpu; > + struct hv_guest_mapping_flush_list *flush; > + u64 status = 0; > + unsigned long flags; > + int ret = -ENOTSUPP; > + int gpa_n = 0; > + > + if (!hv_hypercall_pg) > + goto fault; > + > + local_irq_save(flags); > + > + flush_pcpu = (struct hv_guest_mapping_flush_list **) > + this_cpu_ptr(hyperv_pcpu_input_arg); > + > + flush = *flush_pcpu; > + if (unlikely(!flush)) { > + local_irq_restore(flags); > + goto fault; > + } > + > + flush->address_space = as; > + flush->flags = 0; > + > + if (!range->flush_list) { > + gpa_n = fill_flush_list(flush->gpa_list, gpa_n, > + range->start_gfn, range->end_gfn); > + } else { > + list_for_each_entry(sp, range->flush_list, > + flush_link) { > + u64 end_gfn = sp->gfn + > + KVM_PAGES_PER_HPAGE(sp->role.level) - 1; > + gpa_n = fill_flush_list(flush->gpa_list, gpa_n, > + sp->gfn, end_gfn); > + } Per the previous comment, if this loop really needs to walk a KVM data structure, look for a different way to organize things so that the handling of KVM-specific data structures is in code that’s part of KVM, rather than in Hyper-V code. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v2 2/4] arm64: hyperv: Add support for Hyper-V as a hypervisor
From: KY Srinivasan Sent: Thursday, August 30, 2018 11:51 AM > > + /* Allocate percpu VP index */ > > + hv_vp_index = kmalloc_array(num_possible_cpus(), > > sizeof(*hv_vp_index), > > + GFP_KERNEL); > > + if (!hv_vp_index) > > + return 1; > > + > We should perhaps set the array so the contents are invalid so we can > correctly > handle enlightenments for TL shootdown and IPI. Agreed. Will add the initialization in v3 of the patch. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v2 1/4] arm64: hyperv: Add core Hyper-V include files
From: KY Srinivasan Sent: Thursday, August 30, 2018 11:23 AM > > +/* > > + * This file contains definitions from the Hyper-V Hypervisor Top-Level > > + * Functional Specification (TLFS): > > + * > > https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs > > + > A lot of TLFS definitions are ISA independent and we are duplicating these > definitions both for X86_64 and ARM_64. Perhaps we should look at splitting > this file into a common and ISA specific header file. I agree that we want to end up with x86_64 and ARM64 ISA dependent files that include an ISA independent file. My thinking was to not make that separation now, for a couple of reasons: 1) We don't have a Hyper-V TLFS that is explicit about what should be considered ISA independent and ISA dependent. I can make some reasonable guesses, but it will be subject to change as the Hyper-V team firms up the interface and decides what they want to commit to. 2) Some of the things defined in the TLFS have names that are x86-specific (TSC, MSR, etc.). For the ISA independent parts, those names should be more generic, which is another dependency on the Hyper-V team defining the ISA independent parts of the TLFS. My judgment was that we'll end up with less perturbation overall to go with this cloned version of hyperv-tlfs.h for now, and then come back and do the separation once we have a definitive TLFS to base it on. But it's a judgment call, and if the sense is that we should do the separation now, I can give it a try. > > +#define HvRegisterHypervisorVersion0x0100 /*CPUID > > 0x4002 */ > > +#defineHvRegisterPrivilegesAndFeaturesInfo0x0200 /*CPUID > > 0x4003 */ > > +#defineHvRegisterFeaturesInfo0x0201 > > /*CPUID 0x4004 */ > > +#defineHvRegisterImplementationLimitsInfo0x0202 /*CPUID > > 0x4005 */ > > +#define HvARM64RegisterInterfaceVersion0x00090006 /*CPUID > > 0x4001 */ > > Can we avoid the mixed case names. Agreed. I'll fix this throughout to use all uppercase, with underscore as the word separator. > > + * Linux-specific definitions for managing interactions with Microsoft's > > + * Hyper-V hypervisor. Definitions that are specified in the Hyper-V > > + * Top Level Functional Spec (TLFS) should not go in this file, but > > + * should instead go in hyperv-tlfs.h. > > Would it make sense to breakup this header file into ISA independent and > dependent files? Yes, as above I agree the separation make sense. And since this file is tied To Linux and not to the Hyper-V TLFS, the separation isn't affected by the TLFS issues mentioned above. I'll give it a try and see if any issues arise. > > +/* > > + * Define the IRQ numbers/vectors used by Hyper-V VMbus interrupts > > + * and by STIMER0 Direct Mode interrupts. Hyper-V should be supplying > > + * these values through ACPI, but there are no other interrupting > > + * devices in a Hyper-V VM on ARM64, so it's OK to hard code for now. > > + * The "CALLBACK_VECTOR" terminology is a left-over from the x86/x64 > > + * world that is used in architecture independent Hyper-V code. > > + */ > When we have direct device assignment for ARM-64 guests, can we still > hardcode. Yes, we can still hardcode. These values are in the Per-Processor Interrupt (PPI) range of 16 to 31. Any IRQ numbers assigned to a Discrete Device Assignment (DDA) device will be in the Shared Peripheral Interrupt (SPI) range of 32-1019 or the Locality-specific Peripheral Interrupt (LPI) range of greater than 8192. The handling of DDA interrupts is still under discussion with the Hyper-V team, but there won't be any conflicts with the PPI values that are hardcoded here. > > +/* > > + * The guest OS needs to register the guest ID with the hypervisor. > > + * The guest ID is a 64 bit entity and the structure of this ID is > > + * specified in the Hyper-V specification: > > + * > > + * msdn.microsoft.com/en- > > us/library/windows/hardware/ff542653%28v=vs.85%29.aspx > > + * > > + * While the current guideline does not specify how Linux guest ID(s) > > + * need to be generated, our plan is to publish the guidelines for > > + * Linux and other guest operating systems that currently are hosted > > + * on Hyper-V. The implementation here conforms to this yet > > + * unpublished guidelines. > > + * > > + * > > + * Bit(s) > > + * 63 - Indicates if the OS is Open Source or not; 1 is Open Source > > + * 62:56 - Os Type; Linux is 0x100 > > + * 55:48 - Distro specific identification > > + * 47:16 - Linux kernel version number > > + * 15:0 - Distro specific identification > > + * > > + * Generate the guest ID based on the guideline described above. > > + */ > > No need to repeat the above block comment (already included in the TLFS > header). Agreed. Will make the change in v3 of the patch. > > +/* Free the message slot and signal end-of-message if required */ > > +static inline void vmbus_signal_eom(struct hv_message *msg, u32 > > old_msg_type) > > +{ > > +/* > > +
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Make synic_initialized flag per-cpu
From: Vitaly Kuznetsov Sent: Wednesday, August 1, 2018 2:26 AM > > I was trying to decide if there are any arguments in favor of one > > approach vs. the other: a per-cpu flag in memory or checking > > the synic_control "enable" bit. Seems like a wash to me, in which > > case I have a slight preference for the per-cpu flag in memory vs. > > creating another function to return sctrl.enable. But I'm completely > > open to reasons why checking sctrl.enable is better. > > Just a few thoughts: reading MSR is definitely slower but we avoid > 'shadowing' the state, the reading is always correct. In case there's a > chance the SynIC will get disabled from host side we can only find this > out by doing MSR read. This is a purely theoretical possibility, I > believe, we can go ahead with this patch. Vitaly -- just to confirm: you are OK with the patch as is? (I'll check, but I may need to rebase on the latest code.) Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/5] vmbus: add driver_override support
From: Stephen Hemminger Sent: Tuesday, August 14, 2018 9:35 AM > On Mon, 13 Aug 2018 19:30:50 + > "Michael Kelley (EOSG)" wrote: > > > > +/* > > > + * Return a matching hv_vmbus_device_id pointer. > > > + * If there is no match, return NULL. > > > + */ > > > +static const struct hv_vmbus_device_id *hv_vmbus_get_id(struct hv_driver > > > *drv, > > > + struct hv_device *dev) > > > +{ > > > + const uuid_le *guid = &dev->dev_type; > > > + const struct hv_vmbus_device_id *id; > > > > > > - return NULL; > > > + /* When driver_override is set, only bind to the matching driver */ > > > + if (dev->driver_override && strcmp(dev->driver_override, drv->name)) > > > + return NULL; > > > > This function needs to be covered by the device lock, so that > > dev->driver_override can't be set to NULL and the memory freed > > during the above 'if' statement. When called from vmbus_probe(), > > the device lock is held, so it's good. But when called from > > vmbus_match(), the device lock may not be held: consider the path > > __driver_attach() -> driver_match_device() -> vmbus_match(). > > The function hv_vmbus_get_id is called from that path. > i.e. __device_attach -> driver-match_device -> vmbus_match. > and __device_attach always does: > device_lock(dev); Agreed. The __device_attach() path holds the device lock and all is good. But the __driver_attach() path does not hold the device lock when the match function is called, leaving the code open to a potential race. Same problem could happen in the pci subsystem, so the issue is more generic and probably should be evaluated and dealt with separately. Michael > > The code in driver _override_store uses the same device_lock > when storing the new value. > > This is same locking as is done in pci-sysfs.c ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/5] vmbus: add driver_override support
From: k...@linuxonhyperv.com Sent: Friday, August 10, 2018 4:06 PM > From: Stephen Hemminger > > Add support for overriding the default driver for a VMBus device > in the same way that it can be done for PCI devices. This patch > adds the /sys/bus/vmbus/devices/.../driver_override file > and the logic for matching. > > This is used by driverctl tool to do driver override. > https://gitlab.com/driverctl/driverctl > > Signed-off-by: Stephen Hemminger > Signed-off-by: K. Y. Srinivasan > --- > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index b1b548a21f91..e6d8fdac6d8b 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -498,6 +498,54 @@ static ssize_t device_show(struct device *dev, > } > static DEVICE_ATTR_RO(device); > > +static ssize_t driver_override_store(struct device *dev, > + struct device_attribute *attr, > + const char *buf, size_t count) > +{ > + struct hv_device *hv_dev = device_to_hv_device(dev); > + char *driver_override, *old, *cp; > + > + /* We need to keep extra room for a newline */ > + if (count >= (PAGE_SIZE - 1)) > + return -EINVAL; Does 'count' actually have a relationship to PAGE_SIZE, or is PAGE_SIZE just used as an arbitrary size limit? I'm wondering what happens on ARM64 with a 64K page size, for example. If it's just arbitrary, coding such a constant would be better. > +/* > + * Return a matching hv_vmbus_device_id pointer. > + * If there is no match, return NULL. > + */ > +static const struct hv_vmbus_device_id *hv_vmbus_get_id(struct hv_driver > *drv, > + struct hv_device *dev) > +{ > + const uuid_le *guid = &dev->dev_type; > + const struct hv_vmbus_device_id *id; > > - return NULL; > + /* When driver_override is set, only bind to the matching driver */ > + if (dev->driver_override && strcmp(dev->driver_override, drv->name)) > + return NULL; This function needs to be covered by the device lock, so that dev->driver_override can't be set to NULL and the memory freed during the above 'if' statement. When called from vmbus_probe(), the device lock is held, so it's good. But when called from vmbus_match(), the device lock may not be held: consider the path __driver_attach() -> driver_match_device() -> vmbus_match(). Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/5] Tools: hv: Fix a bug in the key delete code
From: k...@linuxonhyperv.com Sent: Friday, August 10, 2018 4:06 PM > > Fix a bug in the key delete code - the num_records range > from 0 to num_records-1. > > Signed-off-by: K. Y. Srinivasan > Reported-by: David Binderman > Cc: > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Make synic_initialized flag per-cpu
From: Vitaly Kuznetsov Sent: Tuesday, July 31, 2018 4:20 AM > > Reviewed-by: Vitaly Kuznetsov Thanks for the review > > Alternatively, we can get rid of synic_initialized flag altogether: > hv_synic_init() never fails in the first place but we can always > implement something like: > > int hv_synic_is_initialized(void) { > union hv_synic_scontrol sctrl; > > hv_get_synic_state(sctrl.as_uint64); > > return sctrl.enable; > } > > as it doesn't seem that we need to check synic state on _other_ CPUs. > > -- > Vitaly I was trying to decide if there are any arguments in favor of one approach vs. the other: a per-cpu flag in memory or checking the synic_control "enable" bit. Seems like a wash to me, in which case I have a slight preference for the per-cpu flag in memory vs. creating another function to return sctrl.enable. But I'm completely open to reasons why checking sctrl.enable is better. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] Drivers: hv: vmbus: Reset the channel callback in vmbus_onoffer_rescind()
From: Dexuan Cui Sent: Thursday, July 12, 2018 10:53 PM > > Before setting channel->rescind in vmbus_rescind_cleanup(), we should make > sure the channel callback won't run any more, otherwise a high-level > driver like pci_hyperv, which may be infinitely waiting for the host VSP's > response and notices the channel has been rescinded, can't safely give > up: e.g., in hv_pci_protocol_negotiation() -> wait_for_response(), it's > unsafe to exit from wait_for_response() and proceed with the on-stack > variable "comp_pkt" popped. The issue was originally spotted by > Michael Kelley . > > In vmbus_close_internal(), the patch also minimizes the range protected by > disabling/enabling channel->callback_event: we don't really need that for > the whole function. > > Signed-off-by: Dexuan Cui > Cc: sta...@vger.kernel.org > Cc: K. Y. Srinivasan > Cc: Stephen Hemminger > Cc: Michael Kelley Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/1] Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic
From: Sunil Muthuswamy Sent: Wednesday, July 11, 2018 9:59 AM > Thanks, Michael. In which branch should I fix these now that the changes have > been > merged with the char-misc-next branch? If the original code is already in char-misc-next, you should probably submit a completely new patch for char-misc-next that just makes the fixes to the original code. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/1] Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic
>From k...@linuxonhyperv.com Sent: Saturday, July 7, >2018 7:57 PM > > From: Sunil Muthuswamy > > In the VM mode on Hyper-V, currently, when the kernel panics, an error > code and few register values are populated in an MSR and the Hypervisor > notified. This information is collected on the host. The amount of > information currently collected is found to be limited and not very > actionable. To gather more actionable data, such as stack trace, the > proposal is to write one page worth of kmsg data on an allocated page > and the Hypervisor notified of the page address through the MSR. > > - Sysctl option to control the behavior, with ON by default. > > Cc: K. Y. Srinivasan > Cc: Stephen Hemminger > Signed-off-by: Sunil Muthuswamy > Signed-off-by: K. Y. Srinivasan > --- > + /* > + * Write dump contents to the page. No need to synchronize; panic should > + * be single-threaded. > + */ > + if (!kmsg_dump_get_buffer(dumper, true, hv_panic_page, > + PAGE_SIZE, &bytes_written)) { > + pr_err("Hyper-V: Unable to get kmsg data for panic\n"); > + return; >From what I can see, the return value from kmsg_dump_get_buffer() is not an indication of success or failure -- it's an indication of whether there is more data available. There's no reason to output an error message. > @@ -1065,6 +1136,32 @@ static int vmbus_bus_init(void) >* Only register if the crash MSRs are available >*/ > if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE) { > + u64 hyperv_crash_ctl; > + /* > + * Sysctl registration is not fatal, since by default > + * reporting is enabled. > + */ > + hv_ctl_table_hdr = register_sysctl_table(hv_root_table); > + if (!hv_ctl_table_hdr) > + pr_err("Hyper-V: sysctl table register error"); > + > + /* > + * Register for panic kmsg callback only if the right > + * capability is supported by the hypervisor. > + */ > + rdmsrl(HV_X64_MSR_CRASH_CTL, hyperv_crash_ctl); > + if (hyperv_crash_ctl & HV_CRASH_CTL_CRASH_NOTIFY_MSG) { vmbus_drv.c is architecture independent code, and should not be referencing x86/x64 MSRs. Reading the MSR (and maybe the test as well?) should go in a separate function in an x86-specific source file. And just to confirm, is this the right way to test for the feature? Usually, feature determination is based on one of the feature registers. The NOTIFY_MSG flag seems to have a dual meaning -- on read it indicates the feature is present. On write in hyperv_report_panic_msg(), it evidently means that the guest is sending a full page of data to Hyper-V. > @@ -1081,6 +1178,11 @@ static int vmbus_bus_init(void) > bus_unregister(&hv_bus); > + free_page((unsigned long)hv_panic_page); > + if (!hv_ctl_table_hdr) { The above test is backwards. Remove the bang. > @@ -1785,10 +1887,18 @@ static void __exit vmbus_exit(void) > + free_page((unsigned long)hv_panic_page); > + if (!hv_ctl_table_hdr) { Same here. Test is backwards. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 1/5] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support
From: Tianyu Lan Monday, July 9, 2018 2:03 AM > Hyper-V supports a pv hypercall HvFlushGuestPhysicalAddressSpace to > flush nested VM address space mapping in l1 hypervisor and it's to > reduce overhead of flushing ept tlb among vcpus. This patch is to > implement it. > > Signed-off-by: Lan Tianyu > --- > arch/x86/hyperv/Makefile | 2 +- > arch/x86/hyperv/nested.c | 64 > ++ > arch/x86/include/asm/hyperv-tlfs.h | 8 + > arch/x86/include/asm/mshyperv.h| 2 ++ > 4 files changed, 75 insertions(+), 1 deletion(-) > create mode 100644 arch/x86/hyperv/nested.c > +#include > +#include > +#include > +#include > + > +int hyperv_flush_guest_mapping(u64 as) > +{ > + struct hv_guest_mapping_flush **flush_pcpu; > + struct hv_guest_mapping_flush *flush; > + u64 status; > + unsigned long flags; > + int ret = -EFAULT; > + > + if (!hv_hypercall_pg) > + goto fault; > + > + local_irq_save(flags); > + > + flush_pcpu = (struct hv_guest_mapping_flush **) > + this_cpu_ptr(hyperv_pcpu_input_arg); > + > + flush = *flush_pcpu; > + > + if (unlikely(!flush)) { > + local_irq_restore(flags); > + goto fault; > + } > + > + flush->address_space = as; > + flush->flags = 0; > + > + status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE, > + flush, NULL); Did you consider using a "fast" hypercall? Unless there's some reason I'm not aware of, a "fast" hypercall would be perfect here as there are 16 bytes of input and no output. Vitaly recently added hv_do_fast_hypercall16() in the linux-next tree. See __send_ipi_mask() in hv_apic.c in linux-next for an example of usage. With a fast hypercall, you don't need the code for getting the per-cpu input arg or the code for local irq save/restore, so the code that is left is a lot faster and simpler. Michael > + local_irq_restore(flags); > + > + if (!(status & HV_HYPERCALL_RESULT_MASK)) > + ret = 0; > + > +fault: > + return ret; > +} > +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping); ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/2] x86/hyper-v: check for VP_INVAL in hyperv_flush_tlb_others()
From: Vitaly Kuznetsov Monday, July 9, 2018 10:40 AM > Commit 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI > enlightenment") pre-filled hv_vp_index with VP_INVAL so it is now > (theoretically) possible to observe hv_cpu_number_to_vp_number() > returning VP_INVAL. We need to check for that in hyperv_flush_tlb_others(). > > Not checking for VP_INVAL on the first call site where we do > > if (hv_cpu_number_to_vp_number(cpumask_last(cpus)) >= 64) > goto do_ex_hypercall; > > is OK, in case we're eligible for non-ex hypercall we'll catch the > issue later in for_each_cpu() cycle and in case we'll be doing ex- > hypercall cpumask_to_vpset() will fail. > > It would be nice to change hv_cpu_number_to_vp_number() return > value's type to 'u32' but this will likely be a bigger change as > all call sites need to be checked first. > > Fixes: 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI > enlightenment") > Signed-off-by: Vitaly Kuznetsov > --- > arch/x86/hyperv/mmu.c | 5 + > 1 file changed, 5 insertions(+) Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/2] x86/hyper-v: check cpumask_to_vpset() return value in hyperv_flush_tlb_others_ex()
From: Vitaly Kuznetsov Monday, July 9, 2018 10:40 AM > Commit 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI > enlightenment") made cpumask_to_vpset() return '-1' when there is a CPU > with unknown VP index in the supplied set. This needs to be checked before > we pass 'nr_bank' to hypercall. > > Fixes: 1268ed0c474a ("x86/hyper-v: Fix the circular dependency in IPI > enlightenment") > Signed-off-by: Vitaly Kuznetsov > --- > arch/x86/hyperv/mmu.c | 2 ++ > 1 file changed, 2 insertions(+) Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/1] X86/Hyper-V:: Fix the circular dependency in IPI enlightenment.
> -Original Message- > From: Thomas Gleixner > Sent: Friday, July 6, 2018 3:42 AM > To: KY Srinivasan > Cc: Ingo Molnar ; x...@kernel.org; > gre...@linuxfoundation.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Subject: RE: [PATCH 1/1] X86/Hyper-V:: Fix the circular dependency in IPI > enlightenment. > > On Fri, 6 Jul 2018, Thomas Gleixner wrote: > > On Fri, 6 Jul 2018, KY Srinivasan wrote: > > > > > > > > The problem is that the wreckage is in Linus tree and needs to be fixed > > > > there, i.e. via x86/urgent. > > > > > > > > Now we have the new bits queued in x86/hyperv already which collide. So > > > > we > > > > need to merge x86/urgent into x86/hyperv after applying the fix and mop > > > > up > > > > the merge wreckage in x86/hyperv. > > > > > > > > I'll have a look tomorrow morning unless you beat me to it. > > > > > > I can rebase this patch against the latest tip and resend (tomorrow). > > > > That doesn't help as we need to push the original fix to Linus ... > > Applied it to x86/urgent and fixed up the '-1' sloppy hack as pointed out > by Vitaly and merged x86/urgent into x86/hyperv. > > Please check both branches for correctness. Changes look good to me. Some pre-existing signed/unsigned type sloppiness is still there in that hv_cpu_number_to_vp_number() should be 'u32' instead of 'int', along with related local variables, but that's probably best to fix in linux-next on top of Vitaly's other changes. This code will work. Michael > > Thanks, > > tglx ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 3/4] x86/hyper-v: use cheaper HVCALL_SEND_IPI hypercall when possible
> -Original Message- > From: Vitaly Kuznetsov > Sent: Friday, June 22, 2018 10:06 AM > To: x...@kernel.org > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan > ; Haiyang Zhang ; Stephen > Hemminger > ; Thomas Gleixner ; Ingo Molnar > ; H. Peter Anvin ; Tianyu Lan > ; Michael Kelley (EOSG) > > Subject: [PATCH 3/4] x86/hyper-v: use cheaper HVCALL_SEND_IPI hypercall when > possible > > When there is no need to send an IPI to a CPU with VP number > 64 > we can do the job with fast HVCALL_SEND_IPI hypercall. > > Signed-off-by: Vitaly Kuznetsov Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/4] x86/hyper-v: use 'fast' hypercall for HVCALL_SEND_IPI
> -Original Message- > From: Vitaly Kuznetsov > Sent: Friday, June 22, 2018 10:06 AM > To: x...@kernel.org > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan > ; Haiyang Zhang ; Stephen > Hemminger > ; Thomas Gleixner ; Ingo Molnar > ; H. Peter Anvin ; Tianyu Lan > ; Michael Kelley (EOSG) > > Subject: [PATCH 2/4] x86/hyper-v: use 'fast' hypercall for HVCALL_SEND_IPI > > Current Hyper-V TLFS (v5.0b) claims that HvCallSendSyntheticClusterIpi > hypercall can't be 'fast' (passing parameters through registers) but > apparently this is not true, Windows always uses 'fast' version. We can > do the same in Linux too. > > Signed-off-by: Vitaly Kuznetsov Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] x86/hyper-v: use cheaper HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} hypercalls when possible
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Vitaly Kuznetsov > Sent: Friday, June 15, 2018 9:30 AM > To: x...@kernel.org > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; KY Srinivasan > ; Haiyang Zhang ; Stephen > Hemminger > ; Thomas Gleixner ; Ingo Molnar > ; H. Peter Anvin ; Tianyu Lan > > Subject: [PATCH] x86/hyper-v: use cheaper > HVCALL_FLUSH_VIRTUAL_ADDRESS_{LIST,SPACE} > hypercalls when possible > > While working on Hyper-V style PV TLB flush support in KVM I noticed that > real Windows guests use TLB flush hypercall in a somewhat smarter way: when > the flush needs to be performed on a subset of first 64 vCPUs or on all > present vCPUs Windows avoids more expensive hypercalls which support > sparse CPU sets and uses their 'cheap' counterparts. This means that > HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED name is actually a misnomer: EX > hypercalls (which support sparse CPU sets) are "available", not > "recommended". This makes sense as they are actually harder to parse. > > Nothing stops us from being equally 'smart' in Linux too. Switch to > doing cheaper hypercalls whenever possible. > > Signed-off-by: Vitaly Kuznetsov > --- This is a good idea. We should probably do the same with the hypercalls for sending IPIs -- try the simpler version first and move to the more complex _EX version only if necessary. A complication: We've recently found a problem with the code for doing IPI hypercalls, and the bug affects the TLB flush code as well. As secondary CPUs are started, there's a window of time where the hv_vp_index entry for a secondary CPU is uninitialized. We are seeing IPIs happening in that window, and the IPI hypercall code uses the uninitialized hv_vp_index entry. Same thing could happen with the TLB flush hypercall code. I didn't actually see any occurrences of the TLB case in my tracing, but we should fix it anyway in case a TLB flush gets added at some point in the future. KY has a patch coming. In the patch, hv_cpu_number_to_vp_number() and cpumask_to_vpset() can both return U32_MAX if they encounter an uninitialized hv_vp_index entry, and the code needs to be able to bail out to the native functions for that particular IPI or TLB flush operation. Once the initialization of secondary CPUs is complete, the uninitialized situation won't happen again, and the hypercall path will always be used. We'll need to coordinate on these patches. Be aware that the IPI flavor of the bug is currently causing random failures when booting 4.18 RC1 on Hyper-V VMs with large vCPU counts. Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [RFC Patch 1/3] X86/Hyper-V: Add flush HvFlushGuestPhysicalAddressSpace hypercall support
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Tianyu Lan > Sent: Monday, June 4, 2018 2:08 AM > Cc: Tianyu Lan ; KY Srinivasan > ; Haiyang > Zhang ; Stephen Hemminger ; > t...@linutronix.de; mi...@redhat.com; h...@zytor.com; x...@kernel.org; > pbonz...@redhat.com; rkrc...@redhat.com; de...@linuxdriverproject.org; linux- > ker...@vger.kernel.org; k...@vger.kernel.org; vkuzn...@redhat.com > Subject: [RFC Patch 1/3] X86/Hyper-V: Add flush > HvFlushGuestPhysicalAddressSpace hypercall > support > > Hyper-V provides a pv hypercall HvFlushGuestPhysicalAddressSpace to flush > nested VM address space mapping in l1 hypervisor and it's to reduce overhead > of flushing ept tlb among vcpus. This patch is to implement it. > > Signed-off-by: Lan Tianyu > --- > diff --git a/arch/x86/hyperv/nested.c b/arch/x86/hyperv/nested.c > new file mode 100644 > index ..17f7c288eccc > --- /dev/null > +++ b/arch/x86/hyperv/nested.c > +#include > +#include > +#include > +#include > + > +int hyperv_flush_guest_mapping(u64 as) > +{ > + struct hv_guest_mapping_flush **flush_pcpu; > + struct hv_guest_mapping_flush *flush; > + u64 status = U64_MAX; Initializing status to U64_MAX doesn't seem necessary. > + unsigned long flags; > + int ret = -EFAULT; > + > + if (!hv_hypercall_pg) > + goto fault; > + > + local_irq_save(flags); > + > + flush_pcpu = (struct hv_guest_mapping_flush **) > + this_cpu_ptr(hyperv_pcpu_input_arg); > + > + flush = *flush_pcpu; > + > + if (unlikely(!flush)) { > + local_irq_restore(flags); > + goto fault; > + } > + > + flush->address_space = as; > + flush->flags = 0; > + > + status = hv_do_hypercall(HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE, > + flush, NULL); > + local_irq_restore(flags); > + > + if (!(status & HV_HYPERCALL_RESULT_MASK)) > + ret = 0; > + > +fault: > + return ret; > +} > +EXPORT_SYMBOL_GPL(hyperv_flush_guest_mapping); > diff --git a/arch/x86/include/asm/hyperv-tlfs.h > b/arch/x86/include/asm/hyperv-tlfs.h > index b8c89265baf0..53bbeb08faea 100644 > --- a/arch/x86/include/asm/hyperv-tlfs.h > +++ b/arch/x86/include/asm/hyperv-tlfs.h > @@ -309,6 +309,7 @@ struct ms_hyperv_tsc_page { > #define HV_X64_MSR_REENLIGHTENMENT_CONTROL 0x4106 > > /* Nested features (CPUID 0x400A) EAX */ > +#define HV_X64_NESTED_GUSET_MAPPING_FLUSHBIT(18) The #define name is misspelled. "_GUSET_" should be "_GUEST_". And the matching usage in patch 3/3 will need to be updated as well. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared
> From: Dexuan Cui > Sent: Tuesday, May 29, 2018 12:58 PM > Subject: RE: [PATCH] PCI: hv: Do not wait forever on a device that has > disappeared > > > From: Michael Kelley (EOSG) > > Sent: Monday, May 28, 2018 17:19 > > > > While this patch solves the immediate problem of getting hung waiting > > for a response from Hyper-V that will never come, there's another scenario > > to look at that I think introduces a race. Suppose the guest VM issues a > > vmbus_sendpacket() request in one of the cases covered by this patch, > > and suppose that Hyper-V queues a response to the request, and then > > immediately follows with a rescind request. Processing the response will > > get queued to a tasklet associated with the channel, while processing the > > rescind will get queued to a tasklet associated with the top-level vmbus > > connection. From what I can see, the code doesn't impose any ordering > > on processing the two. If the rescind is processed first, the new > > wait_for_response() function may wake up, notice the rescind flag, and > > return an error. Its caller will return an error, and in doing so pop the > > completion packet off the stack. When the response is processed later, > > it will try to signal completion via a completion packet that no longer > > exists, and memory corruption will likely occur. > > > > Am I missing anything that would prevent this scenario from happening? > > It is admittedly low probability, and a solution seems non-trivial. I > > haven't > > looked specifically, but a similar scenario is probably possible with the > > drivers for other VMbus devices. We should work on a generic solution. > > > > Michael > > Thanks for spotting the race! > > IMO we can disable the per-channel tasklet to exclude the race: > > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -565,6 +565,7 @@ static int wait_for_response(struct hv_device *hdev, > { > while (true) { > if (hdev->channel->rescind) { > + tasklet_disable(&hdev->channel->callback_event); > dev_warn_once(&hdev->device, "The device is gone.\n"); > return -ENODEV; > } > > This way, when we exit the loop, we're sure hv_pci_onchannelcallback() can not > run anymore. What do you think of this? I've stared at this and the tasklet code over a couple of days now. Adding the call to tasklet_disable() solves the immediate problem of preventing hv_pci_onchannelcallback() from calling complete() against a completion packet that has been popped off the stack. But in doing so, it simply pushes the core problem further down the road and leaves it unresolved. tasklet_disable() does not prevent the tasklet from being scheduled. So if there is a response from Hyper-V to the original message, the tasklet still gets scheduled. Because it is disabled, it will sit in the tasklet queue and be skipped over each time the queue is processed. Later, when the channel is eventually deleted in free_channel(), tasklet_kill() is called. Unfortunately, tasklet_kill() will get stuck in an infinite loop, waiting for the tasklet to run. There aren't any tasklet interfaces to dequeue an already scheduled tasklet. Please double-check my reasoning. To solve this problem, I think the VMbus driver code will need some additional synchronization between rescind notifications and a response, which may or may not have been sent, and which could be processed after the rescind. I haven't yet thought about what this synchronization might need to look like. Michael > > > It looks the list of the other vmbus devices that can be hot-removed is: > > the hv_utils devices > hv_sock devices > storvsc device > netvsc device > > As I checked, the first 3 types of devices don't have this "send a request to > the > host and wait for the response forever" pattern. NetVSC should be fixed as it > has > the same pattern. > > -- Dexuan ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] PCI: hv: Do not wait forever on a device that has disappeared
> > Before the guest finishes the device initialization, the device can be > removed anytime by the host, and after that the host won't respond to > the guest's request, so the guest should be prepared to handle this > case. > > Signed-off-by: Dexuan Cui > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > --- > drivers/pci/host/pci-hyperv.c | 46 > --- > 1 file changed, 34 insertions(+), 12 deletions(-) > While this patch solves the immediate problem of getting hung waiting for a response from Hyper-V that will never come, there's another scenario to look at that I think introduces a race. Suppose the guest VM issues a vmbus_sendpacket() request in one of the cases covered by this patch, and suppose that Hyper-V queues a response to the request, and then immediately follows with a rescind request. Processing the response will get queued to a tasklet associated with the channel, while processing the rescind will get queued to a tasklet associated with the top-level vmbus connection. From what I can see, the code doesn't impose any ordering on processing the two. If the rescind is processed first, the new wait_for_response() function may wake up, notice the rescind flag, and return an error. Its caller will return an error, and in doing so pop the completion packet off the stack. When the response is processed later, it will try to signal completion via a completion packet that no longer exists, and memory corruption will likely occur. Am I missing anything that would prevent this scenario from happening? It is admittedly low probability, and a solution seems non-trivial. I haven't looked specifically, but a similar scenario is probably possible with the drivers for other VMbus devices. We should work on a generic solution. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH char-misc 1/2] Drivers: hv: vmbus: Remove x86 MSR refs in arch independent code
> > -#define hv_init_timer(timer, tick) wrmsrl(timer, tick) > > -#define hv_init_timer_config(config, val) wrmsrl(config, val) > > +#define hv_init_timer(timer, tick) \ > > + wrmsrl(HV_X64_MSR_STIMER0_COUNT + (2*timer), tick) > > +#define hv_init_timer_config(timer, val) \ > > + wrmsrl(HV_X64_MSR_STIMER0_CONFIG + (2*timer), val) > > Why are we stepping in units of 2? > > K. Y The register addresses for the *_CONFIG and *_COUNT registers are interleaved. From hyperv-tlfs.h: #define HV_X64_MSR_STIMER0_CONFIG 0x40B0 #define HV_X64_MSR_STIMER0_COUNT0x40B1 #define HV_X64_MSR_STIMER1_CONFIG 0x40B2 #define HV_X64_MSR_STIMER1_COUNT0x40B3 #define HV_X64_MSR_STIMER2_CONFIG 0x40B4 #define HV_X64_MSR_STIMER2_COUNT0x40B5 #define HV_X64_MSR_STIMER3_CONFIG 0x40B6 #define HV_X64_MSR_STIMER3_COUNT0x40B7 Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 5/5] X86: Hyper-V: Consolidate the allocation of the hypercall input page
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Thursday, May 3, 2018 11:08 PM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH V2 5/5] X86: Hyper-V: Consolidate the allocation of the > hypercall input page > > From: "K. Y. Srinivasan" > > Consolidate the allocation of the hypercall input page. > > Signed-off-by: K. Y. Srinivasan > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 4/5] X86: Hyper-V: Consolidate code for converting cpumask to vpset
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Thursday, May 3, 2018 11:08 PM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH V2 4/5] X86: Hyper-V: Consolidate code for converting cpumask > to vpset > > From: "K. Y. Srinivasan" > > Consolidate code for converting cpumask to vpset. > > Signed-off-by: K. Y. Srinivasan > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 3/5] X86: Hyper-V: Enhanced IPI enlightenment
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Thursday, May 3, 2018 11:08 PM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH V2 3/5] X86: Hyper-V: Enhanced IPI enlightenment > > From: "K. Y. Srinivasan" > > Support enhanced IPI enlightenments (to target more than 64 CPUs). > > Signed-off-by: K. Y. Srinivasan > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 2/5] X86: Hyper-V: Enable IPI enlightenments
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Thursday, May 3, 2018 11:08 PM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH V2 2/5] X86: Hyper-V: Enable IPI enlightenments > > From: "K. Y. Srinivasan" > > Hyper-V supports hypercalls to implement IPI; use them. > > Signed-off-by: K. Y. Srinivasan > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH V2 1/5] X86: Hyper-V: Enlighten APIC access
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Thursday, May 3, 2018 11:08 PM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH V2 1/5] X86: Hyper-V: Enlighten APIC access > > From: "K. Y. Srinivasan" > > Hyper-V supports MSR based APIC access; implement > the enlightenment. > > Signed-off-by: K. Y. Srinivasan > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 3/5] X86: Hyper-V: Enhanced IPI enlightenment
On Wed, 25 Apr 2018, KY Srinivasan wrote: > > +struct ipi_arg_ex { > + u32 vector; > + u32 reserved; > + struct hv_vpset vp_set; > +}; Again, suggest moving to hyperv-tlfs.h. And the 5.0b version of the TLFS has: u32 vector; u8 targetvtl; u8 reserved[3]; struct hv_vpset vp_set; > > +struct hv_vpset { > + u64 format; > + u64 valid_bank_mask; > + u64 bank_contents[]; > +}; And this as well. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/5] X86: Hyper-V: Enable IPI enlightenments
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Wednesday, April 25, 2018 11:13 AM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH 2/5] X86: Hyper-V: Enable IPI enlightenments > > From: "K. Y. Srinivasan" > > Hyper-V supports hypercalls to implement IPI; use them. > > Signed-off-by: K. Y. Srinivasan > --- > arch/x86/hyperv/hv_apic.c | 125 > + > arch/x86/hyperv/hv_init.c | 17 + > arch/x86/include/asm/hyperv-tlfs.h | 9 +++ > arch/x86/include/asm/mshyperv.h| 1 + > 4 files changed, 152 insertions(+) > > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > index e0a5b36208fc..7f3322ecfb01 100644 > --- a/arch/x86/hyperv/hv_apic.c > +++ b/arch/x86/hyperv/hv_apic.c > @@ -30,6 +30,14 @@ > #include > #include > > +struct ipi_arg_non_ex { > + u32 vector; > + u32 reserved; > + u64 cpu_mask; > +}; I think we'd like to put structures like this, which are defined in the Hyper-V Top Level Functional Spec, in hyperv-tlfs.h. Also, the 5.0b version of the TLFS, which is latest, shows this structure on page 100: u32 vector; u8 targetvtl; u8 reserved[3]; u64 cpu_mask; > + > +static struct apic orig_apic; > + > static u64 hv_apic_icr_read(void) > { > u64 reg_val; > @@ -85,8 +93,125 @@ static void hv_apic_eoi_write(u32 reg, u32 val) > wrmsr(HV_X64_MSR_EOI, val, 0); > } > > +/* > + * IPI implementation on Hyper-V. > + */ > + > +static int __send_ipi_mask(const struct cpumask *mask, int vector) > +{ > + int cur_cpu, vcpu; > + struct ipi_arg_non_ex **arg; > + struct ipi_arg_non_ex *ipi_arg; > + int ret = 1; > + unsigned long flags; > + > + if (cpumask_empty(mask)) > + return 0; > + > + if (!hv_hypercall_pg) > + return ret; > + > + if ((vector < HV_IPI_LOW_VECTOR) || (vector > HV_IPI_HIGH_VECTOR)) > + return ret; > + > + local_irq_save(flags); > + arg = (struct ipi_arg_non_ex **)this_cpu_ptr(hyperv_pcpu_input_arg); > + > + ipi_arg = *arg; > + if (unlikely(!ipi_arg)) > + goto ipi_mask_done; > + > + > + ipi_arg->vector = vector; > + ipi_arg->reserved = 0; > + ipi_arg->cpu_mask = 0; > + > + for_each_cpu(cur_cpu, mask) { > + vcpu = hv_cpu_number_to_vp_number(cur_cpu); > + if (vcpu >= 64) > + goto ipi_mask_done; > + > + __set_bit(vcpu, (unsigned long *)&ipi_arg->cpu_mask); > + } > + > + ret = hv_do_hypercall(HVCALL_SEND_IPI, ipi_arg, NULL); > + > +ipi_mask_done: > + local_irq_restore(flags); > + return ret; > +} > + > +static int __send_ipi_one(int cpu, int vector) > +{ > + struct cpumask mask = CPU_MASK_NONE; > + > + cpumask_set_cpu(cpu, &mask); > + return __send_ipi_mask(&mask, vector); > +} > + > +static void hv_send_ipi(int cpu, int vector) > +{ > + if (__send_ipi_one(cpu, vector)) > + orig_apic.send_IPI(cpu, vector); > +} > + > +static void hv_send_ipi_mask(const struct cpumask *mask, int vector) > +{ > + if (__send_ipi_mask(mask, vector)) > + orig_apic.send_IPI_mask(mask, vector); > +} > + > +static void hv_send_ipi_mask_allbutself(const struct cpumask *mask, int > vector) > +{ > + unsigned int this_cpu = smp_processor_id(); > + struct cpumask new_mask; > + const struct cpumask *local_mask; > + > + cpumask_copy(&new_mask, mask); > + cpumask_clear_cpu(this_cpu, &new_mask); > + local_mask = &new_mask; > + if (__send_ipi_mask(local_mask, vector)) > + orig_apic.send_IPI_mask_allbutself(mask, vector); > +} > + > +static void hv_send_ipi_allbutself(int vector) > +{ > + hv_send_ipi_mask_allbutself(cpu_online_mask, vector); > +} > + > +static void hv_send_ipi_all(int vector) > +{ > + if (__send_ipi_mask(cpu_online_mask, vector)) > + orig_apic.send_IPI_all(vector); > +} > + > +static void hv_send_ipi_self(int vector) > +{ > + if (__send_ipi_one(smp_processor_id(), vector)) > + orig_apic.send_IPI_self(vector); > +} > + > void __init hv_apic_init(void) > { > + if (ms_hy
RE: [PATCH 1/5] X86: Hyper-V: Enlighten APIC access
> -Original Message- > From: k...@linuxonhyperv.com > Sent: Wednesday, April 25, 2018 11:13 AM > To: x...@kernel.org; gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > t...@linutronix.de; h...@zytor.com; Stephen Hemminger > ; > Michael Kelley (EOSG) ; vkuzn...@redhat.com > Cc: KY Srinivasan > Subject: [PATCH 1/5] X86: Hyper-V: Enlighten APIC access > > From: "K. Y. Srinivasan" > > Hyper-V supports MSR based APIC access; implement > the enlightenment. > > Signed-off-by: K. Y. Srinivasan > --- > arch/x86/hyperv/Makefile| 2 +- > arch/x86/hyperv/hv_apic.c | 98 > + > arch/x86/hyperv/hv_init.c | 5 ++- > arch/x86/include/asm/mshyperv.h | 6 ++- > 4 files changed, 107 insertions(+), 4 deletions(-) > create mode 100644 arch/x86/hyperv/hv_apic.c > > diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile > index 367a8203cfcf..00ce4df01a09 100644 > --- a/arch/x86/hyperv/Makefile > +++ b/arch/x86/hyperv/Makefile > @@ -1 +1 @@ > -obj-y:= hv_init.o mmu.o > +obj-y:= hv_init.o mmu.o hv_apic.o > diff --git a/arch/x86/hyperv/hv_apic.c b/arch/x86/hyperv/hv_apic.c > new file mode 100644 > index ..e0a5b36208fc > --- /dev/null > +++ b/arch/x86/hyperv/hv_apic.c > @@ -0,0 +1,98 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +/* > + * Hyper-V specific APIC code. > + * > + * Copyright (C) 2018, Microsoft, Inc. > + * > + * Author : K. Y. Srinivasan > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published > + * by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or > + * NON INFRINGEMENT. See the GNU General Public License for more > + * details. > + * > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +static u64 hv_apic_icr_read(void) > +{ > + u64 reg_val; > + > + rdmsrl(HV_X64_MSR_ICR, reg_val); > + return reg_val; > +} > + > +static void hv_apic_icr_write(u32 low, u32 id) > +{ > + u64 reg_val; > + > + reg_val = SET_APIC_DEST_FIELD(id); > + reg_val = reg_val << 32; > + reg_val |= low; > + > + wrmsrl(HV_X64_MSR_ICR, reg_val); > +} > + > +static u32 hv_apic_read(u32 reg) > +{ > + u32 reg_val, hi; > + > + switch (reg) { > + case APIC_EOI: > + rdmsr(HV_X64_MSR_EOI, reg_val, hi); > + return reg_val; > + case APIC_TASKPRI: > + rdmsr(HV_X64_MSR_TPR, reg_val, hi); > + return reg_val; > + > + default: > + return native_apic_mem_read(reg); > + } > +} > + > +static void hv_apic_write(u32 reg, u32 val) > +{ > + switch (reg) { > + case APIC_EOI: > + wrmsr(HV_X64_MSR_EOI, val, 0); > + break; > + case APIC_TASKPRI: > + wrmsr(HV_X64_MSR_TPR, val, 0); > + break; > + default: > + native_apic_mem_write(reg, val); > + } > +} > + > +static void hv_apic_eoi_write(u32 reg, u32 val) > +{ > + wrmsr(HV_X64_MSR_EOI, val, 0); > +} > + > +void __init hv_apic_init(void) > +{ > + if (ms_hyperv.hints & HV_X64_APIC_ACCESS_RECOMMENDED) { > + pr_info("Hyper-V: Using MSR ased APIC access\n"); Typo here. "ased" should be "based". > + apic_set_eoi_write(hv_apic_eoi_write); > + apic->read = hv_apic_read; > + apic->write = hv_apic_write; > + apic->icr_write = hv_apic_icr_write; > + apic->icr_read = hv_apic_icr_read; > + } > +} > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c > index cfecc2272f2d..71e50fc2b7ef 100644 > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -242,8 +242,9 @@ static int hv_cpu_die(unsigned int cpu) > * > * 1. Setup the hypercall page. > * 2. Register Hyper-V specific clocksource. > + * 3. Setup Hyper-V specific APIC entry points. > */ > -void hyperv_init(void) > +void __init hyperv_init(void) > { > u64 guest_id, required_msrs; > union hv_x64_msr_hyperca
RE: [Patch v2] Storvsc: Select channel based on available percentage of ring buffer to write
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Long Li > Sent: Thursday, April 19, 2018 2:54 PM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; James E . J . Bottomley > ; > Martin K . Petersen ; > de...@linuxdriverproject.org; linux- > s...@vger.kernel.org; linux-ker...@vger.kernel.org > Cc: Long Li > Subject: [Patch v2] Storvsc: Select channel based on available percentage of > ring buffer to > write > > From: Long Li > > This is a best effort for estimating on how busy the ring buffer is for > that channel, based on available buffer to write in percentage. It is still > possible that at the time of actual ring buffer write, the space may not be > available due to other processes may be writing at the time. > > Selecting a channel based on how full it is can reduce the possibility that > a ring buffer write will fail, and avoid the situation a channel is over > busy. > > Now it's possible that storvsc can use a smaller ring buffer size > (e.g. 40k bytes) to take advantage of cache locality. > > Changes. > v2: Pre-allocate struct cpumask on the heap. > Struct cpumask is a big structure (1k bytes) when CONFIG_NR_CPUS=8192 (default > value when CONFIG_MAXSMP=y). Don't use kernel stack for it by pre-allocating > them using kmalloc when channels are first initialized. > > Signed-off-by: Long Li > --- > drivers/scsi/storvsc_drv.c | 90 > -- > 1 file changed, 72 insertions(+), 18 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index a2ec0bc9e9fa..2a9fff94dd1a 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer > size > (bytes)"); > > module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO); > MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to > subchannels"); > + > +static int ring_avail_percent_lowater = 10; > +module_param(ring_avail_percent_lowater, int, S_IRUGO); > +MODULE_PARM_DESC(ring_avail_percent_lowater, > + "Select a channel if available ring size > this in percent"); > + > /* > * Timeout in seconds for all devices managed by this driver. > */ > @@ -468,6 +474,13 @@ struct storvsc_device { >* Mask of CPUs bound to subchannels. >*/ > struct cpumask alloced_cpus; > + /* > + * Pre-allocated struct cpumask for each hardware queue. > + * struct cpumask is used by selecting out-going channels. It is a > + * big structure, default to 1024k bytes when CONFIG_MAXSMP=y. I think you mean "1024 bytes" or "1k bytes" in the above comment. > + * Pre-allocate it to avoid allocation on the kernel stack. > + */ > + struct cpumask *cpumask_chns; > /* Used for vsc/vsp channel reset process */ > struct storvsc_cmd_request init_request; > struct storvsc_cmd_request reset_request; > @@ -872,6 +885,13 @@ static int storvsc_channel_init(struct hv_device > *device, bool is_fc) > if (stor_device->stor_chns == NULL) > return -ENOMEM; > > + stor_device->cpumask_chns = kcalloc(num_possible_cpus(), > + sizeof(struct cpumask), GFP_KERNEL); Note that num_possible_cpus() is 240 for a Hyper-V 2016 guest unless overridden on the kernel boot line, so this is going to allocate 240 Kbytes for each synthetic SCSI controller. On an Azure VM, which has two IDE and two SCSI controllers, this is nearly 1 Mbyte. It's unfortunate to have to allocate this much memory for a what is essentially a temporary variable. Further down in these comments, I've proposed an alternate implementation of the code that avoids the need for the temporary variable, and hence avoids the need for this allocation. > + if (stor_device->cpumask_chns == NULL) { > + kfree(stor_device->stor_chns); > + return -ENOMEM; > + } > + > stor_device->stor_chns[device->channel->target_cpu] = device->channel; > cpumask_set_cpu(device->channel->target_cpu, > &stor_device->alloced_cpus); > @@ -1232,6 +1252,7 @@ static int storvsc_dev_remove(struct hv_device *device) > vmbus_close(device->channel); > > kfree(stor_device->stor_chns); > + kfree(stor_device->cpumask_chns); > kfree(stor_device); > return 0; > } > @@ -1241,7 +1262,7 @@ static struct vmbus_channel *get_og_chn(struct > storvsc_device > *stor_device, > {1G/ > u16 slot = 0; > u16 hash_qnum; > - struct cpumask alloced_mask; > + struct cpumask *alloced_mask = &stor_device->cpumask_chns[q_num]; > int num_channels, tgt_cpu; > > if (stor_device->num_sc == 0) > @@ -1257,10 +1278,10 @@ static struct vmbus_channel *get_og_chn(struct > storvsc_device > *stor_device, >* III. Mapping is persistent. >*/ > > - cpumask_and(&alloced_mask, &stor_device->alloced_cpus, > +
RE: [Resend Patch 2/3] Netvsc: Use the vmbus functiton to calculate ring buffer percentage
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Long Li > Sent: Tuesday, March 27, 2018 5:49 PM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; James E . J . Bottomley > ; > Martin K . Petersen ; > de...@linuxdriverproject.org; linux- > s...@vger.kernel.org; linux-ker...@vger.kernel.org; net...@vger.kernel.org > Cc: Long Li > Subject: [Resend Patch 2/3] Netvsc: Use the vmbus functiton to calculate ring > buffer > percentage > > From: Long Li > > In Vmbus, we have defined a function to calculate available ring buffer > percentage to write. > > Use that function and remove netvsc's private version. > > Signed-off-by: Long Li Reviewed-by: Michael Kelley > --- > drivers/net/hyperv/hyperv_net.h | 1 - > drivers/net/hyperv/netvsc.c | 17 +++-- > drivers/net/hyperv/netvsc_drv.c | 3 --- > 3 files changed, 3 insertions(+), 18 deletions(-) > > diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h > index cd538d5a7986..a0199ab13d67 100644 > --- a/drivers/net/hyperv/hyperv_net.h > +++ b/drivers/net/hyperv/hyperv_net.h > @@ -189,7 +189,6 @@ struct netvsc_device; > struct net_device_context; > > extern u32 netvsc_ring_bytes; > -extern struct reciprocal_value netvsc_ring_reciprocal; > > struct netvsc_device *netvsc_device_add(struct hv_device *device, > const struct netvsc_device_info *info); > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index 0265d703eb03..8af0069e4d8c 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -31,7 +31,6 @@ > #include > #include > #include > -#include > > #include > > @@ -590,17 +589,6 @@ void netvsc_device_remove(struct hv_device *device) > #define RING_AVAIL_PERCENT_HIWATER 20 > #define RING_AVAIL_PERCENT_LOWATER 10 > > -/* > - * Get the percentage of available bytes to write in the ring. > - * The return value is in range from 0 to 100. > - */ > -static u32 hv_ringbuf_avail_percent(const struct hv_ring_buffer_info > *ring_info) > -{ > - u32 avail_write = hv_get_bytes_to_write(ring_info); > - > - return reciprocal_divide(avail_write * 100, netvsc_ring_reciprocal); > -} > - > static inline void netvsc_free_send_slot(struct netvsc_device *net_device, >u32 index) > { > @@ -649,7 +637,8 @@ static void netvsc_send_tx_complete(struct netvsc_device > *net_device, > wake_up(&net_device->wait_drain); > > if (netif_tx_queue_stopped(netdev_get_tx_queue(ndev, q_idx)) && > - (hv_ringbuf_avail_percent(&channel->outbound) > > RING_AVAIL_PERCENT_HIWATER || > + (hv_get_avail_to_write_percent(&channel->outbound) > > + RING_AVAIL_PERCENT_HIWATER || >queue_sends < 1)) { > netif_tx_wake_queue(netdev_get_tx_queue(ndev, q_idx)); > ndev_ctx->eth_stats.wake_queue++; > @@ -757,7 +746,7 @@ static inline int netvsc_send_pkt( > struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx); > u64 req_id; > int ret; > - u32 ring_avail = hv_ringbuf_avail_percent(&out_channel->outbound); > + u32 ring_avail = hv_get_avail_to_write_percent(&out_channel->outbound); > > nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; > if (skb) > diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c > index faea0be18924..b0b1c2fd2b7b 100644 > --- a/drivers/net/hyperv/netvsc_drv.c > +++ b/drivers/net/hyperv/netvsc_drv.c > @@ -35,7 +35,6 @@ > #include > #include > #include > -#include > > #include > #include > @@ -55,7 +54,6 @@ static unsigned int ring_size __ro_after_init = 128; > module_param(ring_size, uint, S_IRUGO); > MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)"); > unsigned int netvsc_ring_bytes __ro_after_init; > -struct reciprocal_value netvsc_ring_reciprocal __ro_after_init; > > static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | > NETIF_MSG_LINK | NETIF_MSG_IFUP | > @@ -2186,7 +2184,6 @@ static int __init netvsc_drv_init(void) > ring_size); > } > netvsc_ring_bytes = ring_size * PAGE_SIZE; > - netvsc_ring_reciprocal = reciprocal_value(netvsc_ring_bytes); > > ret = vmbus_driver_register(&netvsc_drv); > if (ret) > -- > 2.14.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [Resend Patch 3/3] Storvsc: Select channel based on available percentage of ring buffer to write
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Long Li > Sent: Tuesday, March 27, 2018 5:49 PM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; James E . J . Bottomley > ; > Martin K . Petersen ; > de...@linuxdriverproject.org; linux- > s...@vger.kernel.org; linux-ker...@vger.kernel.org; net...@vger.kernel.org > Cc: Long Li > Subject: [Resend Patch 3/3] Storvsc: Select channel based on available > percentage of ring > buffer to write > > From: Long Li > > This is a best effort for estimating on how busy the ring buffer is for > that channel, based on available buffer to write in percentage. It is still > possible that at the time of actual ring buffer write, the space may not be > available due to other processes may be writing at the time. > > Selecting a channel based on how full it is can reduce the possibility that > a ring buffer write will fail, and avoid the situation a channel is over > busy. > > Now it's possible that storvsc can use a smaller ring buffer size > (e.g. 40k bytes) to take advantage of cache locality. > > Signed-off-by: Long Li > --- > drivers/scsi/storvsc_drv.c | 62 > +- > 1 file changed, 50 insertions(+), 12 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index a2ec0bc9e9fa..b1a87072b3ab 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -395,6 +395,12 @@ MODULE_PARM_DESC(storvsc_ringbuffer_size, "Ring buffer > size > (bytes)"); > > module_param(storvsc_vcpus_per_sub_channel, int, S_IRUGO); > MODULE_PARM_DESC(storvsc_vcpus_per_sub_channel, "Ratio of VCPUs to > subchannels"); > + > +static int ring_avail_percent_lowater = 10; Reserving 10% of each ring buffer by default seems like more than is needed in the storvsc driver. That would be about 4Kbytes for the 40K ring buffer you suggest, and even more for a ring buffer of 128K. Each outgoing record is only about 344 bytes (I'd have to check exactly). With the new channel selection algorithm below, the only time we use a channel that is already below the low water mark is when no channel could be found that is above the low water mark. There could be a case of two or more threads deciding that a channel is above the low water mark at the same time and both choosing it, but that's likely to be rare. So it seems like we could set the default low water mark to 5 percent or even 3 percent, which will let more of the ring buffer be used, and let a channel be assigned according to the algorithm, rather than falling through to the default because all channels appear to be "full". > +module_param(ring_avail_percent_lowater, int, S_IRUGO); > +MODULE_PARM_DESC(ring_avail_percent_lowater, > + "Select a channel if available ring size > this in percent"); > + > /* > * Timeout in seconds for all devices managed by this driver. > */ > @@ -1285,9 +1291,9 @@ static int storvsc_do_io(struct hv_device *device, > { > struct storvsc_device *stor_device; > struct vstor_packet *vstor_packet; > - struct vmbus_channel *outgoing_channel; > + struct vmbus_channel *outgoing_channel, *channel; > int ret = 0; > - struct cpumask alloced_mask; > + struct cpumask alloced_mask, other_numa_mask; > int tgt_cpu; > > vstor_packet = &request->vstor_packet; > @@ -1301,22 +1307,53 @@ static int storvsc_do_io(struct hv_device *device, > /* >* Select an an appropriate channel to send the request out. >*/ > - > if (stor_device->stor_chns[q_num] != NULL) { > outgoing_channel = stor_device->stor_chns[q_num]; > - if (outgoing_channel->target_cpu == smp_processor_id()) { > + if (outgoing_channel->target_cpu == q_num) { > /* >* Ideally, we want to pick a different channel if >* available on the same NUMA node. >*/ > cpumask_and(&alloced_mask, &stor_device->alloced_cpus, > cpumask_of_node(cpu_to_node(q_num))); > - for_each_cpu_wrap(tgt_cpu, &alloced_mask, > - outgoing_channel->target_cpu + 1) { > - if (tgt_cpu != outgoing_channel->target_cpu) { > - outgoing_channel = > - stor_device->stor_chns[tgt_cpu]; > - break; > + > + for_each_cpu_wrap(tgt_cpu, &alloced_mask, q_num + 1) { > + if (tgt_cpu == q_num) > + continue; > + channel = stor_device->stor_chns[tgt_cpu]; > + if (hv_get_avail_to_write_percent( > + &channel->outbound) > +
RE: [PATCH v2 char-misc 1/1] x86/hyperv: Add interrupt handler annotations
> -Original Message- > From: Greg KH > Sent: Wednesday, April 4, 2018 1:16 AM > To: Michael Kelley (EOSG) > Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; > o...@aepfle.de; > a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com; > leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger > ; KY Srinivasan > Subject: Re: [PATCH v2 char-misc 1/1] x86/hyperv: Add interrupt handler > annotations > > On Tue, Apr 03, 2018 at 01:59:08PM -0700, mhkelle...@gmail.com wrote: > > From: Michael Kelley > > > > Add standard interrupt handler annotations to > > hyperv_vector_handler(). > > > > Signed-off-by: Michael Kelley > > --- > > Changes in v2: > > * Fixed From: line > > --- > > arch/x86/kernel/cpu/mshyperv.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c > > index 4488cf0..20f6849 100644 > > --- a/arch/x86/kernel/cpu/mshyperv.c > > +++ b/arch/x86/kernel/cpu/mshyperv.c > > @@ -41,7 +41,7 @@ static void (*hv_stimer0_handler)(void); > > static void (*hv_kexec_handler)(void); > > static void (*hv_crash_handler)(struct pt_regs *regs); > > > > -void hyperv_vector_handler(struct pt_regs *regs) > > +__visible void __irq_entry hyperv_vector_handler(struct pt_regs *regs) > > What bug does this solve? What is wrong with the existing markings? > What does __visible and __irq_entry give us that we don't already have > and we need? > > Are you really using LTO that requires this marking to prevent the code > from being removed? Thomas Gleixner commented on Vitaly Kuznetsov's Hyper-V reenlightenment patch that the interrupt handler should have these annotations: see https://lkml.org/lkml/2018/1/14/145 I put the same annotations on the interrupt handler for stimer0 Direct Mode, So this change makes the hyperv_vector_handler() consistent with hv_stimer0_vector_handler() in the same source file. It does not fix any immediate bug -- it's for consistency and alignment with what is apparently standard practice. Not sure what LTO is ... Michael > > thanks, > > greg k-h ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v2] storvsc: Set up correct queue depth values for IDE devices
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Long Li > Sent: Thursday, March 22, 2018 2:47 PM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; James E . J . Bottomley > ; > Martin K . Petersen ; > de...@linuxdriverproject.org; linux- > s...@vger.kernel.org; linux-ker...@vger.kernel.org > Cc: Long Li > Subject: [PATCH v2] storvsc: Set up correct queue depth values for IDE devices > > From: Long Li > > Unlike SCSI and FC, we don't use multiple channels for IDE. > Also fix the calculation for sub-channels. > > Change log: > v2: Addressed comment on incorrect number of sub-channels. > (Michael Kelley ) > > Signed-off-by: Long Li Reviewed-by: Michael Kelley > --- > drivers/scsi/storvsc_drv.c | 7 +-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 8c51d628b52e..a2ec0bc9e9fa 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1722,11 +1722,14 @@ static int storvsc_probe(struct hv_device *device, > max_targets = STORVSC_MAX_TARGETS; > max_channels = STORVSC_MAX_CHANNELS; > /* > - * On Windows8 and above, we support sub-channels for storage. > + * On Windows8 and above, we support sub-channels for storage > + * on SCSI and FC controllers. >* The number of sub-channels offerred is based on the number of >* VCPUs in the guest. >*/ > - max_sub_channels = (num_cpus / storvsc_vcpus_per_sub_channel); > + if (!dev_is_ide) > + max_sub_channels = > + (num_cpus - 1) / storvsc_vcpus_per_sub_channel; > } > > scsi_driver.can_queue = (max_outstanding_req_per_channel * > -- > 2.14.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH net-next,1/2] hv_netvsc: Fix the return status in RX path
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Haiyang Zhang > Sent: Thursday, March 22, 2018 12:01 PM > To: da...@davemloft.net; net...@vger.kernel.org > Cc: Haiyang Zhang ; KY Srinivasan > ; Stephen > Hemminger ; o...@aepfle.de; vkuzn...@redhat.com; > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org > Subject: [PATCH net-next,1/2] hv_netvsc: Fix the return status in RX path > > From: Haiyang Zhang > > As defined in hyperv_net.h, the NVSP_STAT_SUCCESS is one not zero. > Some functions returns 0 when it actually means NVSP_STAT_SUCCESS. > This patch fixes them. > > In netvsc_receive(), it puts the last RNDIS packet's receive status > for all packets in a vmxferpage which may contain multiple RNDIS > packets. > This patch puts NVSP_STAT_FAIL in the receive completion if one of > the packets in a vmxferpage fails. This patch changes the status field that is being reported back to the Hyper-V host in the receive completion message in enq_receive_complete(). The current code reports 0 on success, and with the patch, it will report 1 on success. So does this change affect anything on the Hyper-V side? Or is Hyper-V just ignoring the value? If this change doesn't have any impact on the interactions with Hyper-V, perhaps it would be good to explain why in the commit message. Michael > > Signed-off-by: Haiyang Zhang > --- > drivers/net/hyperv/netvsc.c | 8 ++-- > drivers/net/hyperv/netvsc_drv.c | 2 +- > drivers/net/hyperv/rndis_filter.c | 4 ++-- > 3 files changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c > index aa95e81af6e5..1ddb2c39b6e4 100644 > --- a/drivers/net/hyperv/netvsc.c > +++ b/drivers/net/hyperv/netvsc.c > @@ -1098,12 +1098,16 @@ static int netvsc_receive(struct net_device *ndev, > void *data = recv_buf > + vmxferpage_packet->ranges[i].byte_offset; > u32 buflen = vmxferpage_packet->ranges[i].byte_count; > + int ret; > > trace_rndis_recv(ndev, q_idx, data); > > /* Pass it to the upper layer */ > - status = rndis_filter_receive(ndev, net_device, > - channel, data, buflen); > + ret = rndis_filter_receive(ndev, net_device, > +channel, data, buflen); > + > + if (unlikely(ret != NVSP_STAT_SUCCESS)) > + status = NVSP_STAT_FAIL; > } > > enq_receive_complete(ndev, net_device, q_idx, > diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c > index cdb78eefab67..33607995be62 100644 > --- a/drivers/net/hyperv/netvsc_drv.c > +++ b/drivers/net/hyperv/netvsc_drv.c > @@ -818,7 +818,7 @@ int netvsc_recv_callback(struct net_device *net, > u64_stats_update_end(&rx_stats->syncp); > > napi_gro_receive(&nvchan->napi, skb); > - return 0; > + return NVSP_STAT_SUCCESS; > } > > static void netvsc_get_drvinfo(struct net_device *net, > diff --git a/drivers/net/hyperv/rndis_filter.c > b/drivers/net/hyperv/rndis_filter.c > index 2dc00f714482..591fb8080f11 100644 > --- a/drivers/net/hyperv/rndis_filter.c > +++ b/drivers/net/hyperv/rndis_filter.c > @@ -443,10 +443,10 @@ int rndis_filter_receive(struct net_device *ndev, > "unhandled rndis message (type %u len %u)\n", > rndis_msg->ndis_msg_type, > rndis_msg->msg_len); > - break; > + return NVSP_STAT_FAIL; > } > > - return 0; > + return NVSP_STAT_SUCCESS; > } > > static int rndis_filter_query_device(struct rndis_device *dev, > -- > 2.15.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] vmbus: use put_device() if device_register fail
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of KY Srinivasan > Sent: Sunday, March 18, 2018 8:02 PM > To: Arvind Yadav ; Stephen Hemminger > ; Haiyang Zhang > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org > Subject: RE: [PATCH] vmbus: use put_device() if device_register fail > > > -Original Message- > > From: devel On Behalf > > Of Arvind Yadav > > Sent: Saturday, March 17, 2018 11:48 PM > > To: Stephen Hemminger ; Haiyang Zhang > > > > Cc: de...@linuxdriverproject.org; linux-ker...@vger.kernel.org > > Subject: [PATCH] vmbus: use put_device() if device_register fail > > > > if device_register() returned an error. Always use put_device() > > to give up the reference initialized. > > > > Signed-off-by: Arvind Yadav > > --- > > drivers/hv/vmbus_drv.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > > index bc65c4d..25da2f3 100644 > > --- a/drivers/hv/vmbus_drv.c > > +++ b/drivers/hv/vmbus_drv.c > > @@ -1358,6 +1358,7 @@ int vmbus_device_register(struct hv_device > > *child_device_obj) > > ret = device_register(&child_device_obj->device); > > if (ret) { > > pr_err("Unable to register child device\n"); > > + put_device(&child_device_obj->device); > > If the registration failed; we would not have acquired the reference on the > device and so > we should not be dropping the reference in the failure path. Looking at the code for device_register() and its constituent parts device_initialize() and device_add(), Arvind is correct. device_initialize() creates the object with a reference count of 1. device_add() does not decrement that reference count or free the object, even if it fails. The comments for device_register() call this out as well. Michael > > K. Y > > return ret; > > } > > > > -- > > 2.7.4 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with GFP_KERNEL in new_pcichild_device
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Jia-Ju Bai > Sent: Sunday, March 18, 2018 7:53 AM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; bhelg...@google.com > Cc: de...@linuxdriverproject.org; linux-...@vger.kernel.org; > linux-ker...@vger.kernel.org; > Jia-Ju Bai > Subject: [PATCH 2/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with > GFP_KERNEL in > new_pcichild_device > > new_pcichild_device() is not called in atomic context. > > The call chain ending up at new_pcichild_device() is: > [1] new_pcichild_device() <- pci_devices_present_work() > pci_devices_present_work() is only set in INIT_WORK(). > > Despite never getting called from atomic context, > new_pcichild_device() calls kzalloc with GFP_ATOMIC, > which waits busily for allocation. > GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL > to avoid busy waiting. > > This is found by a static analysis tool named DCNS written by myself. > > Signed-off-by: Jia-Ju Bai Reviewed-by: Michael Kelley > --- > drivers/pci/host/pci-hyperv.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 0fe3ea1..289e31d 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -1500,7 +1500,7 @@ static struct hv_pci_dev *new_pcichild_device(struct > hv_pcibus_device *hbus, > unsigned long flags; > int ret; > > - hpdev = kzalloc(sizeof(*hpdev), GFP_ATOMIC); > + hpdev = kzalloc(sizeof(*hpdev), GFP_KERNEL); > if (!hpdev) > return NULL; > > -- > 1.9.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with GFP_KERNEL in hv_pci_onchannelcallback
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Jia-Ju Bai > Sent: Sunday, March 18, 2018 7:53 AM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; bhelg...@google.com > Cc: de...@linuxdriverproject.org; linux-...@vger.kernel.org; > linux-ker...@vger.kernel.org; > Jia-Ju Bai > Subject: [PATCH 1/2] pci: host: pci-hyperv: Replace GFP_ATOMIC with > GFP_KERNEL in > hv_pci_onchannelcallback > > hv_pci_onchannelcallback() is not called in atomic context. > > The call chain ending up at hv_pci_onchannelcallback() is: > [1] hv_pci_onchannelcallback() <- hv_pci_probe() > hv_pci_probe() is only set as ".probe" in hv_driver > structure "hv_pci_drv". > > Despite never getting called from atomic context, Not true. hv_pci_probe() registers hv_pci_onchannelcallback() as a callback function that is invoked from the VMbus interrupt handler. So GFP_ATOMIC is appropriate. Michael > hv_pci_onchannelcallback() calls kmalloc with GFP_ATOMIC, > which waits busily for allocation. > GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL > to avoid busy waiting. > > This is found by a static analysis tool named DCNS written by myself. > > Signed-off-by: Jia-Ju Bai > --- > drivers/pci/host/pci-hyperv.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 0fe3ea1..c5c8a99 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -1887,7 +1887,7 @@ static void hv_pci_onchannelcallback(void *context) > struct pci_dev_incoming *dev_message; > struct hv_pci_dev *hpdev; > > - buffer = kmalloc(bufferlen, GFP_ATOMIC); > + buffer = kmalloc(bufferlen, GFP_KERNEL); > if (!buffer) > return; > > @@ -1899,7 +1899,7 @@ static void hv_pci_onchannelcallback(void *context) > kfree(buffer); > /* Handle large packet */ > bufferlen = bytes_recvd; > - buffer = kmalloc(bytes_recvd, GFP_ATOMIC); > + buffer = kmalloc(bytes_recvd, GFP_KERNEL); > if (!buffer) > return; > continue; > -- > 1.9.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] vmbus: use put_device() if device_register fail
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Arvind Yadav > Sent: Saturday, March 17, 2018 11:48 PM > To: Stephen Hemminger ; Haiyang Zhang > > Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org > Subject: [PATCH] vmbus: use put_device() if device_register fail > > if device_register() returned an error. Always use put_device() > to give up the reference initialized. > > Signed-off-by: Arvind Yadav Reviewed-by: Michael Kelley > --- > drivers/hv/vmbus_drv.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c > index bc65c4d..25da2f3 100644 > --- a/drivers/hv/vmbus_drv.c > +++ b/drivers/hv/vmbus_drv.c > @@ -1358,6 +1358,7 @@ int vmbus_device_register(struct hv_device > *child_device_obj) > ret = device_register(&child_device_obj->device); > if (ret) { > pr_err("Unable to register child device\n"); > + put_device(&child_device_obj->device); > return ret; > } > > -- > 2.7.4 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH] storvsc: Set up correct queue depth values for IDE devices
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Long Li > Sent: Thursday, March 15, 2018 4:52 PM > To: KY Srinivasan ; Haiyang Zhang > ; Stephen > Hemminger ; James E . J . Bottomley > ; > Martin K . Petersen ; > de...@linuxdriverproject.org; linux- > s...@vger.kernel.org; linux-ker...@vger.kernel.org > Cc: Long Li > Subject: [PATCH] storvsc: Set up correct queue depth values for IDE devices > > From: Long Li > > Unlike SCSI and FC, we don't use multiple channels for IDE. So set queue depth > correctly for IDE. > > Also set the correct cmd_per_lun for all devices. > > Signed-off-by: Long Li > --- > drivers/scsi/storvsc_drv.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index 8c51d628b52e..fba170640e9c 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1722,15 +1722,19 @@ static int storvsc_probe(struct hv_device *device, > max_targets = STORVSC_MAX_TARGETS; > max_channels = STORVSC_MAX_CHANNELS; > /* > - * On Windows8 and above, we support sub-channels for storage. > + * On Windows8 and above, we support sub-channels for storage > + * on SCSI and FC controllers. >* The number of sub-channels offerred is based on the number of >* VCPUs in the guest. >*/ > - max_sub_channels = (num_cpus / storvsc_vcpus_per_sub_channel); > + if (!dev_is_ide) > + max_sub_channels = > + num_cpus / storvsc_vcpus_per_sub_channel; This calculation of the # of sub-channels doesn't get the right answer (and it didn't before this patch either). storvsc_vcpus_per_sub_channel defaults to 4. If num_cpus is 8, max_sub_channels will be 2, but it should be 1. The sub-channel count should not include the main channel since we add 1 to the sub-channel count below when calculating can_queue. Furthermore, this is calculation is just a guess, in the sense that we're replicating the algorithm we think Hyper-V is using to determine the number of sub-channels to offer. It turns out Hyper-V is changing that algorithm for particular devices in an upcoming new Azure VM size. But the only use of max_sub_channels is in the calculation of can_queue below, so the impact is minimal. > } > > scsi_driver.can_queue = (max_outstanding_req_per_channel * >(max_sub_channels + 1)); > + scsi_driver.cmd_per_lun = scsi_driver.can_queue; can_queue is defined as "int", while cmd_per_lun is defined as "short". The calculated value of can_queue could easily be over 32,767 with 15 sub-channels and max_outstanding_req_per_channel being 3036 for the default 1 Mbyte ring buffer. So the assignment to cmd_per_lun could produce truncation and even a negative number. More broadly, since max_outstanding_req_per_channel is based on the ring buffer size, these calculations imply that Hyper-V storvsp's queuing capacity is based on the ring buffer size. I don't think that's the case. From conversations with the storvsp folks, I think Hyper-V always removes entries from the guest->host ring buffer and then lets storvsp queue them separately. So we don't want to be linking cmd_per_lun (or even can_queue, for that matter) to the ring buffer size. The current default ring buffer size of 1 Mbyte is probably 10x bigger than needed, and we want to be able to adjust that without ending up with can_queue and cmd_per_lun values that are too small. We would probably do better to set can_queue to a constant, and leave cmd_per_lun at its current value of 2048. The can_queue value is already capped at 10240 in the blk-mq layer, so maybe that's a reasonable constant to use. Thoughts? > > host = scsi_host_alloc(&scsi_driver, > sizeof(struct hv_host_device)); > -- > 2.14.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v2]PCI: hv: fix PCI-BUS domainID corruption
> -Original Message- > From: Sridhar Pitchai > Sent: Wednesday, March 14, 2018 11:08 AM > To: Lorenzo Pieralisi ; Michael Kelley (EOSG) > > Cc: Bjorn Helgaas ; Jake Oshins ; > Haiyang > Zhang ; Stephen Hemminger ; > Dexuan > Cui ; KY Srinivasan ; > de...@linuxdriverproject.org; linux-...@vger.kernel.org; > linux-ker...@vger.kernel.org > Subject: [PATCH v2]PCI: hv: fix PCI-BUS domainID corruption > > Whenever PCI bus is added, HyperV guarantees the BUS id is unique. Even > with > that when a first device is added to the bus, it overrides bus domain ID > with > the device serial number. Sometime this can result in BUS ID not being > unique. > In this case, when PCI_BUS and a device added to the bus, even before the > PCI > BUS is added to kernel, the first device tends to overwrite the domain ID > with > 0. Since there exists a PCI bus with domain ID 0 already the PCI bus > addition > fails. This patch make sure when a device is added to a bus, it never > updated > the bus domain ID. Since we have the transparent SRIOV mode now, the > short VF > device name is no longer needed. > > Fixes: 4a9b0933bdfc("PCI:hv:Use device serial number as PCI domain") > Cc: sta...@vger.kernel.org > Signed-off-by: Sridhar Pitchai > --- Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> -Original Message- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; Jack > Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH v3 3/6] PCI: hv: serialize the present/eject work items > > When we hot-remove the device, we first receive a PCI_EJECT message and > then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. > > The first message is offloaded to hv_eject_device_work(), and the second > is offloaded to pci_devices_present_work(). Both the paths can be running > list_del(&hpdev->list_entry), causing general protection fault, because > system_wq can run them concurrently. > > The patch eliminates the race condition. > > Signed-off-by: Dexuan Cui > Tested-by: Adrian Suhov > Tested-by: Chris Valean > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > --- > drivers/pci/host/pci-hyperv.c | 17 ++--- > 1 file changed, 14 insertions(+), 3 deletions(-) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 2/6] PCI: hv: hv_eject_device_work(): remove the bogus test
> -Original Message- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; Jack > Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH v3 2/6] PCI: hv: hv_eject_device_work(): remove the bogus test > > When we're in the function, hpdev->state must be hv_pcichild_ejecting: > see hv_pci_eject_device(). > > Signed-off-by: Dexuan Cui > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > Cc: Michael Kelley (EOSG) > --- > drivers/pci/host/pci-hyperv.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 1233300f41c6..04edb24c92ee 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -1796,10 +1796,7 @@ static void hv_eject_device_work(struct work_struct > *work) > > hpdev = container_of(work, struct hv_pci_dev, wrk); > > - if (hpdev->state != hv_pcichild_ejecting) { > - put_pcichild(hpdev, hv_pcidev_ref_pnp); > - return; > - } > + WARN_ON(hpdev->state != hv_pcichild_ejecting); > > /* >* Ejection can come before or after the PCI bus has been set up, so > -- > 2.7.4 Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> -Original Message- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; > sta...@vger.kernel.org; Jack Morgenstein > Subject: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > Hyper-V VM with SR-IOV. This is because when we reach hv_compose_msi_msg() > by request_irq() -> request_threaded_irq() -> __setup_irq()->irq_startup() > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > disabled in __setup_irq(). > > Fix this by polling the channel. > > 2. If the host is ejecting the VF device before we reach > hv_compose_msi_msg(), in a UP VM, we can hang in hv_compose_msi_msg() > forever, because at this time the host doesn't respond to the > CREATE_INTERRUPT request. This issue also happens to old kernels like > v4.14, v4.13, etc. > > Fix this by polling the channel for the PCI_EJECT message and > hpdev->state, and by checking the PCI vendor ID. > > Note: actually the above issues also happen to a SMP VM, if > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > Signed-off-by: Dexuan Cui > Tested-by: Adrian Suhov > Tested-by: Chris Valean > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > --- > drivers/pci/host/pci-hyperv.c | 58 > ++- > 1 file changed, 57 insertions(+), 1 deletion(-) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary
> -Original Message- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; Jack > Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH v3 5/6] PCI: hv: hv_pci_devices_present(): only queue a new > work when > necessary > > If there is a pending work, we just need to add the new dr into > the dr_list. > > This is suggested by Michael Kelley. > > Signed-off-by: Dexuan Cui > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > Cc: Michael Kelley (EOSG) > --- > drivers/pci/host/pci-hyperv.c | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 4/6] PCI: hv: remove hbus->enum_sem
> -Original Message- > From: Dexuan Cui > Sent: Tuesday, March 6, 2018 10:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; Jack > Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH v3 4/6] PCI: hv: remove hbus->enum_sem > > Since we serialize the present/eject work items now, we don't need the > semaphore any more. > > This is suggested by Michael Kelley. > > Signed-off-by: Dexuan Cui > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > Cc: Michael Kelley (EOSG) > --- > drivers/pci/host/pci-hyperv.c | 17 ++--- > 1 file changed, 2 insertions(+), 15 deletions(-) > Reviewed-by: Michael Kelley ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v2 5/6] PCI: hv: hv_pci_devices_present(): only queue a new work when necessary
> -Original Message- > From: Dexuan Cui > Sent: Monday, March 5, 2018 11:22 AM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger ; o...@aepfle.de; > a...@canonical.com; > jasow...@redhat.com > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; vkuzn...@redhat.com; marcelo.ce...@canonical.com; > Michael > Kelley (EOSG) ; Dexuan Cui > ; Jack > Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH v2 5/6] PCI: hv: hv_pci_devices_present(): only queue a new > work when > necessary > > If there is a pending work, we just need to add the new dr into > the dr_list. > > This is suggested by Michael Kelley. > > Signed-off-by: Dexuan Cui > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > Cc: Michael Kelley (EOSG) > --- > drivers/pci/host/pci-hyperv.c | 19 --- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 3a385212f666..d3aa6736a9bb 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -1733,6 +1733,7 @@ static void hv_pci_devices_present(struct > hv_pcibus_device *hbus, > struct hv_dr_state *dr; > struct hv_dr_work *dr_wrk; > unsigned long flags; > + bool pending_dr; > > dr_wrk = kzalloc(sizeof(*dr_wrk), GFP_NOWAIT); > if (!dr_wrk) > @@ -1756,11 +1757,23 @@ static void hv_pci_devices_present(struct > hv_pcibus_device > *hbus, > } > > spin_lock_irqsave(&hbus->device_list_lock, flags); > + > + /* > + * If pending_dr is true, we have already queued a work, > + * which will see the new dr. Otherwise, we need to > + * queue a new work. > + */ > + pending_dr = !list_empty(&hbus->dr_list); > list_add_tail(&dr->list_entry, &hbus->dr_list); > - spin_unlock_irqrestore(&hbus->device_list_lock, flags); A minor point: The spin_unlock_irqrestore() call can stay here. Once we have the list status in a local variable and the new entry is added to the list, nothing bad can happen if we drop the spin lock. At worst, and very unlikely, we'll queue work when some other thread has already queued work to process the list entry, but that's no big deal. I'd argue for keeping the code covered by a spin lock as small as possible. Michael > > - get_hvpcibus(hbus); > - queue_work(hbus->wq, &dr_wrk->wrk); > + if (pending_dr) { > + kfree(dr_wrk); > + } else { > + get_hvpcibus(hbus); > + queue_work(hbus->wq, &dr_wrk->wrk); > + } > + > + spin_unlock_irqrestore(&hbus->device_list_lock, flags); > } > > /** > -- > 2.7.4 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 2/3] PCI: hv: serialize the present/eject work items
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > On Behalf > Of Dexuan Cui > Sent: Friday, March 2, 2018 4:21 PM > To: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; > Stephen Hemminger > Cc: linux-ker...@vger.kernel.org; driverdev-devel@linuxdriverproject.org; > Haiyang Zhang > ; o...@aepfle.de; a...@canonical.com; > jasow...@redhat.com; > vkuzn...@redhat.com; marcelo.ce...@canonical.com; Dexuan Cui > ; > Jack Morgenstein ; sta...@vger.kernel.org > Subject: [PATCH 2/3] PCI: hv: serialize the present/eject work items > > When we hot-remove the device, we first receive a PCI_EJECT message and > then receive a PCI_BUS_RELATIONS message with bus_rel->device_count == 0. > > The first message is offloaded to hv_eject_device_work(), and the second > is offloaded to pci_devices_present_work(). Both the paths can be running > list_del(&hpdev->list_entry), causing general protection fault, because > system_wq can run them concurrently. > > The patch eliminates the race condition. With this patch, the enum_sem field in struct hv_pcibus_device is no longer needed. The semaphore serializes execution in hv_pci_devices_present_work(), and that serialization is now done with the ordered workqueue. Also, the last paragraph of the top level comment for hv_pci_devices_present_work() should be updated to reflect the new ordering assumptions. Separately, an unrelated bug: At the top of hv_eject_device_work(), the first test may do a put_pcichild() and return. This exit path also needs to do put_hvpcibus() to balance the ref counts, or do a goto the last two lines at the bottom of the function. > > Signed-off-by: Dexuan Cui > Tested-by: Adrian Suhov > Tested-by: Chris Valean > Cc: Vitaly Kuznetsov > Cc: Jack Morgenstein > Cc: sta...@vger.kernel.org > Cc: Stephen Hemminger > Cc: K. Y. Srinivasan > --- > drivers/pci/host/pci-hyperv.c | 19 --- > 1 file changed, 16 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c > index 1233300f41c6..57b1fb3ebdb9 100644 > --- a/drivers/pci/host/pci-hyperv.c > +++ b/drivers/pci/host/pci-hyperv.c > @@ -461,6 +461,8 @@ struct hv_pcibus_device { > struct retarget_msi_interrupt retarget_msi_interrupt_params; > > spinlock_t retarget_msi_interrupt_lock; > + > + struct workqueue_struct *wq; > }; > > /* > @@ -1770,7 +1772,7 @@ static void hv_pci_devices_present(struct > hv_pcibus_device *hbus, > spin_unlock_irqrestore(&hbus->device_list_lock, flags); > > get_hvpcibus(hbus); > - schedule_work(&dr_wrk->wrk); > + queue_work(hbus->wq, &dr_wrk->wrk); This invocation of get_hvpcibus() and queue_work() could be made conditional on whether the preceding list_add_tail() transitioned the list from empty to non-empty. If the list was already non-empty, a previously queued invocation of hv_pci_devices_present_work() will find the new entry and process it. But this is an optimization in a non-perf sensitive code path, so may not be worth it. > } > > /** > @@ -1848,7 +1850,7 @@ static void hv_pci_eject_device(struct hv_pci_dev > *hpdev) > get_pcichild(hpdev, hv_pcidev_ref_pnp); > INIT_WORK(&hpdev->wrk, hv_eject_device_work); > get_hvpcibus(hpdev->hbus); > - schedule_work(&hpdev->wrk); > + queue_work(hpdev->hbus->wq, &hpdev->wrk); > } > > /** > @@ -2463,11 +2465,17 @@ static int hv_pci_probe(struct hv_device *hdev, > spin_lock_init(&hbus->retarget_msi_interrupt_lock); > sema_init(&hbus->enum_sem, 1); > init_completion(&hbus->remove_event); > + hbus->wq = alloc_ordered_workqueue("hv_pci_%x", 0, > +hbus->sysdata.domain); > + if (!hbus->wq) { > + ret = -ENOMEM; > + goto free_bus; > + } > > ret = vmbus_open(hdev->channel, pci_ring_size, pci_ring_size, NULL, 0, >hv_pci_onchannelcallback, hbus); > if (ret) > - goto free_bus; > + goto destroy_wq; > > hv_set_drvdata(hdev, hbus); > > @@ -2536,6 +2544,9 @@ static int hv_pci_probe(struct hv_device *hdev, > hv_free_config_window(hbus); > close: > vmbus_close(hdev->channel); > +destroy_wq: > + drain_workqueue(hbus->wq); The drain_workqueue() call isn't necessary. destroy_workqueue() calls drain_workqueue() and there better not be anything in the workqueue anyway since all the ref counts are zero. > + destroy_workqueue(hbus->wq); > free_bus: > free_page((unsigned long)hbus); > return ret; > @@ -2615,6 +2626,8 @@ static int hv_pci_remove(struct hv_device *hdev) > irq_domain_free_fwnode(hbus->sysdata.fwnode); > put_hvpcibus(hbus); > wait_for_completion(&hbus->remove_event); > + drain_workqueue(hbus->wq); Same here -- drain_workqueue() isn't needed. The workqueue must be empty anyway since the remove_event has completed and the ref counts will
RE: [PATCH 08/12] Drivers: hv: vmbus: Implement Direct Mode for stimer0
> -Original Message- > From: Dan Carpenter > Sent: Monday, February 12, 2018 12:42 AM > To: KY Srinivasan ; Stephen Hemminger > > Cc: gre...@linuxfoundation.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; > o...@aepfle.de; a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com; > leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger > ; Michael Kelley (EOSG) > > Subject: Re: [PATCH 08/12] Drivers: hv: vmbus: Implement Direct Mode for > stimer0 > > On Sun, Feb 11, 2018 at 05:33:16PM -0700, k...@exchange.microsoft.com wrote: > > @@ -116,9 +146,29 @@ static int hv_ce_set_oneshot(struct clock_event_device > > *evt) > > { > > union hv_timer_config timer_cfg; > > > > + timer_cfg.as_uint64 = 0; > > timer_cfg.enable = 1; > > timer_cfg.auto_enable = 1; > > - timer_cfg.sintx = VMBUS_MESSAGE_SINT; > > + if (direct_mode_enabled) > > + /* > > +* When it expires, the timer will directly interrupt > > +* on the specified hardware vector/IRQ. > > +*/ > > + { > > + timer_cfg.direct_mode = 1; > > + timer_cfg.apic_vector = stimer0_vector; > > + hv_enable_stimer0_percpu_irq(stimer0_irq); > > + } > > + else > > + /* > > +* When it expires, the timer will generate a VMbus message, > > +* to be handled by the normal VMbus interrupt handler. > > +*/ > > + { > > + timer_cfg.direct_mode = 0; > > + timer_cfg.sintx = VMBUS_MESSAGE_SINT; > > + } > > + > > This indenting isn't right. We should probably zero out .apic_vector > if .direct_mode is zero. Or maybe it's fine. I don't know if any > static analysis tools will complain... I'll fix the indenting. Old habits The " timer_cfg.as_uint64 = 0" statement already zero's out .apic_vector along with all the other unused fields in the 64-bit value, as required by the Hyper-V spec. > > > hv_init_timer_config(HV_X64_MSR_STIMER0_CONFIG, timer_cfg.as_uint64); > > > > return 0; > > @@ -191,6 +241,10 @@ int hv_synic_alloc(void) > > INIT_LIST_HEAD(&hv_cpu->chan_list); > > } > > > > + if (direct_mode_enabled && hv_setup_stimer0_irq( > > + &stimer0_irq, &stimer0_vector, hv_stimer0_isr)) > > + goto err; > > > Can you indent it like this: > > if (direct_mode_enabled && > hv_setup_stimer0_irq(&stimer0_irq, &stimer0_vector, >hv_stimer0_isr)) > goto err; OK -- will change. > > > [ What follows is a long rant not directed at you ] > > It's annoying because as soon as I see the "goto err;", I know the error > handling for this function is going to be buggy... > > Some rules of error handling are: > > 1) Each function should clean up after itself instead returning > partially allocated structures. > 2) Each allocation function should have a matching free function. > 3) Give meaningful label names based on what the label location so that > we can tell what the goto does just by looking at it, such as, > "goto free_some_variable". This way we can just keep a mental tally > of the most recently allocated resource and verify based on the > "goto free_resource;" statemetn that it frees the correct thing. We > don't need to scroll to the bottom of the function. > > Using good names means that we should avoid do-nothing gotos > because, by definition, the label name for a do-nothing goto is > going to be vague. > > In this case the label looks like this: > > > + > > return 0; > > err: > > return -ENOMEM; > > We allocate a bunch of stuff in this function so at first glance this > looks like we leak everything but, actually, the cleanup is done in > vmbus_bus_init(). This is a layering violation. > > Later on, we changed hv_synic_alloc() in 37cdd991fac8 ("vmbus: put > related per-cpu variable together") and we started allocating: > > hv_cpu->clk_evt = kzalloc(... > > but we forgot to update the error handling because it was in the wrong > place. It's a very predictable, avoidable bug if we just use proper > error handling style. We'll fix this is a separate patch. There are a couple other minor things that should be cleaned up in hv.c as well, and can do them together. Michael > > Anyway... Sorry for the long rant. Summary: Always distrust vague > label names. > > regards, > dan carpenter ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling
> -Original Message- > From: Stephen Hemminger > Sent: Tuesday, February 13, 2018 9:35 AM > To: Michael Kelley > Cc: Michael Kelley (EOSG) ; > gre...@linuxfoundation.org; > linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com; > leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger > ; KY Srinivasan > Subject: Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer > signaling > > > > if (rbi->ring_buffer->feature_bits.feat_pending_send_sz) { > > u32 pending_sz = READ_ONCE(rbi->ring_buffer->pending_send_sz); > > > > /* > > +* Ensure the read of write_index in hv_get_bytes_to_write() > > +* happens after the read of pending_send_sz. > > +*/ > > + virt_rmb(); > > + curr_write_sz = hv_get_bytes_to_write(rbi); > > + > > + /* > > * If there was space before we began iteration, > > * then host was not blocked. Also handles case where > > * pending_sz is zero then host has nothing pending > > * and does not need to be signaled. > > */ > > - if (orig_write_sz > pending_sz) > > + if (curr_write_sz - delta > pending_sz) > > return; > > > > /* If pending write will not fit, don't give false hope. */ > > - if (hv_get_bytes_to_write(rbi) < pending_sz) > > + if (curr_write_sz <= pending_sz) > > return; > > + > > + vmbus_setevent(channel); > > } > > > > - vmbus_setevent(channel); > > } > > I think this won't work on older versions of Windows where feat_pending_sz > is never set. On those versions vmbus_setevent needs to always be called. The older Windows/Hyper-V hosts poll if they can't write to the host->guest ring buffer because it is full. We don't want to call vmbus_setevent() every time through this routine because then the guest would be signaling the host every time the host interrupts the guest. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Fix ring buffer signaling
> -Original Message- > From: KY Srinivasan > Sent: Sunday, February 11, 2018 5:14 PM --- snip --- > > if (rbi->ring_buffer->feature_bits.feat_pending_send_sz) { > > u32 pending_sz = READ_ONCE(rbi->ring_buffer- > > >pending_send_sz); > > > > /* > > +* Ensure the read of write_index in > > hv_get_bytes_to_write() > > +* happens after the read of pending_send_sz. > > +*/ > > + virt_rmb(); > We can avoid the read barrier by making the initialization of curr_write_sz > conditional > on pending_send_sz being non-zero. Indeed you can make all the signaling code > conditional > on pending_send_sz being non-zero. > > I agree that we can immediately test pending_send_sz for zero, and exit if zero. A zero value will be by far the most common path, and would save executing the read barrier and hv_get_bytes_to_write(). Can also move the calculation of "delta" to after the test for zero. But I believe the read barrier is still needed on the path where pending_send_sz is non-zero. Just the testing the value of pending_send_sz doesn't guarantee that the write_index wouldn't be speculatively read, and potentially out-of-order. And we have to consider the out-of-order behaviors on ARM64 as well. > > > + curr_write_sz = hv_get_bytes_to_write(rbi); > > + > > + /* > > * If there was space before we began iteration, > > * then host was not blocked. Also handles case where > > * pending_sz is zero then host has nothing pending > > * and does not need to be signaled. > > */ > > - if (orig_write_sz > pending_sz) > > + if (curr_write_sz - delta > pending_sz) > > return; > > > > /* If pending write will not fit, don't give false hope. */ > > - if (hv_get_bytes_to_write(rbi) < pending_sz) > > + if (curr_write_sz <= pending_sz) > > return; > > + > > + vmbus_setevent(channel); > > } > > > > - vmbus_setevent(channel); > > } > > EXPORT_SYMBOL_GPL(hv_pkt_iter_close); > > -- > > 1.8.3.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
> From: Long Li > Sent: Wednesday, January 31, 2018 12:23 PM > To: Michael Kelley (EOSG) ; KY Srinivasan > ; Stephen Hemminger ; > martin.peter...@oracle.com; de...@linuxdriverproject.org; > linux-ker...@vger.kernel.org; > linux-s...@vger.kernel.org; James E . J . Bottomley > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > channel for I/O > requests > > > Subject: RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > > channel for I/O requests > > > > Updated/corrected two email addresses ... > > > > > -Original Message- > > > From: Michael Kelley (EOSG) > > > Sent: Wednesday, January 24, 2018 2:14 PM > > > To: KY Srinivasan ; Stephen Hemminger > > > ; martin.peter...@oracle.com; > > > lo...@microsoft.com; jbottom...@odin.com; > > > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > > > linux-s...@vger.kernel.org > > > Cc: Michael Kelley (EOSG) > > > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a > > > channel for I/O requests > > > > > > Update the algorithm in storvsc_do_io to look for a channel starting > > > with the current CPU + 1 and wrap around (within the current NUMA > > > node). This spreads VMbus interrupts more evenly across CPUs. Previous > > > code always started with first CPU in the current NUMA node, skewing > > > the interrupt load to that CPU. > > > > > > Signed-off-by: Michael Kelley > > > --- > > > drivers/scsi/storvsc_drv.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > > > index e07907d..f3264c4 100644 > > > --- a/drivers/scsi/storvsc_drv.c > > > +++ b/drivers/scsi/storvsc_drv.c > > > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device, > > >*/ > > > cpumask_and(&alloced_mask, &stor_device- > > >alloced_cpus, > > > > > cpumask_of_node(cpu_to_node(q_num))); > > > - for_each_cpu(tgt_cpu, &alloced_mask) { > > > + for_each_cpu_wrap(tgt_cpu, &alloced_mask, > > > + outgoing_channel->target_cpu + 1) { > > Does it work when target_cpu is the last CPU on the system? > > Otherwise, looking good. Yes, it works. for_each_cpu_wrap() correctly wraps in the case where the 3rd parameter ('start') is one past the end of the mask. Arguably, we shouldn't rely on that, and should do the wrap to 0 before calling for_each_cpu_wrap(). > > > > if (tgt_cpu != outgoing_channel->target_cpu) > > { > > > outgoing_channel = > > > stor_device->stor_chns[tgt_cpu]; > > > -- > > > 1.8.3.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
[PATCH v4 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
From: Michael Kelley The 2016 version of Hyper-V offers the option to operate the guest VM per-vcpu stimer's in Direct Mode, which means the timer interupts on its own vector rather than queueing a VMbus message. Direct Mode reduces timer processing overhead in both the hypervisor and the guest, and avoids having timer interrupts pollute the VMbus interrupt stream for the synthetic NIC and storage. This patch enables Direct Mode by default on stimer0 when running on a version of Hyper-V that supports it. In prep for coming support of Hyper-V on ARM64, the arch independent portion of the code contains calls to routines that will be populated on ARM64 but are not needed and do nothing on x86. Signed-off-by: Michael Kelley --- Changes since v1: * Major rework to allocate and use a fixed interrupt vector * Fixed block comment style * Removed minor changes unrelated to Direct Mode Changes since v2: * Remove module parameter in drivers/hv/hv.c [Greg KH] * Add declaration for hv_stimer0_vector_handler in mshyperv.h Changes since v3: * Remove vestiges of module parameter in commit message and #include list in hv.c [Greg KH] --- arch/x86/entry/entry_32.S | 3 ++ arch/x86/entry/entry_64.S | 2 ++ arch/x86/include/asm/hardirq.h | 3 ++ arch/x86/include/asm/irq_vectors.h | 7 - arch/x86/include/asm/mshyperv.h| 13 arch/x86/include/uapi/asm/hyperv.h | 3 ++ arch/x86/kernel/cpu/mshyperv.c | 41 - arch/x86/kernel/irq.c | 9 ++ drivers/hv/hv.c| 61 -- drivers/hv/hyperv_vmbus.h | 4 ++- 10 files changed, 141 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ace8f32..1814991 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -882,6 +882,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR, BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR, hyperv_vector_handler) +BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR, +hv_stimer0_vector_handler) + #endif /* CONFIG_HYPERV */ ENTRY(page_fault) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index f048e38..23afac9 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1227,6 +1227,8 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ #if IS_ENABLED(CONFIG_HYPERV) apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ hyperv_callback_vector hyperv_vector_handler +apicinterrupt3 HYPERV_STIMER0_VECTOR \ + hv_stimer0_callback_vector hv_stimer0_vector_handler #endif /* CONFIG_HYPERV */ idtentry debug do_debughas_error_code=0 paranoid=1 shift_ist=DEBUG_STACK diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 51cc979..c788343 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -38,6 +38,9 @@ #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN) unsigned int irq_hv_callback_count; #endif +#if IS_ENABLED(CONFIG_HYPERV) + unsigned int hyperv_stimer0_count; +#endif } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index 67421f6..6accf0b 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -103,7 +103,12 @@ #endif #define MANAGED_IRQ_SHUTDOWN_VECTOR0xef -#define LOCAL_TIMER_VECTOR 0xee + +#if IS_ENABLED(CONFIG_HYPERV) +#define HYPERV_STIMER0_VECTOR 0xee +#endif + +#define LOCAL_TIMER_VECTOR 0xed #define NR_VECTORS 256 diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index b623a42..5a170e5 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -171,6 +171,19 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type) void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs)); void hv_remove_crash_handler(void); +/* + * Routines for stimer0 Direct Mode handling. + * On x86/x64, there are no percpu actions to take. + */ +void hv_stimer0_vector_handler(struct pt_regs *regs); +void hv_stimer0_callback_vector(void); +int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void)); +void hv_remove_stimer0_irq(int irq); + +static inline void hv_enable_stimer0_percpu_irq(int irq) {} +static inline void hv_disable_stimer0_percpu_irq(int irq) {} + + #if IS_ENABLED(CONFIG_HYPERV) extern struct clocksource *hyperv_cs; extern void *hv_hypercall_pg; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index 1a5bfea..7213cb8 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -74,6 +74,9 @@ /* Crash M
[PATCH v3 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
From: Michael Kelley The 2016 version of Hyper-V offers the option to operate the guest VM per-vcpu stimer's in Direct Mode, which means the timer interupts on its own vector rather than queueing a VMbus message. Direct Mode reduces timer processing overhead in both the hypervisor and the guest, and avoids having timer interrupts pollute the VMbus interrupt stream for the synthetic NIC and storage. This patch enables Direct Mode by default on stimer0 when running on a version of Hyper-V that supports it, with a hv_vmbus module parameter for disabling Direct Mode and reverting to the old behavior. In prep for coming support of Hyper-V on ARM64, the arch independent portion of the code contains calls to routines that will be populated on ARM64 but are not needed and do nothing on x86. Signed-off-by: Michael Kelley --- Changes since v1: * Major rework to allocate and use a fixed interrupt vector * Fixed block comment style * Removed minor changes unrelated to Direct Mode Changes since v2: * Remove module parameter in drivers/hv/hv.c [Greg KH] * Add declaration for hv_stimer0_vector_handler in mshyperv.h --- arch/x86/entry/entry_32.S | 3 ++ arch/x86/entry/entry_64.S | 2 ++ arch/x86/include/asm/hardirq.h | 3 ++ arch/x86/include/asm/irq_vectors.h | 7 - arch/x86/include/asm/mshyperv.h| 13 arch/x86/include/uapi/asm/hyperv.h | 3 ++ arch/x86/kernel/cpu/mshyperv.c | 41 - arch/x86/kernel/irq.c | 9 ++ drivers/hv/hv.c| 62 -- drivers/hv/hyperv_vmbus.h | 4 ++- 10 files changed, 142 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index ace8f32..1814991 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -882,6 +882,9 @@ BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR, BUILD_INTERRUPT3(hyperv_callback_vector, HYPERVISOR_CALLBACK_VECTOR, hyperv_vector_handler) +BUILD_INTERRUPT3(hv_stimer0_callback_vector, HYPERV_STIMER0_VECTOR, +hv_stimer0_vector_handler) + #endif /* CONFIG_HYPERV */ ENTRY(page_fault) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index f048e38..23afac9 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1227,6 +1227,8 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ #if IS_ENABLED(CONFIG_HYPERV) apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \ hyperv_callback_vector hyperv_vector_handler +apicinterrupt3 HYPERV_STIMER0_VECTOR \ + hv_stimer0_callback_vector hv_stimer0_vector_handler #endif /* CONFIG_HYPERV */ idtentry debug do_debughas_error_code=0 paranoid=1 shift_ist=DEBUG_STACK diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 51cc979..c788343 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -38,6 +38,9 @@ #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN) unsigned int irq_hv_callback_count; #endif +#if IS_ENABLED(CONFIG_HYPERV) + unsigned int hyperv_stimer0_count; +#endif } cacheline_aligned irq_cpustat_t; DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index 67421f6..6accf0b 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -103,7 +103,12 @@ #endif #define MANAGED_IRQ_SHUTDOWN_VECTOR0xef -#define LOCAL_TIMER_VECTOR 0xee + +#if IS_ENABLED(CONFIG_HYPERV) +#define HYPERV_STIMER0_VECTOR 0xee +#endif + +#define LOCAL_TIMER_VECTOR 0xed #define NR_VECTORS 256 diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h index b623a42..5a170e5 100644 --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -171,6 +171,19 @@ static inline void vmbus_signal_eom(struct hv_message *msg, u32 old_msg_type) void hv_setup_crash_handler(void (*handler)(struct pt_regs *regs)); void hv_remove_crash_handler(void); +/* + * Routines for stimer0 Direct Mode handling. + * On x86/x64, there are no percpu actions to take. + */ +void hv_stimer0_vector_handler(struct pt_regs *regs); +void hv_stimer0_callback_vector(void); +int hv_setup_stimer0_irq(int *irq, int *vector, void (*handler)(void)); +void hv_remove_stimer0_irq(int irq); + +static inline void hv_enable_stimer0_percpu_irq(int irq) {} +static inline void hv_disable_stimer0_percpu_irq(int irq) {} + + #if IS_ENABLED(CONFIG_HYPERV) extern struct clocksource *hyperv_cs; extern void *hv_hypercall_pg; diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h index 1a5bfea..7213cb8 100644 --- a/arch/x86/include/uapi/asm/hyperv.h +++ b/arch/x86/include/uapi/asm/hyperv.h @@ -74,6 +74,9 @@ /* Crash MSR available */ #d
RE: [PATCH v2 char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
> -Original Message- > From: Greg KH [mailto:gre...@linuxfoundation.org] > Sent: Thursday, January 25, 2018 2:00 AM > To: Michael Kelley (EOSG) > Cc: linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; > o...@aepfle.de; > a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com; > leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger > ; KY Srinivasan ; x...@kernel.org; > j...@pun.org > Subject: Re: [PATCH v2 char-misc 1/1] Drivers: hv: vmbus: Implement Direct > Mode for stimer0 > > On Mon, Jan 22, 2018 at 03:29:29PM -0700, mikel...@exchange.microsoft.com > wrote: > > +/* > > + * If false, we're using the old mechanism for stimer0 interrupts > > + * where it sends a VMbus message when it expires. The old > > + * mechanism is used if Direct Mode is explicitly disabled > > + * by the module parameter, or on older versions of Hyper-V > > + * that don't support Direct Mode. While Hyper-V provides > > + * four stimer's per CPU, Linux uses only stimer0. > > + */ > > +static bool direct_mode_enabled = true; > > +module_param(direct_mode_enabled, bool, 0444); > > +MODULE_PARM_DESC(direct_mode_enabled, > > + "Set to N to disable stimer Direct Mode."); > > Ick ick ick. How is a distro or user supposed to know if/when to enable > this and not to enable it? This isn't ok, can you please make this > "automatic" to always do the right thing based on what it is running on? To be clear, this patch already does the "automatic" thing based on the capabilities of the hypervisor. But it's often handy to be able to override the automatic behavior without having to rebuild the kernel. > > Module parameters are not a good idea as they are impossible to maintain > and document and use over time. Please never add new ones to the > kernel. > Is there a better alternative for providing an "override" capability? Having an override isn't super important in this case, so I'll drop the parameter, but it would be nice to have something easier than a rebuild. Michael > thanks, > > greg k-h ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
[PATCH 1/1] scsi: storvsc: Increase cmd_per_lun for higher speed devices
Increase cmd_per_lun to allow more I/Os in progress per device, particularly for NVMe's. The Hyper-V host side can handle the higher count with no issues. Signed-off-by: Michael Kelley --- drivers/scsi/storvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index f3264c4..6205107 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1657,7 +1657,7 @@ static int storvsc_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *scmnd) .eh_timed_out = storvsc_eh_timed_out, .slave_alloc = storvsc_device_alloc, .slave_configure = storvsc_device_configure, - .cmd_per_lun = 255, + .cmd_per_lun = 2048, .this_id = -1, .use_clustering = ENABLE_CLUSTERING, /* Make sure we dont get a sg segment crosses a page boundary */ -- 1.8.3.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
Updated/corrected two email addresses ... > -Original Message- > From: Michael Kelley (EOSG) > Sent: Wednesday, January 24, 2018 2:14 PM > To: KY Srinivasan ; Stephen Hemminger > ; > martin.peter...@oracle.com; lo...@microsoft.com; jbottom...@odin.com; > de...@linuxdriverproject.org; linux-ker...@vger.kernel.org; > linux-s...@vger.kernel.org > Cc: Michael Kelley (EOSG) > Subject: [PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel > for I/O requests > > Update the algorithm in storvsc_do_io to look for a channel > starting with the current CPU + 1 and wrap around (within the > current NUMA node). This spreads VMbus interrupts more evenly > across CPUs. Previous code always started with first CPU in > the current NUMA node, skewing the interrupt load to that CPU. > > Signed-off-by: Michael Kelley > --- > drivers/scsi/storvsc_drv.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c > index e07907d..f3264c4 100644 > --- a/drivers/scsi/storvsc_drv.c > +++ b/drivers/scsi/storvsc_drv.c > @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device, >*/ > cpumask_and(&alloced_mask, &stor_device->alloced_cpus, > cpumask_of_node(cpu_to_node(q_num))); > - for_each_cpu(tgt_cpu, &alloced_mask) { > + for_each_cpu_wrap(tgt_cpu, &alloced_mask, > + outgoing_channel->target_cpu + 1) { > if (tgt_cpu != outgoing_channel->target_cpu) { > outgoing_channel = > stor_device->stor_chns[tgt_cpu]; > -- > 1.8.3.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
[PATCH 1/1] scsi: storvsc: Spread interrupts when picking a channel for I/O requests
Update the algorithm in storvsc_do_io to look for a channel starting with the current CPU + 1 and wrap around (within the current NUMA node). This spreads VMbus interrupts more evenly across CPUs. Previous code always started with first CPU in the current NUMA node, skewing the interrupt load to that CPU. Signed-off-by: Michael Kelley --- drivers/scsi/storvsc_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c index e07907d..f3264c4 100644 --- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1310,7 +1310,8 @@ static int storvsc_do_io(struct hv_device *device, */ cpumask_and(&alloced_mask, &stor_device->alloced_cpus, cpumask_of_node(cpu_to_node(q_num))); - for_each_cpu(tgt_cpu, &alloced_mask) { + for_each_cpu_wrap(tgt_cpu, &alloced_mask, + outgoing_channel->target_cpu + 1) { if (tgt_cpu != outgoing_channel->target_cpu) { outgoing_channel = stor_device->stor_chns[tgt_cpu]; -- 1.8.3.1 ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support
Vitaly Kuznetsov writes: > > > "Michael Kelley (EOSG)" writes: > > > >> On Fri, 19 Jan 2018, Thomas Gleixner wrote: > >> > >>> > >>> You added '#include ' to mshyperv.h which is included > >>> in > >>> vclock_gettime.c and pulls in other stuff which fails to expand > >> > >> Is the declaration of hyperv_reenlightenment_intr() even needed in > >> mshyperv.h? The '#include ' is there for the > >> __irq_entry > >> annotation on that declaration. There's a declaration of the parallel > >> (and > >> unannotated) hyperv_vector_handler(), but that looks like a fossil that > >> isn't needed either. > >> > > > > True, > > > > the only need for the declaration in mshyperv.h is to silence "warning: > > no previous prototype for ‘hyperv_reenlightenment_intr’"; I'm not sure > > if this actually needs fixing. > > It seems we can get away with dropping '__visible' and '__irq_entry' > qualifiers from 'hyperv_reenlightenment_intr' declaration in mshyperv.h > while keeping them in hv_init.c for the implementation. This way we can > avoid the nasty 'no previous prototype' warning and not have to add > '#include ' to mshyperv.h breaking vdso. Where exactly is the 'no previous prototype' warning being generated? I was thinking the only references to hyperv_reenlightenment_intr() would be in entry_32.S and entry_64.S, but maybe there's another one I missed. Michael > > I'm inclined to go ahead with this in v4 if nobody objects. > > -- > Vitaly ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support
On Fri, 19 Jan 2018, Thomas Gleixner wrote: > -Original Message- > From: Thomas Gleixner [mailto:t...@linutronix.de] > Sent: Friday, January 19, 2018 11:48 PM > To: Vitaly Kuznetsov > Cc: kbuild test robot ; kbuild-...@01.org; > k...@vger.kernel.org; > x...@kernel.org; Stephen Hemminger ; Radim Krčmář > ; Haiyang Zhang ; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; Michael Kelley (EOSG) > ; Ingo Molnar ; Roman Kagan > ; Andy Lutomirski ; H. Peter Anvin > ; Paolo Bonzini ; Mohammed Gamal > > Subject: Re: [PATCH v3 3/7] x86/hyper-v: reenlightenment notifications support > > On Fri, 19 Jan 2018, Vitaly Kuznetsov wrote: > > kbuild test robot writes: > > > > > Hi Vitaly, > > > > > > Thank you for the patch! Perhaps something to improve: > > > > > > [auto build test WARNING on tip/auto-latest] > > > [also build test WARNING on v4.15-rc8 next-20180118] > > > [cannot apply to tip/x86/core kvm/linux-next] > > > [if your patch is applied to the wrong git tree, please drop us a note to > > > help improve the > system] > > > > > > url: > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2F0day- > ci%2Flinux%2Fcommits%2FVitaly-Kuznetsov%2Fx86-kvm-hyperv-stable-clocksorce-for-L2- > guests-when-running-nested-KVM-on-Hyper-V%2F20180119- > 160814&data=02%7C01%7CMichael.H.Kelley%40microsoft.com%7Ce95c66107da6446826830 > 8d55fda2c2b%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C636520313062777623&s > data=kAXl3mLVUdJi%2BsB4Ub0fmUHQfl6NuUDjW%2FAY9%2BFLZE4%3D&reserved=0 > > > config: x86_64-allmodconfig (attached as .config) > > > compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025 > > > reproduce: > > > # save the attached .config to linux build tree > > > make ARCH=x86_64 > > > > > > All warnings (new ones prefixed by >>): > > > > > >In file included from include/linux/kasan.h:17:0, > > > from include/linux/slab.h:129, > > > from include/linux/irq.h:26, > > > from arch/x86/include/asm/hardirq.h:6, > > > from include/linux/hardirq.h:9, > > > from include/linux/interrupt.h:13, > > > from arch/x86/include/asm/mshyperv.h:8, > > > from > > > arch/x86//entry/vdso/vdso32/../vclock_gettime.c:20, > > > from arch/x86//entry/vdso/vdso32/vclock_gettime.c:33: > > >arch/x86/include/asm/pgtable.h: In function 'clone_pgd_range': > > >arch/x86/include/asm/pgtable.h:1129:9: error: implicit declaration of > > > function > 'kernel_to_user_pgdp'; did you mean 'u64_to_user_ptr'? > [-Werror=implicit-function- > declaration] > > > memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src), > > > ^~~ > > > > Sorry but I'm failing to see how this (and all the rest) is related to > > my patch ... > > You added '#include ' to mshyperv.h which is included in > vclock_gettime.c and pulls in other stuff which fails to expand Is the declaration of hyperv_reenlightenment_intr() even needed in mshyperv.h? The '#include ' is there for the __irq_entry annotation on that declaration. There's a declaration of the parallel (and unannotated) hyperv_vector_handler(), but that looks like a fossil that isn't needed either. Michael > > Thanks, > > tglx ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH 3/6] x86/hyper-v: reenlightenment notifications support
On Fri, Dec 08, 2017 at 11:49:57AM +0100, Vitaly Kuznetsov wrote: > -Original Message- > From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] > Sent: Friday, December 8, 2017 2:50 AM > To: k...@vger.kernel.org; x...@kernel.org > Cc: Paolo Bonzini ; Radim Krčmář ; > Thomas > Gleixner ; Ingo Molnar ; H. Peter Anvin > ; KY Srinivasan ; Haiyang Zhang > ; Stephen Hemminger ; Michael > Kelley (EOSG) ; Andy Lutomirski > ; > Mohammed Gamal ; Cathy Avery ; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org > Subject: [PATCH 3/6] x86/hyper-v: reenlightenment notifications support > > Hyper-V supports Live Migration notification. This is supposed to be used in > conjunction with > TSC emulation: when we are migrated to a host with different TSC frequency > for some short > period host emulates our accesses to TSC and sends us an interrupt to notify > about the event. > When we're done updating everything we can disable TSC emulation and > everything will start > working fast again. > > We didn't need these notifications before as Hyper-V guests are not supposed > to use TSC as a > clocksource: in Linux we even mark it as unstable on boot. Guests normally > use 'tsc page' > clocksouce and host updates its values on migrations automatically. > > Things change when we want to run nested virtualization: even when we pass > through PV > clocksources (kvm-clock or tsc page) to our guests we need to know TSC > frequency and when it > changes. > > Hyper-V Top Level Functional Specification (as of v5.0b) wrongly specifies > EAX:BIT(12) of CPUID:0x4009 as the feature identification bit. The right > one to check is > EAX:BIT(13) of CPUID:0x4003. I was assured that the fix in on the way. > > Signed-off-by: Vitaly Kuznetsov [snip] > diff --git a/arch/x86/include/asm/irq_vectors.h > b/arch/x86/include/asm/irq_vectors.h > index 67421f649cfa..e71c1120426b 100644 > --- a/arch/x86/include/asm/irq_vectors.h > +++ b/arch/x86/include/asm/irq_vectors.h > @@ -103,7 +103,12 @@ > #endif > > #define MANAGED_IRQ_SHUTDOWN_VECTOR 0xef > -#define LOCAL_TIMER_VECTOR 0xee > + > +#if IS_ENABLED(CONFIG_HYPERV) > +#define HYPERV_REENLIGHTENMENT_VECTOR0xee > +#endif > + > +#define LOCAL_TIMER_VECTOR 0xed > > #define NR_VECTORS256 [snip] Since you are pre-allocating a new vector, would you want to update the irq_cpustat_t data structure and your interrupt handler to count the occurrences of these interrupts, and update arch_show_interrupts() to show the count? Then cat /proc/interrupts will show the count. The reenlightenment interrupts will presumably be rare, but so are some of the others that are already counted and displayed, and it seems like consistency should be maintained. Michael ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
Thomas Gleixner writes: > On Tue, 31 Oct 2017, mikel...@exchange.microsoft.com wrote: > > diff --git a/arch/x86/include/uapi/asm/hyperv.h > > b/arch/x86/include/uapi/asm/hyperv.h > > index f65d125..408cf3e 100644 > > --- a/arch/x86/include/uapi/asm/hyperv.h > > +++ b/arch/x86/include/uapi/asm/hyperv.h > > @@ -112,6 +112,22 @@ > > #define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 << 5) > > /* Guest crash data handler available */ > > #define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 << 10) > > +/* Debug MSRs available */ > > +#define HV_X64_DEBUG_MSR_AVAILABLE (1 << 11) > > +/* Support for Non-Privileged Instruction Execution Prevention is > > available */ > > +#define HV_X64_NPIEP_AVAILABLE (1 << 12) > > +/* Support for DisableHypervisor is available */ > > +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE(1 << 13) > > +/* Extended GVA Ranges for Flush Virtual Address list is available */ > > +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE(1 << 14) > > +/* Return Hypercall output via XMM registers is available */ > > +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (1 << 15) > > +/* SINT polling mode available */ > > +#define HV_X64_SINT_POLLING_MODE_AVAILABLE (1 << 17) > > +/* Hypercall MSR lock is available */ > > +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE(1 << 18) > > +/* stimer direct mode is available */ > > +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE(1 << 19) > > How are all these defines (except the last one related to that patch? Will move to a separate patch. > > > +/* Hardware IRQ number to use for stimer0 in Direct Mode. This IRQ > > +is a fake > > + * because stimer's in Direct Mode simply interrupt on the specified > > +vector, > > + * without using a particular IOAPIC pin. But we use the IRQ > > +allocation > > + * machinery, so we need a hardware IRQ #. This value is somewhat > > +arbitrary, > > + * but it should not be a legacy IRQ (0 to 15), and should fit within > > +the > > + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So > > +any value > > + * between 16 and 23 should be good. > > + */ > > +#define HV_STIMER0_IRQNR 18 > > Why would you want abuse an IOAPIC interrupt if all you need is a vector? Allocating a vector up-front like the existing HYPERVISOR_CALLBACK_VECTOR would certainly be more straightforward, and in fact, my original internal testing of stimer Direct Mode used that approach. Vectors are a limited resource, so I wanted to find a way to allocate one on-the-fly rather than use fixed pre-allocation, even under CONFIG_HYPERV. But I've got no problem with allocating a vector up-front and skipping all the irq/APIC manipulation and related issues. Any guidance on which vector to use? > > > +/* Routines to do per-architecture handling of stimer0 when in Direct > > +Mode */ > > + > > +void hv_ack_stimer0_interrupt(struct irq_desc *desc) { > > + ack_APIC_irq(); > > +} > > + > > +static void allonline_vector_allocation_domain(int cpu, struct cpumask > > *retmask, > > + const struct cpumask *mask) > > +{ > > + cpumask_copy(retmask, cpu_online_mask); } > > + > > +int hv_allocate_stimer0_irq(int *irq, int *vector) { > > + int localirq; > > + int result; > > + struct irq_data *irq_data; > > + > > + /* The normal APIC vector allocation domain allows allocation of > > +vectors > > Please fix you comment style. Multi line comments are: > >/* > * Bla > * foo... > */ > Will do. > > +* only for the calling CPU. So we change the allocation domain to one > > +* that allows vectors to be allocated in all online CPUs. This > > +* change is fine in a Hyper-V VM because VMs don't have the usual > > +* complement of interrupting devices. > > +*/ > > + apic->vector_allocation_domain = allonline_vector_allocation_domain; > > This does not work anymore. vector_allocation_domain is gone as of 4.15. > Please check the > vector rework in > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip x86/apic > > Aside of that what guarantees that all cpus are online at the point where you > allocate that > interrupt? Nothing, so the vector will be not reserved or allocated on > offline CPUs. Now guess > what happens if you bring the offline CPUs online later, it will drown in > spurious interrupts. > Worse, the vector can also be reused for a device interrupt. Great plan. > > > + localirq = acpi_register_gsi(NULL, HV_STIMER0_IRQNR, > > + ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH); > > + if (localirq < 0) { > > + pr_err("Cannot register stimer0 gsi. Error %d", localirq); > > + return -1; > > + } > > + > > + /* We pass in a dummy IRQ handler because architecture independent code > > +* will later override the IRQ domain interrupt handler and set
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
Vitaly Kuznetsov writes: > Vitaly Kuznetsov writes: > > > mikel...@exchange.microsoft.com writes: > > > >> From: Michael Kelley > >> > >> The 2016 version of Hyper-V offers the option to operate the guest VM > >> per-vcpu stimer's in Direct Mode, which means the timer interupts on > >> its own vector rather than queueing a VMbus message. Direct Mode > >> reduces timer processing overhead in both the hypervisor and the > >> guest, and avoids having timer interrupts pollute the VMbus interrupt > >> stream for the synthetic NIC and storage. This patch enables Direct > >> Mode by default on stimer0 (which is the only stimer used in Linux on > >> Hyper-V) when running on a version of Hyper-V that supports it, with > >> a hv_vmbus module parameter for disabling Direct Mode and reverting > >> to the old behavior. > >> > >> Signed-off-by: Michael Kelley > >> --- > >> arch/x86/include/asm/mshyperv.h| 14 > >> arch/x86/include/uapi/asm/hyperv.h | 26 ++ > >> arch/x86/kernel/cpu/mshyperv.c | 64 +- > >> drivers/hv/hv.c| 71 > >> -- > >> drivers/hv/hyperv_vmbus.h | 4 ++- > >> 5 files changed, 175 insertions(+), 4 deletions(-) > >> > > (in addition to my previous comment) > > This patch is also x86-heavy so it probably makes sense to make x86 > maintainers (Thomas, > Peter, Ingo) aware no matter which tree it will go through. > > CC: x...@kernel.org > > >> diff --git a/arch/x86/include/asm/mshyperv.h > >> b/arch/x86/include/asm/mshyperv.h index 740dc97..1bba1d7 100644 > >> --- a/arch/x86/include/asm/mshyperv.h > >> +++ b/arch/x86/include/asm/mshyperv.h > >> @@ -4,6 +4,8 @@ > >> #include > >> #include > >> #include > >> +#include > >> +#include > >> #include > >> #include > >> > >> @@ -374,4 +376,16 @@ static inline struct ms_hyperv_tsc_page > >> *hv_get_tsc_page(void) > >>return NULL; > >> } > >> #endif > >> + > >> +/* Per architecture routines for stimer0 Direct Mode handling. On > >> +x86/x64 > >> + * there are no percpu actions to take. > >> + */ > >> +#if IS_ENABLED(CONFIG_HYPERV) > >> +static inline void hv_enable_stimer0_percpu_irq(int irq) { } static > >> +inline void hv_disable_stimer0_percpu_irq(int irq) { } extern int > >> +hv_allocate_stimer0_irq(int *irq, int *vector); extern void > >> +hv_deallocate_stimer0_irq(int irq); extern void > >> +hv_ack_stimer0_interrupt(struct irq_desc *desc); #endif > >> + > >> #endif > >> diff --git a/arch/x86/include/uapi/asm/hyperv.h > >> b/arch/x86/include/uapi/asm/hyperv.h > >> index f65d125..408cf3e 100644 > >> --- a/arch/x86/include/uapi/asm/hyperv.h > >> +++ b/arch/x86/include/uapi/asm/hyperv.h > >> @@ -112,6 +112,22 @@ > >> #define HV_X64_GUEST_IDLE_STATE_AVAILABLE (1 << 5) > >> /* Guest crash data handler available */ > >> #define HV_X64_GUEST_CRASH_MSR_AVAILABLE (1 << 10) > >> +/* Debug MSRs available */ > >> +#define HV_X64_DEBUG_MSR_AVAILABLE(1 << 11) > >> +/* Support for Non-Privileged Instruction Execution Prevention is > >> available */ > >> +#define HV_X64_NPIEP_AVAILABLE(1 << 12) > >> +/* Support for DisableHypervisor is available */ > >> +#define HV_X64_DISABLE_HYPERVISOR_AVAILABLE (1 << 13) > >> +/* Extended GVA Ranges for Flush Virtual Address list is available */ > >> +#define HV_X64_EXTENDED_GVA_RANGE_AVAILABLE (1 << 14) > >> +/* Return Hypercall output via XMM registers is available */ > >> +#define HV_X64_HYPERCALL_XMM_OUTPUT_AVAILABLE (1 << 15) > >> +/* SINT polling mode available */ > >> +#define HV_X64_SINT_POLLING_MODE_AVAILABLE(1 << 17) > >> +/* Hypercall MSR lock is available */ > >> +#define HV_X64_HYPERCALL_MSR_LOCK_AVAILABLE (1 << 18) > >> +/* stimer direct mode is available */ > >> +#define HV_X64_STIMER_DIRECT_MODE_AVAILABLE (1 << 19) > >> > >> /* > >> * Implementation recommendations. Indicates which behaviors the > >> hypervisor @@ -300,6 +316,16 @@ enum HV_GENERIC_SET_FORMAT { > >> > >> #define HV_SYNIC_STIMER_COUNT (4) > >> > >> +/* Hardware IRQ number to use for stimer0 in Direct Mode. This IRQ > >> +is a fake > >> + * because stimer's in Direct Mode simply interrupt on the specified > >> +vector, > >> + * without using a particular IOAPIC pin. But we use the IRQ > >> +allocation > >> + * machinery, so we need a hardware IRQ #. This value is somewhat > >> +arbitrary, > >> + * but it should not be a legacy IRQ (0 to 15), and should fit > >> +within the > >> + * single IOAPIC (0 to 23) that Hyper-V provides to a guest VM. So > >> +any value > >> + * between 16 and 23 should be good. > >> + */ > > > > (nitpick): all comments in the patch are in 'net' style: > > > > /* This is a > > * comment. > > */ > > > > While in Hyper-V files (and x86 in general) the 'normal' style is used: > > > > /* > > * This i
RE: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0
It's fixed. Michael -Original Message- From: Thomas Gleixner [mailto:t...@linutronix.de] Sent: Wednesday, November 1, 2017 9:20 AM To: Michael Kelley (EOSG) Cc: gre...@linuxfoundation.org; LKML ; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; vkuzn...@redhat.com; jasow...@redhat.com; leann.ogasaw...@canonical.com; marcelo.ce...@canonical.com; Stephen Hemminger ; KY Srinivasan Subject: Re: [PATCH char-misc 1/1] Drivers: hv: vmbus: Implement Direct Mode for stimer0 On Wed, 1 Nov 2017, Thomas Gleixner wrote: > On Tue, 31 Oct 2017, mikel...@exchange.microsoft.com wrote: ^^^ Please fix your mail process. mikel...@exchange.microsoft.com is not your real mail address, but shows up as From and obviously the reply bounces: mikel...@exchange.microsoft.com Remote Server returned '550 5.1.10 RESOLVER.ADR.RecipientNotFound; Recipient not found by SMTP address lookup' Sigh, tglx ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel