Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 18, 2021 at 02:15:47PM +0200, Greg KH wrote: > On Mon, Oct 11, 2021 at 07:59:17AM -0400, Michael S. Tsirkin wrote: > > On Sun, Oct 10, 2021 at 03:22:39PM -0700, Andi Kleen wrote: > > > > > > > To which Andi replied > > > > One problem with removing the ioremap opt-in is that > > > > it's still possible for drivers to get at devices without going > > > > through probe. > > > > > > > > To which Greg replied: > > > > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > > > > If there are in-kernel PCI drivers that do not do this, they > > > > need to be > > > > fixed today. > > > > > > > > Can you guys resolve the differences here? > > > > > > > > > I addressed this in my other mail, but we may need more discussion. > > > > Hopefully Greg will reply to that one. > > Note, when wanting Greg to reply, someone should at the very least cc: > him. "that one" being "Andi's other mail". Which I don't remember what it was, by now. Sorry. > {sigh} > > greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 11, 2021 at 07:59:17AM -0400, Michael S. Tsirkin wrote: > On Sun, Oct 10, 2021 at 03:22:39PM -0700, Andi Kleen wrote: > > > > > To which Andi replied > > > One problem with removing the ioremap opt-in is that > > > it's still possible for drivers to get at devices without going through > > > probe. > > > > > > To which Greg replied: > > > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > > > If there are in-kernel PCI drivers that do not do this, they need to be > > > fixed today. > > > > > > Can you guys resolve the differences here? > > > > > > I addressed this in my other mail, but we may need more discussion. > > Hopefully Greg will reply to that one. Note, when wanting Greg to reply, someone should at the very least cc: him. {sigh} greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 11:35:04AM -0700, Andi Kleen wrote: > > I'd rather see more concerted efforts focused/limited core changes > > rather than leaf driver changes until there is a clearer definition of > > hardened. > > A hardened driver is a driver that Ah, you do define this, thank you! > - Had similar security (not API) oriented review of its IO operations > (mainly MMIO access, but also PCI config space) as a non privileged user > interface (like a ioctl). That review should be focused on memory safety. Where is this review done? Where is is documented? Who is responsible for keeping it up to date with every code change to the driver, and to the code that the driver calls and the code that calls the driver? > - Had some fuzzing on these IO interfaces using to be released tools. "some"? What tools? What is the input, and where is that defined? How much fuzzing do you claim is "good enough"? > Right now it's only three virtio drivers (console, net, block) Where was this work done and published? And why only 3? > Really it's no different than what we do for every new unprivileged user > interface. Really? I have seen loads of new drivers from Intel submitted in the past months that would fail any of the above things just based on obvious code reviews that I end up having to do... If you want to start a "hardened driver" effort, there's a lot of real work that needs to be done here and documented, and explained why it can not just be done for the whole kernel... greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Sun, Oct 10, 2021 at 03:11:23PM -0700, Andi Kleen wrote: > > On 10/9/2021 1:39 PM, Dan Williams wrote: > > On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin wrote: > > > On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan > > > wrote: > > > > From: Andi Kleen > > > > > > > > For Confidential VM guests like TDX, the host is untrusted and hence > > > > the devices emulated by the host or any data coming from the host > > > > cannot be trusted. So the drivers that interact with the outside world > > > > have to be hardened by sharing memory with host on need basis > > > > with proper hardening fixes. > > > > > > > > For the PCI driver case, to share the memory with the host add > > > > pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. > > > > > > > > Signed-off-by: Andi Kleen > > > > Signed-off-by: Kuppuswamy Sathyanarayanan > > > > > > > So I proposed to make all pci mappings shared, eliminating the need > > > to patch drivers. > > > > > > To which Andi replied > > > One problem with removing the ioremap opt-in is that > > > it's still possible for drivers to get at devices without going > > > through probe. > > > > > > To which Greg replied: > > > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > > > If there are in-kernel PCI drivers that do not do this, they > > > need to be > > > fixed today. > > > > > > Can you guys resolve the differences here? > > I agree with you and Greg here. If a driver is accessing hardware > > resources outside of the bind lifetime of one of the devices it > > supports, and in a way that neither modrobe-policy nor > > device-authorization -policy infrastructure can block, that sounds > > like a bug report. > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO and > others) in init functions that also register drivers (thanks Elena for the > number) > > Some are probably old drivers that could be fixed, but it's quite a few > legitimate cases. For example for platform or ISA drivers that's the only > way they can be implemented because they often have no other enumeration > mechanism. For PCI drivers it's rarer, but also still can happen. One > example that comes to mind here is the x86 Intel uncore drivers, which > support a mix of MSR, ioremap and PCI config space accesses all from the > same driver. This particular example can (and should be) fixed in other > ways, but similar things also happen in other drivers, and they're not all > broken. Even for the broken ones they're usually for some crufty old devices > that has very few users, so it's likely untestable in practice. > > My point is just that the ecosystem of devices that Linux supports is messy > enough that there are legitimate exceptions from the "First IO only in probe > call only" rule. No, there should not be for PCI drivers. If there is, that is a bug that you can, and should, fix. > And we can't just fix them all. Even if we could it would be hard to > maintain. Not true at all, you can fix them, and write a simple coccinelle rule to prevent them from ever coming back in. > Using a "firewall model" hooking into a few strategic points like we're > proposing here is much saner for everyone. No it is not. It is "easier" for you because you all do not want to fix up all of the drivers and want to add additional code complexity on top of the current mess that we have and then you can claim that you have "hardened" the drivers you care about. Despite no one ever explaining exactly what "hardened" means to me. Again, fix the existing drivers, you have the whole source, if this is something that you all care about, it should not be hard to do. Stop making excuses. greg k-h ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 18 2021 at 02:55, Thomas Gleixner wrote: > On Sun, Oct 10 2021 at 15:11, Andi Kleen wrote: >> The 5.15 tree has something like ~2.4k IO accesses (including MMIO and >> others) in init functions that also register drivers (thanks Elena for >> the number) > > These numbers are completely useless simply because they are based on > nonsensical criteria. See: > > https://lore.kernel.org/r/87r1cj2uad.ffs@tglx > >> My point is just that the ecosystem of devices that Linux supports is >> messy enough that there are legitimate exceptions from the "First IO >> only in probe call only" rule. > > Your point is based on your outright refusal to actualy do a proper > analysis and your outright refusal to help fixing the real problems. > > All you have provided so far is handwaving based on a completely useless > analysis. > > Sure, your goal is to get this TDX problem solved, but it's not going to > be solved by: > > 1) Providing a nonsensical analysis > > 2) Using #1 as an argument to hack some half baken interfaces into the > kernel which allow you to tick off your checkbox and then leave the > resulting mess for others to clean up. > > Try again when you have factual data to back up your claims and factual > arguments which prove that the problem can't be fixed otherwise. > > I might be repeating myself, but kernel development works this way: > > 1) Hack your private POC - Yay! > > 2) Sit down and think hard about the problems you identified in step > #1. Do a thorough analysis. > > 3) Come up with a sensible integration plan. > > 4) Do the necessary grump work of cleanups all over the place > > 5) Add sensible infrastructure which is understandable for the bulk > of kernel/driver developers > > 6) Let your feature fall in place > > and not in the way you are insisting on: > > 1) Hack your private POC - Yay! > > 2) Define that this is the only way to do it and try to shove it down > the throat of everyone. > > 3) Getting told that this is not the way it works > > 4) Insist on it forever and blame the grumpy maintainers who are just > not understanding the great value of your approach. > > 5) Go back to #2 > > You should know that already, but I have no problem to give that lecture > to you over and over again. I probably should create a form letter. > > And no, you can bitch about me as much as you want. These are not my > personal rules and personal pet pieves. These are rules Linus cares > about very much and aside of that they just reflect common sense. > > The kernel is a common good and not the dump ground for your personal > brain waste. > > The kernel does not serve Intel. Quite the contrary Intel depends on > the kernel to work nicely with it's hardware. Ergo, Intel should have > a vested interest to serve the kernel and take responsibility for it > as a whole. And so should you as an Intel employee. > > Just dumping your next half baken workaround does not cut it especially > not when it is not backed up by sensible arguments. > > Please try again, but not before you have something substantial to back > up your claims. That said, I can't resist the urge to say a few words to the responsible senior and management people at Intel in this context: I surely know that a lot of Intel people claim that their lack of progress is _only_ because Thomas is hard to work with and Thomas wants unreasonable changes to their code, which I could perceive as an abuse of myself for the purpose of self-deception. TBH, I don't give a damn. Let me ask a few questions instead: - Is it unreasonable to expect that argumentations are based on facts and proper analysis? - Is it unreasonable to expect a proper integration of a new feature? - Does it take unreasonable effort to do a proper design? - Is it unreasonable to ask that he necessary cleanups are done upfront? If anyone of the responsible people at Intel thinks so, then they should speak up now and tell me in public and into my face what's so unreasonable about that. Thanks, Thomas ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
Andi, On Sun, Oct 10 2021 at 15:11, Andi Kleen wrote: > On 10/9/2021 1:39 PM, Dan Williams wrote: >> I agree with you and Greg here. If a driver is accessing hardware >> resources outside of the bind lifetime of one of the devices it >> supports, and in a way that neither modrobe-policy nor >> device-authorization -policy infrastructure can block, that sounds >> like a bug report. > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO and > others) in init functions that also register drivers (thanks Elena for > the number) These numbers are completely useless simply because they are based on nonsensical criteria. See: https://lore.kernel.org/r/87r1cj2uad.ffs@tglx > My point is just that the ecosystem of devices that Linux supports is > messy enough that there are legitimate exceptions from the "First IO > only in probe call only" rule. Your point is based on your outright refusal to actualy do a proper analysis and your outright refusal to help fixing the real problems. All you have provided so far is handwaving based on a completely useless analysis. Sure, your goal is to get this TDX problem solved, but it's not going to be solved by: 1) Providing a nonsensical analysis 2) Using #1 as an argument to hack some half baken interfaces into the kernel which allow you to tick off your checkbox and then leave the resulting mess for others to clean up. Try again when you have factual data to back up your claims and factual arguments which prove that the problem can't be fixed otherwise. I might be repeating myself, but kernel development works this way: 1) Hack your private POC - Yay! 2) Sit down and think hard about the problems you identified in step #1. Do a thorough analysis. 3) Come up with a sensible integration plan. 4) Do the necessary grump work of cleanups all over the place 5) Add sensible infrastructure which is understandable for the bulk of kernel/driver developers 6) Let your feature fall in place and not in the way you are insisting on: 1) Hack your private POC - Yay! 2) Define that this is the only way to do it and try to shove it down the throat of everyone. 3) Getting told that this is not the way it works 4) Insist on it forever and blame the grumpy maintainers who are just not understanding the great value of your approach. 5) Go back to #2 You should know that already, but I have no problem to give that lecture to you over and over again. I probably should create a form letter. And no, you can bitch about me as much as you want. These are not my personal rules and personal pet pieves. These are rules Linus cares about very much and aside of that they just reflect common sense. The kernel is a common good and not the dump ground for your personal brain waste. The kernel does not serve Intel. Quite the contrary Intel depends on the kernel to work nicely with it's hardware. Ergo, Intel should have a vested interest to serve the kernel and take responsibility for it as a whole. And so should you as an Intel employee. Just dumping your next half baken workaround does not cut it especially not when it is not backed up by sensible arguments. Please try again, but not before you have something substantial to back up your claims. Thanks, Thomas ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Thu, Oct 14, 2021 at 12:33:49PM +, Reshetova, Elena wrote: > > On Thu, Oct 14, 2021 at 07:27:42AM +, Reshetova, Elena wrote: > > > > On Thu, Oct 14, 2021 at 06:32:32AM +, Reshetova, Elena wrote: > > > > > > On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > > > > > > > > The 5.15 tree has something like ~2.4k IO accesses (including > > > > > > > > MMIO and > > > > > > > > others) in init functions that also register drivers (thanks > > > > > > > > Elena for > > > > > > > > the number) > > > > > > > > > > > > > > To provide more numbers on this. What I can see so far from a > > > > > > > smatch- > > based > > > > > > > analysis, we have 409 __init style functions (.probe & > > > > > > > builtin/module_ > > > > > > > _platform_driver_probe excluded) for 5.15 with allyesconfig. > > > > > > > > > > > > I don't think we care about allyesconfig at all though. > > > > > > Just don't do that. > > > > > > How about allmodconfig? This is closer to what distros actually do. > > > > > > > > > > It does not make any difference really for the content of the > > > > > /drivers/*: > > > > > gives 408 __init style functions doing IO (.probe & builtin/module_ > > > > > > > _platform_driver_probe excluded) for 5.15 with allmodconfig: > > > > > > > > > > ['doc200x_ident_chip', > > > > > 'doc_probe', 'doc2001_init', 'mtd_speedtest_init', > > > > > 'mtd_nandbiterrs_init', 'mtd_oobtest_init', 'mtd_pagetest_init', > > > > > 'tort_init', 'mtd_subpagetest_init', 'fixup_pmc551', > > > > > 'doc_set_driver_info', 'init_amd76xrom', 'init_l440gx', > > > > > 'init_sc520cdp', 'init_ichxrom', 'init_ck804xrom', 'init_esb2rom', > > > > > 'probe_acpi_namespace_devices', 'amd_iommu_init_pci', 'state_next', > > > > > 'arm_v7s_do_selftests', 'arm_lpae_run_tests', 'init_iommu_one', > > > > > > > > Um. ARM? Which architecture is this build for? > > > > > > The list of smatch IO findings is built for x86, but the smatch cross > > > function > > > database covers all archs, so when queried for all potential function > > > callers, > > > it would show non x86 arch call chains also. > > > > > > Here is the original x86 finding and call chain for the > > > 'arm_v7s_do_selftests': > > > > > > Detected low-level IO from arm_v7s_do_selftests in fun > > > __iommu_queue_command_sync > > > > > > drivers/iommu/amd/iommu.c:1025 __iommu_queue_command_sync() error: > > > {15002074744551330002} > > > 'check_host_input' read from the host using function 'readl' to a > > > member of the structure 'iommu->cmd_buf_head'; > > > > > > __iommu_queue_command_sync() > > > iommu_completion_wait() > > > amd_iommu_domain_flush_complete() > > > iommu_v1_map_page() > > > arm_v7s_do_selftests() > > > > > > So, the results can be further filtered if you want a specified arch. > > > > So what is it just for x86? Could you tell? > > I can probably figure out how to do additional filtering here, but does > it really matter for the case that is being discussed here? Andi's point was > that there quite many existing places in the kernel when low-level IO > happens before the probe stage. So I brought these numbers here. > What do you plan to do with the pure x86 results? If the list is short - just suggest securing that ;) > And here are the full results for allyesconfig, if anyone is interested (just > got permission to create > the repository today): > https://github.com/intel/ccc-linux-guest-hardening/tree/master/audit/sample_output/5.15-rc1 > We will be pushing to this repo all the scripts and fuzzing setups we use as > part of > our Linux guest hardening effort for confidential cloud computing, but it is > going to take > some time to get all the approvals collected. > > Best Regards, > Elena. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
RE: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
Elena, On Thu, Oct 14 2021 at 06:32, Elena Reshetova wrote: >> On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > It does not make any difference really for the content of the /drivers/*: > gives 408 __init style functions doing IO (.probe & builtin/module_ >> > _platform_driver_probe excluded) for 5.15 with allmodconfig: > > ['doc200x_ident_chip', > 'doc_probe', 'doc2001_init', 'mtd_speedtest_init', > 'mtd_nandbiterrs_init', 'mtd_oobtest_init', 'mtd_pagetest_init', > 'tort_init', 'mtd_subpagetest_init', 'fixup_pmc551', > 'doc_set_driver_info', 'init_amd76xrom', 'init_l440gx', > 'init_sc520cdp', 'init_ichxrom', 'init_ck804xrom', 'init_esb2rom', > 'ubi_gluebi_init', 'ubiblock_init' > 'ubi_init', 'mtd_stresstest_init', All of this is MTD and can just be disabled wholesale. Aside of that, most of these depend on either platform devices or device tree enumerations which are not ever available on X86. > 'probe_acpi_namespace_devices', > 'amd_iommu_init_pci', 'state_next', > 'init_dmars', 'iommu_init_pci', 'early_amd_iommu_init', > 'late_iommu_features_init', 'detect_ivrs', > 'intel_prepare_irq_remapping', 'intel_enable_irq_remapping', > 'intel_cleanup_irq_remapping', 'detect_intel_iommu', > 'parse_ioapics_under_ir', 'si_domain_init', > 'intel_iommu_init', 'dmar_table_init', > 'enable_drhd_fault_handling', > 'check_tylersburg_isoch', None of this is reachable because the initial detection which is ACPI table based will fail for TDX. If not, it's a guest firmware problem. > 'fb_console_init', 'xenbus_probe_backend_init', > 'xenbus_probe_frontend_init', 'setup_vcpu_hotplug_event', > 'balloon_init', XEN, that's relevant because magically the TDX guest will assume that it is a XEN instance? > 'ostm_init_clksrc', 'ftm_clockevent_init', 'ftm_clocksource_init', > 'kona_timer_init', 'mtk_gpt_init', 'samsung_clockevent_init', > 'samsung_clocksource_init', 'sysctr_timer_init', 'mxs_timer_init', > 'sun4i_timer_init', 'at91sam926x_pit_dt_init', 'owl_timer_init', > 'sun5i_setup_clockevent', > 'mt7621_clk_init', > 'samsung_clk_register_mux', 'samsung_clk_register_gate', > 'samsung_clk_register_fixed_rate', 'clk_boston_setup', > 'gemini_cc_init', 'aspeed_ast2400_cc', 'aspeed_ast2500_cc', > 'sun6i_rtc_clk_init', 'phy_init', 'ingenic_ost_register_clock', > 'meson6_timer_init', 'atcpit100_timer_init', > 'npcm7xx_clocksource_init', 'clksrc_dbx500_prcmu_init', > 'rcar_sysc_pd_setup', 'r8a779a0_sysc_pd_setup', 'renesas_soc_init', > 'rcar_rst_init', 'rmobile_setup_pm_domain', 'mcp_write_pairing_set', > 'a72_b53_rac_enable_all', 'mcp_a72_b53_set', > 'brcmstb_soc_device_early_init', 'imx8mq_soc_revision', > 'imx8mm_soc_uid', 'imx8mm_soc_revision', 'qe_init', > 'exynos5x_clk_init', 'exynos5250_clk_init', 'exynos4_get_xom', > 'create_one_cmux', 'create_one_pll', 'p2041_init_periph', > 'p4080_init_periph', 'p5020_init_periph', 'p5040_init_periph', > 'r9a06g032_clocks_probe', 'r8a73a4_cpg_clocks_init', > 'sh73a0_cpg_clocks_init', 'cpg_div6_register', > 'r8a7740_cpg_clocks_init', 'cpg_mssr_register_mod_clk', > 'cpg_mssr_register_core_clk', 'rcar_gen3_cpg_clk_register', > 'cpg_sd_clk_register', 'r7s9210_update_clk_table', > 'rz_cpg_read_mode_pins', 'rz_cpg_clocks_init', > 'rcar_r8a779a0_cpg_clk_register', 'rcar_gen2_cpg_clk_register', > 'sun8i_a33_ccu_setup', 'sun8i_a23_ccu_setup', 'sun5i_ccu_init', > 'suniv_f1c100s_ccu_setup', 'sun6i_a31_ccu_setup', > 'sun8i_v3_v3s_ccu_init', 'sun50i_h616_ccu_setup', > 'sunxi_h3_h5_ccu_init', 'sun4i_ccu_init', 'kona_ccu_init', > 'ns2_genpll_scr_clk_init', 'ns2_genpll_sw_clk_init', > 'ns2_lcpll_ddr_clk_init', 'ns2_lcpll_ports_clk_init', > 'nsp_genpll_clk_init', 'nsp_lcpll0_clk_init', > 'cygnus_genpll_clk_init', 'cygnus_lcpll0_clk_init', > 'cygnus_mipipll_clk_init', 'cygnus_audiopll_clk_init', > 'of_fixed_mmio_clk_setup', > 'arm_v7s_do_selftests', 'arm_lpae_run_tests', 'init_iommu_one', ARM based drivers are initialized on x86 in which way? > 'hv_init_tsc_clocksource', 'hv_init_clocksource', HyperV. See XEN > 'skx_init', > 'i10nm_init', 'sbridge_init', 'i82975x_init', 'i3000_init', > 'x38_init', 'ie31200_init', 'i3200_init', 'amd64_edac_init', > 'pnd2_init', 'edac_init', 'adummy_init', EDAC has already hypervisor checks > 'init_acpi_pm_clocksource', Requires ACPI table entry or command line override > 'intel_rng_mod_init', Has an old style PCI table which is searched via pci_get_device(). Could do with a cleanup which converts it to proper PCI probing. So I stop here, because it would be way simpler to have the file names but so far I could identify all of it from the top of my head. So what are you trying to tell me? That you found tons of ioremaps in __init functions which are completely irrelevant. Please stop making arguments based on completely nonsensical data. It took me less than 5 minutes to eliminate more than 50% of that list and I'm pretty sure that I could have eliminated the bulk of the rest as well. The fact that a large part of this is ARM only, the
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Thu, Oct 14, 2021 at 07:27:42AM +, Reshetova, Elena wrote: > > On Thu, Oct 14, 2021 at 06:32:32AM +, Reshetova, Elena wrote: > > > > On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > > > > > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO > > > > > > and > > > > > > others) in init functions that also register drivers (thanks Elena > > > > > > for > > > > > > the number) > > > > > > > > > > To provide more numbers on this. What I can see so far from a > > > > > smatch-based > > > > > analysis, we have 409 __init style functions (.probe & builtin/module_ > > > > > _platform_driver_probe excluded) for 5.15 with allyesconfig. > > > > > > > > I don't think we care about allyesconfig at all though. > > > > Just don't do that. > > > > How about allmodconfig? This is closer to what distros actually do. > > > > > > It does not make any difference really for the content of the /drivers/*: > > > gives 408 __init style functions doing IO (.probe & builtin/module_ > > > > > _platform_driver_probe excluded) for 5.15 with allmodconfig: > > > > > > ['doc200x_ident_chip', > > > 'doc_probe', 'doc2001_init', 'mtd_speedtest_init', > > > 'mtd_nandbiterrs_init', 'mtd_oobtest_init', 'mtd_pagetest_init', > > > 'tort_init', 'mtd_subpagetest_init', 'fixup_pmc551', > > > 'doc_set_driver_info', 'init_amd76xrom', 'init_l440gx', > > > 'init_sc520cdp', 'init_ichxrom', 'init_ck804xrom', 'init_esb2rom', > > > 'probe_acpi_namespace_devices', 'amd_iommu_init_pci', 'state_next', > > > 'arm_v7s_do_selftests', 'arm_lpae_run_tests', 'init_iommu_one', > > > > Um. ARM? Which architecture is this build for? > > The list of smatch IO findings is built for x86, but the smatch cross function > database covers all archs, so when queried for all potential function callers, > it would show non x86 arch call chains also. > > Here is the original x86 finding and call chain for the > 'arm_v7s_do_selftests': > > Detected low-level IO from arm_v7s_do_selftests in fun > __iommu_queue_command_sync > > drivers/iommu/amd/iommu.c:1025 __iommu_queue_command_sync() error: > {15002074744551330002} > 'check_host_input' read from the host using function 'readl' to a > member of the structure 'iommu->cmd_buf_head'; > > __iommu_queue_command_sync() > iommu_completion_wait() > amd_iommu_domain_flush_complete() > iommu_v1_map_page() > arm_v7s_do_selftests() > > So, the results can be further filtered if you want a specified arch. Even better would be a typical distro build. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Thu, Oct 14, 2021 at 07:27:42AM +, Reshetova, Elena wrote: > > On Thu, Oct 14, 2021 at 06:32:32AM +, Reshetova, Elena wrote: > > > > On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > > > > > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO > > > > > > and > > > > > > others) in init functions that also register drivers (thanks Elena > > > > > > for > > > > > > the number) > > > > > > > > > > To provide more numbers on this. What I can see so far from a > > > > > smatch-based > > > > > analysis, we have 409 __init style functions (.probe & builtin/module_ > > > > > _platform_driver_probe excluded) for 5.15 with allyesconfig. > > > > > > > > I don't think we care about allyesconfig at all though. > > > > Just don't do that. > > > > How about allmodconfig? This is closer to what distros actually do. > > > > > > It does not make any difference really for the content of the /drivers/*: > > > gives 408 __init style functions doing IO (.probe & builtin/module_ > > > > > _platform_driver_probe excluded) for 5.15 with allmodconfig: > > > > > > ['doc200x_ident_chip', > > > 'doc_probe', 'doc2001_init', 'mtd_speedtest_init', > > > 'mtd_nandbiterrs_init', 'mtd_oobtest_init', 'mtd_pagetest_init', > > > 'tort_init', 'mtd_subpagetest_init', 'fixup_pmc551', > > > 'doc_set_driver_info', 'init_amd76xrom', 'init_l440gx', > > > 'init_sc520cdp', 'init_ichxrom', 'init_ck804xrom', 'init_esb2rom', > > > 'probe_acpi_namespace_devices', 'amd_iommu_init_pci', 'state_next', > > > 'arm_v7s_do_selftests', 'arm_lpae_run_tests', 'init_iommu_one', > > > > Um. ARM? Which architecture is this build for? > > The list of smatch IO findings is built for x86, but the smatch cross function > database covers all archs, so when queried for all potential function callers, > it would show non x86 arch call chains also. > > Here is the original x86 finding and call chain for the > 'arm_v7s_do_selftests': > > Detected low-level IO from arm_v7s_do_selftests in fun > __iommu_queue_command_sync > > drivers/iommu/amd/iommu.c:1025 __iommu_queue_command_sync() error: > {15002074744551330002} > 'check_host_input' read from the host using function 'readl' to a > member of the structure 'iommu->cmd_buf_head'; > > __iommu_queue_command_sync() > iommu_completion_wait() > amd_iommu_domain_flush_complete() > iommu_v1_map_page() > arm_v7s_do_selftests() > > So, the results can be further filtered if you want a specified arch. So what is it just for x86? Could you tell? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Thu, Oct 14, 2021 at 06:32:32AM +, Reshetova, Elena wrote: > > On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > > > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO and > > > > others) in init functions that also register drivers (thanks Elena for > > > > the number) > > > > > > To provide more numbers on this. What I can see so far from a smatch-based > > > analysis, we have 409 __init style functions (.probe & builtin/module_ > > > _platform_driver_probe excluded) for 5.15 with allyesconfig. > > > > I don't think we care about allyesconfig at all though. > > Just don't do that. > > How about allmodconfig? This is closer to what distros actually do. > > It does not make any difference really for the content of the /drivers/*: > gives 408 __init style functions doing IO (.probe & builtin/module_ > > > _platform_driver_probe excluded) for 5.15 with allmodconfig: > > ['doc200x_ident_chip', > 'doc_probe', 'doc2001_init', 'mtd_speedtest_init', > 'mtd_nandbiterrs_init', 'mtd_oobtest_init', 'mtd_pagetest_init', > 'tort_init', 'mtd_subpagetest_init', 'fixup_pmc551', > 'doc_set_driver_info', 'init_amd76xrom', 'init_l440gx', > 'init_sc520cdp', 'init_ichxrom', 'init_ck804xrom', 'init_esb2rom', > 'probe_acpi_namespace_devices', 'amd_iommu_init_pci', 'state_next', > 'arm_v7s_do_selftests', 'arm_lpae_run_tests', 'init_iommu_one', Um. ARM? Which architecture is this build for? > 'init_dmars', 'iommu_init_pci', 'early_amd_iommu_init', > 'late_iommu_features_init', 'detect_ivrs', > 'intel_prepare_irq_remapping', 'intel_enable_irq_remapping', > 'intel_cleanup_irq_remapping', 'detect_intel_iommu', > 'parse_ioapics_under_ir', 'si_domain_init', 'ubi_init', > 'fb_console_init', 'xenbus_probe_backend_init', > 'xenbus_probe_frontend_init', 'setup_vcpu_hotplug_event', > 'balloon_init', 'intel_iommu_init', 'intel_rng_mod_init', > 'check_tylersburg_isoch', 'dmar_table_init', > 'enable_drhd_fault_handling', 'init_acpi_pm_clocksource', > 'ostm_init_clksrc', 'ftm_clockevent_init', 'ftm_clocksource_init', > 'kona_timer_init', 'mtk_gpt_init', 'samsung_clockevent_init', > 'samsung_clocksource_init', 'sysctr_timer_init', 'mxs_timer_init', > 'sun4i_timer_init', 'at91sam926x_pit_dt_init', 'owl_timer_init', > 'sun5i_setup_clockevent', 'ubi_gluebi_init', 'ubiblock_init', > 'hv_init_tsc_clocksource', 'hv_init_clocksource', 'mt7621_clk_init', > 'samsung_clk_register_mux', 'samsung_clk_register_gate', > 'samsung_clk_register_fixed_rate', 'clk_boston_setup', > 'gemini_cc_init', 'aspeed_ast2400_cc', 'aspeed_ast2500_cc', > 'sun6i_rtc_clk_init', 'phy_init', 'ingenic_ost_register_clock', > 'meson6_timer_init', 'atcpit100_timer_init', > 'npcm7xx_clocksource_init', 'clksrc_dbx500_prcmu_init', 'skx_init', > 'i10nm_init', 'sbridge_init', 'i82975x_init', 'i3000_init', > 'x38_init', 'ie31200_init', 'i3200_init', 'amd64_edac_init', > 'pnd2_init', 'edac_init', 'adummy_init', 'mtd_stresstest_init', > 'bxt_idle_state_table_update', 'sklh_idle_state_table_update', > 'skx_idle_state_table_update', > 'acpi_gpio_handle_deferred_request_irqs', 'smc_findirq', 'ltpc_probe', > 'com90io_probe', 'com90xx_probe', 'pcnet32_init_module', > 'it87_gpio_init', 'f7188x_find', 'it8712f_wdt_find', 'f71808e_find', > 'it87_wdt_init', 'f71882fg_find', 'it87_find', 'f71805f_find', > 'parport_pc_init', 'asic3_irq_probe', 'sch311x_detect', > 'amd_gpio_init', 'dvb_init', 'dvb_register', 'em28xx_alsa_register', > 'em28xx_dvb_register', 'em28xx_rc_register', 'em28xx_video_register', > 'blackbird_init', 'bttv_check_chipset', 'ivtvfb_callback_init', > 'init_control', 'con_init', 'cr_pll_init', > 'clk_disable_unused_subtree', 'fmi_init', 'cadet_init', 'pcm20_init', > 'airo_init_module', 'bnx2i_mod_init', 'bnx2fc_mod_init', > 'timer_of_irq_exit', 'init', 'kempld_init', 'ivtvfb_init', > 'brcmf_core_init', 'comedi_test_init', 'tlan_eisa_probe', > 'timer_probe', 'of_clk_init', '__reserved_mem_init_node', > 'of_irq_init', 'mace_init', 'vortex_eisa_init', 'reset_chip', > 'atp_init', 'atp_probe1', 'smc_probe', 'osi_setup', 'led_init', > 'el3_init_module', 'clk_sp810_of_setup', 'ltpc_probe_dma', > 'com90io_found', 'check_mirror', 'arcrimi_found', 'com90xx_found', > 'intel_soc_thermal_init', 'thermal_register_governors', > 'thermal_unregister_governors', 'therm_lvt_init', 'tcc_cooling_init', > 'powerclamp_probe', 'intel_init', 'qcom_geni_serial_earlycon_setup', > 'kgdboc_early_init', 'lpuart_console_setup', 'speakup_init', > 'early_console_setup', 'init_port', 'early_serial8250_setup', > 'linflex_console_setup', 'register_earlycon', 'of_setup_earlycon', > 'slgt_init', 'moxa_init', 'parport_pc_init_superio', > 'parport_pc_find_ports', 'mousedev_init', 'ses_init', 'riocm_init', > 'efi_rci2_sysfs_init', 'blogic_probe', 'blogic_init', > 'blogic_init_mm_probeinfo', 'blogic_init_probeinfo_list', > 'blogic_checkadapter', 'blogic_rdconfig', 'blogic_inquiry', > 'adpt_init', 'clk_unprepare_unused_subtree', 'aspeed_socinfo_init', > 'rcar_sysc_pd_setup',
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 2:28 PM Andi Kleen wrote: [..] > >> But how do you debug the kernel then? Making early undebuggable seems > >> just bad policy to me. > > I am not proposing making the early undebuggable. > > > That's the implication of moving the policy into initrd. > > > If only initrd can authorize then it won't be possible to authorize > before initrd, thus the early console won't work. Again, the proposal is that the allow-list is limited to just enough devices to startup and debug the initramfs and no more. Everything else can be dynamic, and this allows for a powerful custom override interface without needing to debate additional ABI like command line overrides, and minimizes future changes to this kernel-internal allow-list. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
But that was due to performance problems in hot paths. Nothing of this applies here. It applies because a new API that individual driver authors is being proposed and that's an ongoing maintenance burden that might be mitigated by hiding that implementation detail from leaf drivers. Right now we're only talking about 2 places to change, and none of those are actually in individual drivers, but in the virtio generic code and in the MSI code. While there might be drivers in the future that do it directly it will be always the exception, normal drivers don't have to deal with this. For me it both seems very straight forward and simple (but then I'm biased) You seem to be having a difficult time iterating this proposal toward consensus. I don't think the base principles are being contested as much as the semantics, scope, and need for the proposed API that is in the purview of all leaf driver developers. Right now there is no leaf driver changed at all. I'd rather see more concerted efforts focused/limited core changes rather than leaf driver changes until there is a clearer definition of hardened. A hardened driver is a driver that - Had similar security (not API) oriented review of its IO operations (mainly MMIO access, but also PCI config space) as a non privileged user interface (like a ioctl). That review should be focused on memory safety. - Had some fuzzing on these IO interfaces using to be released tools. What is the intersection of ioremap() users that are outside of the proposed probe authorization regime AND want confidential computing support? Right now it's zero I believe. That is there is other low level code that sets memory shared, but it's not using ioremap, but some other mechanisms. are needed for booting. For example in TDX we can't print something to the console with this mechanism, so you would never get any output before the initrd. Just seems like a nightmare for debugging anything. There really needs to be an authorization mechanism that works reasonably early. I can see a point of having user space overrides though, but we need to have a sane kernel default that works early. Right, as I suggested [1], just enough early authorization to bootstrap/debug initramfs and then that can authorize the remainder. But how do you debug the kernel then? Making early undebuggable seems just bad policy to me. I am not proposing making the early undebuggable. That's the implication of moving the policy into initrd. If only initrd can authorize then it won't be possible to authorize before initrd, thus the early console won't work. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/12/2021 2:18 PM, Michael S. Tsirkin wrote: On Tue, Oct 12, 2021 at 02:14:44PM -0700, Dan Williams wrote: Especially in this case where the virtio use case being opted-in is *already* in a path that has been authorized by the device-filter policy engine. That's a good point. Andi, how about setting a per-device flag if its ID has been allowed and then making pci_iomap create a shared mapping transparently? Yes for pci_iomap we could do that. If someone uses raw ioremap without a device it won't work, but I don't think that's the case for virtio at least. I suppose we could solve that problem if it actually happens. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 02:14:44PM -0700, Dan Williams wrote: > Especially in this case where the virtio use case being > opted-in is *already* in a path that has been authorized by the > device-filter policy engine. That's a good point. Andi, how about setting a per-device flag if its ID has been allowed and then making pci_iomap create a shared mapping transparently? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 11:35 AM Andi Kleen wrote: > > > > The "better safe-than-sorry" argument is hard to build consensus > > around. The spectre mitigations ran into similar problems where the > > community rightly wanted to see the details and instrument the > > problematic paths rather than blanket sprinkle lfence "just to be > > safe". > > But that was due to performance problems in hot paths. Nothing of this > applies here. It applies because a new API that individual driver authors is being proposed and that's an ongoing maintenance burden that might be mitigated by hiding that implementation detail from leaf drivers. > > > In this case the rules about when a driver is suitably > > "hardened" are vague and the overlapping policy engines are confusing. > > What is confusing exactly? Multiple places to go to enable functionality. The device-filter firewall policy can collide with the ioremap access control policy. > For me it both seems very straight forward and simple (but then I'm biased) You seem to be having a difficult time iterating this proposal toward consensus. I don't think the base principles are being contested as much as the semantics, scope, and need for the proposed API that is in the purview of all leaf driver developers. > The policy is: > > - Have an allow list at driver registration. > > - Have an additional opt-in for MMIO mappings (and potentially config > space, but that's not currently there) to cover init calls completely. The proliferation of policy engines and driver special casing is the issue. Especially in this case where the virtio use case being opted-in is *already* in a path that has been authorized by the device-filter policy engine. I.e. why special case the ioremap() in virtio to be additionally authorized when the device has already been authorized to probe? Put another way, the easiest driver API change to merge would be no additional changes in leaf drivers. > > > > > I'd rather see more concerted efforts focused/limited core changes > > rather than leaf driver changes until there is a clearer definition of > > hardened. > > A hardened driver is a driver that > > - Had similar security (not API) oriented review of its IO operations > (mainly MMIO access, but also PCI config space) as a non privileged user > interface (like a ioctl). That review should be focused on memory safety. > > - Had some fuzzing on these IO interfaces using to be released tools. What is the intersection of ioremap() users that are outside of the proposed probe authorization regime AND want confidential computing support? > Right now it's only three virtio drivers (console, net, block) > > Really it's no different than what we do for every new unprivileged user > interface. > > > > I.e. instead of jumping to the assertion that fixing up > > these init-path vulnerabilities are too big to fix, dig to the next > > level to provide more evidence that per-driver opt-in is the only > > viable option. > > > > For example, how many of these problematic paths are built-in to the > > average kernel config? > > I don't think arguments from "the average kernel config" (if such a > thing even exists) are useful. That's would be just hand waving. I'm trying to bridge to your contention that this enabling can not rely on custom kernel configs and must offer protection on the same kernel image that might ship in the host, but lets set this aside and focus on when and where leaf drivers need to adopt a new API. > > A strawman might be to add a sprinkling error > > exits in the module_init() of the problematic drivers, and only fail > > if the module is built-in, and let modprobe policy handle the rest. > > > That would be already hundreds of changes. I have no idea how such a > thing could be maintained or sustained either. > > Really I don't even see how these alternatives can be considered. Tree > sweeps should always be last resort. They're a pain for everyone. But > here they're casually thrown around as alternatives to straight forward > one or two line changes. If it looked straightforward I'm not sure we would be having this discussion, I think it's reasonable to ask if this is a per-driver opt-in responsibility that must be added in addition to probe authorization. > >> Default policy in user space just seems to be a bad idea here. Who > >> should know if a driver is hardened other than the kernel? Maintaining > >> the list somewhere else just doesn't make sense to me. > > I do not understand the maintenance burden correlation of where the > > policy is driven vs where the list is maintained? > > All the hardening and auditing happens in the kernel tree. So it seems > the natural place to store the result is in the kernel tree. > > But there's no single package for initrd, so you would need custom > configurations for all the supported distros. > > Also we're really arguing about a list that currently has three entries. > > > > Even if I agreed > > with the contention that
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 06:36:16PM +, Reshetova, Elena wrote: > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO and > > others) in init functions that also register drivers (thanks Elena for > > the number) > > To provide more numbers on this. What I can see so far from a smatch-based > analysis, we have 409 __init style functions (.probe & builtin/module_ > _platform_driver_probe excluded) for 5.15 with allyesconfig. I don't think we care about allyesconfig at all though. Just don't do that. How about allmodconfig? This is closer to what distros actually do. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/12/2021 12:13 PM, Dan Williams wrote: On Tue, Oct 12, 2021 at 11:57 AM Reshetova, Elena wrote: I suspect the true number is even higher because that doesn't include IO inside calls to other modules and indirect pointers, correct? Actually everything should be included. Smatch has cross-function db and I am using it for getting the call chains and it follows function pointers. Also since I am starting from a list of individual read IOs, every single base read IO in drivers/* should be covered as far as I can see. But if it uses some weird IO wrappers then the actual list might be higher. Why analyze individual IO calls? I thought the goal here was to disable entire classes of ioremap() users? This is everything that would need to be moved somewhere else if we didn't disable the entire classes of ioremap users. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Tue, Oct 12, 2021 at 11:57 AM Reshetova, Elena wrote: > > > > I suspect the true number is even higher because that doesn't include IO > > inside calls to other modules and indirect pointers, correct? > > Actually everything should be included. Smatch has cross-function db and > I am using it for getting the call chains and it follows function pointers. > Also since I am starting from a list of individual read IOs, every single > base read IO in drivers/* should be covered as far as I can see. But if it > uses > some weird IO wrappers then the actual list might be higher. Why analyze individual IO calls? I thought the goal here was to disable entire classes of ioremap() users? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/12/2021 11:36 AM, Reshetova, Elena wrote: The 5.15 tree has something like ~2.4k IO accesses (including MMIO and others) in init functions that also register drivers (thanks Elena for the number) To provide more numbers on this. What I can see so far from a smatch-based analysis, we have 409 __init style functions (.probe & builtin/module_ _platform_driver_probe excluded) for 5.15 with allyesconfig. The number of distinct individual IO reads (MSRs included) is much higher than 2.4k and on the range of 30k because quite often there are more than a single IO read in the same source function. The full list of accesses and the possible call paths is huge for manually looking at, but here is the list of the 409 functions if anyone wants to take a look: Thanks Elena. I suspect the true number is even higher because that doesn't include IO inside calls to other modules and indirect pointers, correct? -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/11/2021 10:31 PM, Christoph Hellwig wrote: On Mon, Oct 11, 2021 at 03:09:09PM -0400, Michael S. Tsirkin wrote: The reason we have trouble is that it's not clear what does the API mean outside the realm of TDX. If we really, truly want an API that says "ioremap and it's a hardened driver" then I guess ioremap_hardened_driver is what you want. Yes. And why would be we ioremap the BIOS anyway? It is not I/O memory in any of the senses we generally use ioremap for. I/O memory is anything outside the kernel memory map. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
The "better safe-than-sorry" argument is hard to build consensus around. The spectre mitigations ran into similar problems where the community rightly wanted to see the details and instrument the problematic paths rather than blanket sprinkle lfence "just to be safe". But that was due to performance problems in hot paths. Nothing of this applies here. In this case the rules about when a driver is suitably "hardened" are vague and the overlapping policy engines are confusing. What is confusing exactly? For me it both seems very straight forward and simple (but then I'm biased) The policy is: - Have an allow list at driver registration. - Have an additional opt-in for MMIO mappings (and potentially config space, but that's not currently there) to cover init calls completely. I'd rather see more concerted efforts focused/limited core changes rather than leaf driver changes until there is a clearer definition of hardened. A hardened driver is a driver that - Had similar security (not API) oriented review of its IO operations (mainly MMIO access, but also PCI config space) as a non privileged user interface (like a ioctl). That review should be focused on memory safety. - Had some fuzzing on these IO interfaces using to be released tools. Right now it's only three virtio drivers (console, net, block) Really it's no different than what we do for every new unprivileged user interface. I.e. instead of jumping to the assertion that fixing up these init-path vulnerabilities are too big to fix, dig to the next level to provide more evidence that per-driver opt-in is the only viable option. For example, how many of these problematic paths are built-in to the average kernel config? I don't think arguments from "the average kernel config" (if such a thing even exists) are useful. That's would be just hand waving. A strawman might be to add a sprinkling error exits in the module_init() of the problematic drivers, and only fail if the module is built-in, and let modprobe policy handle the rest. That would be already hundreds of changes. I have no idea how such a thing could be maintained or sustained either. Really I don't even see how these alternatives can be considered. Tree sweeps should always be last resort. They're a pain for everyone. But here they're casually thrown around as alternatives to straight forward one or two line changes. Default policy in user space just seems to be a bad idea here. Who should know if a driver is hardened other than the kernel? Maintaining the list somewhere else just doesn't make sense to me. I do not understand the maintenance burden correlation of where the policy is driven vs where the list is maintained? All the hardening and auditing happens in the kernel tree. So it seems the natural place to store the result is in the kernel tree. But there's no single package for initrd, so you would need custom configurations for all the supported distros. Also we're really arguing about a list that currently has three entries. Even if I agreed with the contention that out-of-tree userspace would have a hard time tracking the "hardened" driver list there is still an in-tree userspace path to explore. E.g. perf maintains lists of things tightly coupled to the kernel, this authorized device list seems to be in the same category of data. You mean the event list? perf is in the kernel tree, so it's maintained together with the kernel. But we don't have a kernel initrd. Also there is the more practical problem that some devices are needed for booting. For example in TDX we can't print something to the console with this mechanism, so you would never get any output before the initrd. Just seems like a nightmare for debugging anything. There really needs to be an authorization mechanism that works reasonably early. I can see a point of having user space overrides though, but we need to have a sane kernel default that works early. Right, as I suggested [1], just enough early authorization to bootstrap/debug initramfs and then that can authorize the remainder. But how do you debug the kernel then? Making early undebuggable seems just bad policy to me. And if you fix if for the console why not add the two more entries for virtio net and block too? -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Sun, Oct 10, 2021 at 3:11 PM Andi Kleen wrote: > > > On 10/9/2021 1:39 PM, Dan Williams wrote: > > On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin wrote: > >> On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote: > >>> From: Andi Kleen > >>> > >>> For Confidential VM guests like TDX, the host is untrusted and hence > >>> the devices emulated by the host or any data coming from the host > >>> cannot be trusted. So the drivers that interact with the outside world > >>> have to be hardened by sharing memory with host on need basis > >>> with proper hardening fixes. > >>> > >>> For the PCI driver case, to share the memory with the host add > >>> pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. > >>> > >>> Signed-off-by: Andi Kleen > >>> Signed-off-by: Kuppuswamy Sathyanarayanan > >>> > >> So I proposed to make all pci mappings shared, eliminating the need > >> to patch drivers. > >> > >> To which Andi replied > >> One problem with removing the ioremap opt-in is that > >> it's still possible for drivers to get at devices without going > >> through probe. > >> > >> To which Greg replied: > >> https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > >> If there are in-kernel PCI drivers that do not do this, they need > >> to be > >> fixed today. > >> > >> Can you guys resolve the differences here? > > I agree with you and Greg here. If a driver is accessing hardware > > resources outside of the bind lifetime of one of the devices it > > supports, and in a way that neither modrobe-policy nor > > device-authorization -policy infrastructure can block, that sounds > > like a bug report. > > The 5.15 tree has something like ~2.4k IO accesses (including MMIO and > others) in init functions that also register drivers (thanks Elena for > the number) > > Some are probably old drivers that could be fixed, but it's quite a few > legitimate cases. For example for platform or ISA drivers that's the > only way they can be implemented because they often have no other > enumeration mechanism. For PCI drivers it's rarer, but also still can > happen. One example that comes to mind here is the x86 Intel uncore > drivers, which support a mix of MSR, ioremap and PCI config space > accesses all from the same driver. This particular example can (and > should be) fixed in other ways, but similar things also happen in other > drivers, and they're not all broken. Even for the broken ones they're > usually for some crufty old devices that has very few users, so it's > likely untestable in practice. > > My point is just that the ecosystem of devices that Linux supports is > messy enough that there are legitimate exceptions from the "First IO > only in probe call only" rule. > > And we can't just fix them all. Even if we could it would be hard to > maintain. > > Using a "firewall model" hooking into a few strategic points like we're > proposing here is much saner for everyone. > > Now we can argue about the details. Right now what we're proposing has > some redundancies: it has both a device model filter and low level > filter for ioremap (this patch and some others). The low level filter is > for catching issues that don't clearly fit into the > "enumeration<->probe" model. You could call that redundant, but I would > call it defense in depth or better safe than sorry. In theory it would > be enough to have the low level opt-in only, but that would have the > drawback that is something gets enumerated after all you would have all > kind of weird device driver failures and in some cases even killed > guests. So I think it makes sense to have The "better safe-than-sorry" argument is hard to build consensus around. The spectre mitigations ran into similar problems where the community rightly wanted to see the details and instrument the problematic paths rather than blanket sprinkle lfence "just to be safe". In this case the rules about when a driver is suitably "hardened" are vague and the overlapping policy engines are confusing. I'd rather see more concerted efforts focused/limited core changes rather than leaf driver changes until there is a clearer definition of hardened. I.e. instead of jumping to the assertion that fixing up these init-path vulnerabilities are too big to fix, dig to the next level to provide more evidence that per-driver opt-in is the only viable option. For example, how many of these problematic paths are built-in to the average kernel config? A strawman might be to add a sprinkling error exits in the module_init() of the problematic drivers, and only fail if the module is built-in, and let modprobe policy handle the rest. > > > > Fix those drivers instead of sprinkling > > ioremap_shared in select places and with unclear rules about when a > > driver is allowed to do "shared" mappings. > > Only add it when the driver has been audited and hardened. > > But I agree we need on a documented process for this. I will work on > some
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 11, 2021 at 03:09:09PM -0400, Michael S. Tsirkin wrote: > The reason we have trouble is that it's not clear what does the API mean > outside the realm of TDX. > If we really, truly want an API that says "ioremap and it's a hardened > driver" then I guess ioremap_hardened_driver is what you want. Yes. And why would be we ioremap the BIOS anyway? It is not I/O memory in any of the senses we generally use ioremap for. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 11, 2021 at 10:23:00AM -0700, Andi Kleen wrote: > > On 10/11/2021 12:58 AM, Christoph Hellwig wrote: > > Just as last time: This does not make any sense. ioremap is shared > > by definition. > > It's not necessarily shared with the host for confidential computing: for > example BIOS mappings definitely should not be shared, but they're using > ioremap today. That just needs to be fixed. > But if you have a better term please propose something. I tried to clarify > it with "shared_host", but I don't know a better term. > > > -Andi > The reason we have trouble is that it's not clear what does the API mean outside the realm of TDX. If we really, truly want an API that says "ioremap and it's a hardened driver" then I guess ioremap_hardened_driver is what you want. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Mon, Oct 11, 2021 at 10:32:23AM -0700, Andi Kleen wrote: > > > Because it does not end with I/O operations, that's a trivial example. > > module unloading is famous for being racy: I just re-read that part of > > virtio drivers and sure enough we have bugs there, this is after > > they have presumably been audited, so a TDX guest is better off > > just disabling hot-unplug completely, and hotplug isn't far behind. > > These all shouldn't matter for a confidential guest. The only way it can be > attacked is through IO, everything else is protected by hardware. > > > Also it would all require doing something at the guest level, which we > assume is not malicious. > > > > Malicious filesystems can exploit many linux systems unless > > you take pains to limit what is mounted and how. > > That's expected to be handled by authenticated dmcrypt and similar. > Hardening at this level has been done for many years. It's possible to do it like this, sure. But that's not the only configuration, userspace needs to be smart about setting things up. Which is my point really. > > > Networking devices tend to get into the default namespaces and can > > do more or less whatever CAP_NET_ADMIN can. > > Etc. > > > Networking should be already hardened, otherwise you would have much worse > problems today. Same thing. NFS is pretty common, you are saying don't do it then. Fair enough but again, arbitrary configs just aren't going to be secure. > > > > hange in your subsystem here. > > Well I commented on the API patch, not the virtio patch. > > If it's a way for a driver to say "I am hardened > > and audited" then I guess it should at least say so. > > > This is handled by the central allow list. We intentionally didn't want each > driver to declare itself, but have a central list where changes will get > more scrutiny than random driver code. Makes sense. Additionally, distros can tweak that to their heart's content, selecting the functionality/security balance that makes sense for them. > But then there are the additional opt-ins for the low level firewall. These > are in the API. I don't see how it could be done at the driver level, unless > you want to pass in a struct device everywhere? I am just saying don't do it then. Don't build drivers that distro does not want to support into kernel. And don't load them when they are modules. > > > > How about creating a defconfig that makes sense for TDX then? > > > TDX can be used in many different ways, I don't think a defconfig is > > > practical. > > > > > > In theory you could do some Kconfig dependency (at the pain point of > > > having > > > separate kernel binariees), but why not just do it at run time then if you > > > maintain the list anyways. That's much easier and saner for everyone. In > > > the > > > past we usually always ended up with runtime mechanism for similar things > > > anyways. > > > > > > Also it turns out that the filter mechanisms are needed for some arch > > > drivers which are not even configurable, so alone it's probably not > > > enough, > > > > I guess they aren't really needed though right, or you won't try to > > filter them? > > We're addressing most of them with the device filter for platform drivers. > But since we cannot stop them doing ioremap IO in their init code they also > need the low level firewall. > > Some others that cannot be addressed have explicit disables. > > > > So make them configurable? > > Why not just fix the runtime? It's much saner for everyone. Proposing to do > things at build time sounds like we're in Linux 0.99 days. > > -Andi Um. Tweaking driver code is not just build time, it's development time. At least with kconfig you don't need to patch your kernel. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
Because it does not end with I/O operations, that's a trivial example. module unloading is famous for being racy: I just re-read that part of virtio drivers and sure enough we have bugs there, this is after they have presumably been audited, so a TDX guest is better off just disabling hot-unplug completely, and hotplug isn't far behind. These all shouldn't matter for a confidential guest. The only way it can be attacked is through IO, everything else is protected by hardware. Also it would all require doing something at the guest level, which we assume is not malicious. Malicious filesystems can exploit many linux systems unless you take pains to limit what is mounted and how. That's expected to be handled by authenticated dmcrypt and similar. Hardening at this level has been done for many years. Networking devices tend to get into the default namespaces and can do more or less whatever CAP_NET_ADMIN can. Etc. Networking should be already hardened, otherwise you would have much worse problems today. hange in your subsystem here. Well I commented on the API patch, not the virtio patch. If it's a way for a driver to say "I am hardened and audited" then I guess it should at least say so. This is handled by the central allow list. We intentionally didn't want each driver to declare itself, but have a central list where changes will get more scrutiny than random driver code. But then there are the additional opt-ins for the low level firewall. These are in the API. I don't see how it could be done at the driver level, unless you want to pass in a struct device everywhere? How about creating a defconfig that makes sense for TDX then? TDX can be used in many different ways, I don't think a defconfig is practical. In theory you could do some Kconfig dependency (at the pain point of having separate kernel binariees), but why not just do it at run time then if you maintain the list anyways. That's much easier and saner for everyone. In the past we usually always ended up with runtime mechanism for similar things anyways. Also it turns out that the filter mechanisms are needed for some arch drivers which are not even configurable, so alone it's probably not enough, I guess they aren't really needed though right, or you won't try to filter them? We're addressing most of them with the device filter for platform drivers. But since we cannot stop them doing ioremap IO in their init code they also need the low level firewall. Some others that cannot be addressed have explicit disables. So make them configurable? Why not just fix the runtime? It's much saner for everyone. Proposing to do things at build time sounds like we're in Linux 0.99 days. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/11/2021 12:58 AM, Christoph Hellwig wrote: Just as last time: This does not make any sense. ioremap is shared by definition. It's not necessarily shared with the host for confidential computing: for example BIOS mappings definitely should not be shared, but they're using ioremap today. But if you have a better term please propose something. I tried to clarify it with "shared_host", but I don't know a better term. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Sun, Oct 10, 2021 at 03:22:39PM -0700, Andi Kleen wrote: > > > To which Andi replied > > One problem with removing the ioremap opt-in is that > > it's still possible for drivers to get at devices without going through > > probe. > > > > To which Greg replied: > > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > > If there are in-kernel PCI drivers that do not do this, they need to be > > fixed today. > > > > Can you guys resolve the differences here? > > > I addressed this in my other mail, but we may need more discussion. Hopefully Greg will reply to that one. > > > > > And once they are resolved, mention this in the commit log so > > I don't get to re-read the series just to find out nothing > > changed in this respect? > > > > I frankly do not believe we are anywhere near being able to harden > > an arbitrary kernel config against attack. > > Why not? Device filter and the opt-ins together are a fairly strong > mechanism. Because it does not end with I/O operations, that's a trivial example. module unloading is famous for being racy: I just re-read that part of virtio drivers and sure enough we have bugs there, this is after they have presumably been audited, so a TDX guest is better off just disabling hot-unplug completely, and hotplug isn't far behind. Malicious filesystems can exploit many linux systems unless you take pains to limit what is mounted and how. Networking devices tend to get into the default namespaces and can do more or less whatever CAP_NET_ADMIN can. Etc. I am not saying this makes the effort worthless, I am saying userspace better know very well what it's doing, and kernel better be configured in a very specific way. > And it's not that they're a lot of code or super complicated either. > > You're essentially objecting to a single line change in your subsystem here. Well I commented on the API patch, not the virtio patch. If it's a way for a driver to say "I am hardened and audited" then I guess it should at least say so. It has nothing to do with host or sharing, that's an implementation detail, and it obscures the actual limitations of the approach, in that eventually in an ideal world all drivers would be secure and use this API. Yes, if that's the API that PCI gains then virtio will use it. > > How about creating a defconfig that makes sense for TDX then? > > TDX can be used in many different ways, I don't think a defconfig is > practical. > > In theory you could do some Kconfig dependency (at the pain point of having > separate kernel binariees), but why not just do it at run time then if you > maintain the list anyways. That's much easier and saner for everyone. In the > past we usually always ended up with runtime mechanism for similar things > anyways. > > Also it turns out that the filter mechanisms are needed for some arch > drivers which are not even configurable, so alone it's probably not enough, I guess they aren't really needed though right, or you won't try to filter them? So make them configurable? > > > Anyone deviating from that better know what they are doing, > > this API tweaking is just putting policy into the kernel ... > > Hardening drivers is kernel policy. It cannot be done anywhere else. > > > -Andi To clarify, the policy is which drivers to load into the kernel. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
Just as last time: This does not make any sense. ioremap is shared by definition. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
To which Andi replied One problem with removing the ioremap opt-in is that it's still possible for drivers to get at devices without going through probe. To which Greg replied: https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ If there are in-kernel PCI drivers that do not do this, they need to be fixed today. Can you guys resolve the differences here? I addressed this in my other mail, but we may need more discussion. And once they are resolved, mention this in the commit log so I don't get to re-read the series just to find out nothing changed in this respect? I frankly do not believe we are anywhere near being able to harden an arbitrary kernel config against attack. Why not? Device filter and the opt-ins together are a fairly strong mechanism. And it's not that they're a lot of code or super complicated either. You're essentially objecting to a single line change in your subsystem here. How about creating a defconfig that makes sense for TDX then? TDX can be used in many different ways, I don't think a defconfig is practical. In theory you could do some Kconfig dependency (at the pain point of having separate kernel binariees), but why not just do it at run time then if you maintain the list anyways. That's much easier and saner for everyone. In the past we usually always ended up with runtime mechanism for similar things anyways. Also it turns out that the filter mechanisms are needed for some arch drivers which are not even configurable, so alone it's probably not enough, Anyone deviating from that better know what they are doing, this API tweaking is just putting policy into the kernel ... Hardening drivers is kernel policy. It cannot be done anywhere else. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On 10/9/2021 1:39 PM, Dan Williams wrote: On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin wrote: On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote: From: Andi Kleen For Confidential VM guests like TDX, the host is untrusted and hence the devices emulated by the host or any data coming from the host cannot be trusted. So the drivers that interact with the outside world have to be hardened by sharing memory with host on need basis with proper hardening fixes. For the PCI driver case, to share the memory with the host add pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. Signed-off-by: Andi Kleen Signed-off-by: Kuppuswamy Sathyanarayanan So I proposed to make all pci mappings shared, eliminating the need to patch drivers. To which Andi replied One problem with removing the ioremap opt-in is that it's still possible for drivers to get at devices without going through probe. To which Greg replied: https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ If there are in-kernel PCI drivers that do not do this, they need to be fixed today. Can you guys resolve the differences here? I agree with you and Greg here. If a driver is accessing hardware resources outside of the bind lifetime of one of the devices it supports, and in a way that neither modrobe-policy nor device-authorization -policy infrastructure can block, that sounds like a bug report. The 5.15 tree has something like ~2.4k IO accesses (including MMIO and others) in init functions that also register drivers (thanks Elena for the number) Some are probably old drivers that could be fixed, but it's quite a few legitimate cases. For example for platform or ISA drivers that's the only way they can be implemented because they often have no other enumeration mechanism. For PCI drivers it's rarer, but also still can happen. One example that comes to mind here is the x86 Intel uncore drivers, which support a mix of MSR, ioremap and PCI config space accesses all from the same driver. This particular example can (and should be) fixed in other ways, but similar things also happen in other drivers, and they're not all broken. Even for the broken ones they're usually for some crufty old devices that has very few users, so it's likely untestable in practice. My point is just that the ecosystem of devices that Linux supports is messy enough that there are legitimate exceptions from the "First IO only in probe call only" rule. And we can't just fix them all. Even if we could it would be hard to maintain. Using a "firewall model" hooking into a few strategic points like we're proposing here is much saner for everyone. Now we can argue about the details. Right now what we're proposing has some redundancies: it has both a device model filter and low level filter for ioremap (this patch and some others). The low level filter is for catching issues that don't clearly fit into the "enumeration<->probe" model. You could call that redundant, but I would call it defense in depth or better safe than sorry. In theory it would be enough to have the low level opt-in only, but that would have the drawback that is something gets enumerated after all you would have all kind of weird device driver failures and in some cases even killed guests. So I think it makes sense to have Fix those drivers instead of sprinkling ioremap_shared in select places and with unclear rules about when a driver is allowed to do "shared" mappings. Only add it when the driver has been audited and hardened. But I agree we need on a documented process for this. I will work on some documentation for a proposal. But essentially I think it should be some variant of what Elena has outlined in her talk at Security Summit. https://static.sched.com/hosted_files/lssna2021/b6/LSS-HardeningLinuxGuestForCCC.pdf That is using extra auditing/scrutiny at review time, supported with some static code analysis that points to the interaction points, and code needs to be fuzzed explicitly. However short term it's only three virtio drivers, so this is not a urgent problem. Let the new device-authorization mechanism (with policy in userspace) Default policy in user space just seems to be a bad idea here. Who should know if a driver is hardened other than the kernel? Maintaining the list somewhere else just doesn't make sense to me. Also there is the more practical problem that some devices are needed for booting. For example in TDX we can't print something to the console with this mechanism, so you would never get any output before the initrd. Just seems like a nightmare for debugging anything. There really needs to be an authorization mechanism that works reasonably early. I can see a point of having user space overrides though, but we need to have a sane kernel default that works early. -Andi ___ Virtualization mailing
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Sat, Oct 9, 2021 at 2:53 AM Michael S. Tsirkin wrote: > > On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote: > > From: Andi Kleen > > > > For Confidential VM guests like TDX, the host is untrusted and hence > > the devices emulated by the host or any data coming from the host > > cannot be trusted. So the drivers that interact with the outside world > > have to be hardened by sharing memory with host on need basis > > with proper hardening fixes. > > > > For the PCI driver case, to share the memory with the host add > > pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. > > > > Signed-off-by: Andi Kleen > > Signed-off-by: Kuppuswamy Sathyanarayanan > > > > So I proposed to make all pci mappings shared, eliminating the need > to patch drivers. > > To which Andi replied > One problem with removing the ioremap opt-in is that > it's still possible for drivers to get at devices without going > through probe. > > To which Greg replied: > https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ > If there are in-kernel PCI drivers that do not do this, they need to > be > fixed today. > > Can you guys resolve the differences here? I agree with you and Greg here. If a driver is accessing hardware resources outside of the bind lifetime of one of the devices it supports, and in a way that neither modrobe-policy nor device-authorization -policy infrastructure can block, that sounds like a bug report. Fix those drivers instead of sprinkling ioremap_shared in select places and with unclear rules about when a driver is allowed to do "shared" mappings. Let the new device-authorization mechanism (with policy in userspace) be the central place where all of these driver "trust" issues are managed. > And once they are resolved, mention this in the commit log so > I don't get to re-read the series just to find out nothing > changed in this respect? > > I frankly do not believe we are anywhere near being able to harden > an arbitrary kernel config against attack. > How about creating a defconfig that makes sense for TDX then? > Anyone deviating from that better know what they are doing, > this API tweaking is just putting policy into the kernel ... Right, userspace authorization policy and select driver fixups seems to be the answer to the raised concerns. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v5 12/16] PCI: Add pci_iomap_host_shared(), pci_iomap_host_shared_range()
On Fri, Oct 08, 2021 at 05:37:07PM -0700, Kuppuswamy Sathyanarayanan wrote: > From: Andi Kleen > > For Confidential VM guests like TDX, the host is untrusted and hence > the devices emulated by the host or any data coming from the host > cannot be trusted. So the drivers that interact with the outside world > have to be hardened by sharing memory with host on need basis > with proper hardening fixes. > > For the PCI driver case, to share the memory with the host add > pci_iomap_host_shared() and pci_iomap_host_shared_range() APIs. > > Signed-off-by: Andi Kleen > Signed-off-by: Kuppuswamy Sathyanarayanan > So I proposed to make all pci mappings shared, eliminating the need to patch drivers. To which Andi replied One problem with removing the ioremap opt-in is that it's still possible for drivers to get at devices without going through probe. To which Greg replied: https://lore.kernel.org/all/yvxbnj431yiww...@kroah.com/ If there are in-kernel PCI drivers that do not do this, they need to be fixed today. Can you guys resolve the differences here? And once they are resolved, mention this in the commit log so I don't get to re-read the series just to find out nothing changed in this respect? I frankly do not believe we are anywhere near being able to harden an arbitrary kernel config against attack. How about creating a defconfig that makes sense for TDX then? Anyone deviating from that better know what they are doing, this API tweaking is just putting policy into the kernel ... > --- > Changes since v4: > * Replaced "_shared" with "_host_shared" in pci_iomap* APIs > * Fixed commit log as per review comments. > > include/asm-generic/pci_iomap.h | 6 + > lib/pci_iomap.c | 47 + > 2 files changed, 53 insertions(+) > > diff --git a/include/asm-generic/pci_iomap.h b/include/asm-generic/pci_iomap.h > index df636c6d8e6c..a4a83c8ab3cf 100644 > --- a/include/asm-generic/pci_iomap.h > +++ b/include/asm-generic/pci_iomap.h > @@ -18,6 +18,12 @@ extern void __iomem *pci_iomap_range(struct pci_dev *dev, > int bar, > extern void __iomem *pci_iomap_wc_range(struct pci_dev *dev, int bar, > unsigned long offset, > unsigned long maxlen); > +extern void __iomem *pci_iomap_host_shared(struct pci_dev *dev, int bar, > +unsigned long max); > +extern void __iomem *pci_iomap_host_shared_range(struct pci_dev *dev, int > bar, > + unsigned long offset, > + unsigned long maxlen); > + > /* Create a virtual mapping cookie for a port on a given PCI device. > * Do not call this directly, it exists to make it easier for architectures > * to override */ > diff --git a/lib/pci_iomap.c b/lib/pci_iomap.c > index 57bd92f599ee..2816dc8715da 100644 > --- a/lib/pci_iomap.c > +++ b/lib/pci_iomap.c > @@ -25,6 +25,11 @@ static void __iomem *map_ioremap_wc(phys_addr_t addr, > size_t size) > return ioremap_wc(addr, size); > } > > +static void __iomem *map_ioremap_host_shared(phys_addr_t addr, size_t size) > +{ > + return ioremap_host_shared(addr, size); > +} > + > static void __iomem *pci_iomap_range_map(struct pci_dev *dev, >int bar, >unsigned long offset, > @@ -106,6 +111,48 @@ void __iomem *pci_iomap_wc_range(struct pci_dev *dev, > } > EXPORT_SYMBOL_GPL(pci_iomap_wc_range); > > +/** > + * pci_iomap_host_shared_range - create a virtual shared mapping cookie > + *for a PCI BAR > + * @dev: PCI device that owns the BAR > + * @bar: BAR number > + * @offset: map memory at the given offset in BAR > + * @maxlen: max length of the memory to map > + * > + * Remap a pci device's resources shared in a confidential guest. > + * For more details see pci_iomap_range's documentation. So how does a driver author know when to use this function, and when to use the regular pci_iomap_range? Drivers have no idea whether they are used in a confidential guest, and which ranges are shared, it's a TDX thing ... This documentation should really address it. > + * > + * @maxlen specifies the maximum length to map. To get access to > + * the complete BAR from offset to the end, pass %0 here. > + */ > +void __iomem *pci_iomap_host_shared_range(struct pci_dev *dev, int bar, > + unsigned long offset, > + unsigned long maxlen) > +{ > + return pci_iomap_range_map(dev, bar, offset, maxlen, > +map_ioremap_host_shared, true); > +} > +EXPORT_SYMBOL_GPL(pci_iomap_host_shared_range); > + > +/** > + * pci_iomap_host_shared - create a virtual shared mapping cookie for a PCI > BAR > + * @dev: PCI device that owns the BAR > + * @bar: BAR