Re: linux-next: Fixes tags need some work in the tip tree
On 10/22/19 10:17 PM, Stephen Rothwell wrote: > Hi all, > > n commit > > 6fee2a0be0ec ("x86/cpu/vmware: Fix platform detection VMWARE_PORT macro") > > Fixes tag > > Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") The cited subject is missing a leading "x86/vmware:". Ingo, please let me know if you want me to respin those. Thanks, Thomas
[tip: x86/urgent] x86/cpu/vmware: Fix platform detection VMWARE_PORT macro
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: 6fee2a0be0ecae939d4b6cd8297d88b5cbb61654 Gitweb: https://git.kernel.org/tip/6fee2a0be0ecae939d4b6cd8297d88b5cbb61654 Author:Thomas Hellstrom AuthorDate:Mon, 21 Oct 2019 19:24:03 +02:00 Committer: Ingo Molnar CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00 x86/cpu/vmware: Fix platform detection VMWARE_PORT macro The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT definition, but expects it to be an integer. However, when it was moved to the new vmware.h include file, it was changed to be a string to better fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the platform detection VMWARE_PORT functionality. Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB definitions to be integers, and use __stringify() for their stringified form when needed. Signed-off-by: Thomas Hellstrom Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sean Christopherson Cc: Thomas Gleixner Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas...@shipmail.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/vmware.h | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h index 3caac90..ac9fc51 100644 --- a/arch/x86/include/asm/vmware.h +++ b/arch/x86/include/asm/vmware.h @@ -4,6 +4,7 @@ #include #include +#include /* * The hypercall definitions differ in the low word of the %edx argument @@ -20,8 +21,8 @@ */ /* Old port-based version */ -#define VMWARE_HYPERVISOR_PORT"0x5658" -#define VMWARE_HYPERVISOR_PORT_HB "0x5659" +#define VMWARE_HYPERVISOR_PORT0x5658 +#define VMWARE_HYPERVISOR_PORT_HB 0x5659 /* Current vmcall / vmmcall version */ #define VMWARE_HYPERVISOR_HB BIT(0) @@ -29,7 +30,7 @@ /* The low bandwidth call. The low word of edx is presumed clear. */ #define VMWARE_HYPERCALL \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \ "inl (%%dx), %%eax", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) @@ -39,7 +40,8 @@ * HB and OUT bits set. */ #define VMWARE_HYPERCALL_HB_OUT \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep outsb", \ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \ + "rep outsb", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) @@ -48,7 +50,8 @@ * HB bit set. */ #define VMWARE_HYPERCALL_HB_IN \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep insb", \ + ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \ + "rep insb", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL) #endif
[tip: x86/urgent] x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm
The following commit has been merged into the x86/urgent branch of tip: Commit-ID: db633a4e0e6eda69b6065e3e106f9ea13a0676c3 Gitweb: https://git.kernel.org/tip/db633a4e0e6eda69b6065e3e106f9ea13a0676c3 Author:Thomas Hellstrom AuthorDate:Mon, 21 Oct 2019 19:24:02 +02:00 Committer: Ingo Molnar CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00 x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified. This was previously fixed for the VMWARE_PORT macro. Fix it also for the VMWARE_HYPERCALL macro. Suggested-by: Sami Tolvanen Signed-off-by: Thomas Hellstrom Reviewed-by: Nick Desaulniers Cc: Borislav Petkov Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sean Christopherson Cc: Thomas Gleixner Cc: clang-built-li...@googlegroups.com Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas...@shipmail.org Signed-off-by: Ingo Molnar --- arch/x86/include/asm/vmware.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h index e00c9e8..3caac90 100644 --- a/arch/x86/include/asm/vmware.h +++ b/arch/x86/include/asm/vmware.h @@ -29,7 +29,8 @@ /* The low bandwidth call. The low word of edx is presumed clear. */ #define VMWARE_HYPERCALL \ - ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \ + ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\ + "inl (%%dx), %%eax", \ "vmcall", X86_FEATURE_VMCALL, \ "vmmcall", X86_FEATURE_VMW_VMMCALL)
Re: [PATCH v6 0/8] Emulated coherent graphics memory take 2
On 10/14/19 3:22 PM, Thomas Hellström (VMware) wrote: > From: Thomas Hellström > > Graphics APIs like OpenGL 4.4 and Vulkan require the graphics driver > to provide coherent graphics memory, meaning that the GPU sees any > content written to the coherent memory on the next GPU operation that > touches that memory, and the CPU sees any content written by the GPU > to that memory immediately after any fence object trailing the GPU > operation is signaled. > > Paravirtual drivers that otherwise require explicit synchronization > needs to do this by hooking up dirty tracking to pagefault handlers > and buffer object validation. > > Provide mm helpers needed for this and that also allow for huge pmd- > and pud entries (patch 1-3), and the associated vmwgfx code (patch 4-7). > > The code has been tested and exercised by a tailored version of mesa > where we disable all explicit synchronization and assume graphics memory > is coherent. The performance loss varies of course; a typical number is > around 5%. > > I would like to merge this code through the DRM tree, so an ack to include > the new mm helpers in that merge would be greatly appreciated. > > Changes since RFC: > - Merge conflict changes moved to the correct patch. Fixes intra-patchset > compile errors. > - Be more aggressive when turning ttm vm code into helpers. This makes sure > we can use a const qualifier on the vmwgfx vm_ops. > - Reinstate a lost comment an fix an error path that was broken when turning > the ttm vm code into helpers. > - Remove explicit type-casts of struct vm_area_struct::vm_private_data > - Clarify the locking inversion that makes us not being able to use the mm > pagewalk code. > > Changes since v1: > - Removed the vmwgfx maintainer entry for as_dirty_helpers.c, updated > commit message accordingly > - Removed the TTM patches from the series as they are merged separately > through DRM. > Changes since v2: > - Split out the pagewalk code from as_dirty_helpers.c and document locking. > - Add pre_vma and post_vma callbacks to the pagewalk code. > - Remove huge pmd and -pud asserts that would trip when we protect vmas with > struct address_space::i_mmap_rwsem rather than with > struct vm_area_struct::mmap_sem. > - Do some naming cleanup in as_dirty_helpers.c > Changes since v3: > - Extensive renaming of the dirty helpers including the filename. > - Update walk_page_mapping() doc. > - Update the pagewalk code to not unconditionally split pmds if a pte_entry() > callback is present. Update the dirty helper pmd_entry accordingly. > - Use separate walk ops for the dirty helpers. > - Update the pagewalk code to take the pagetable lock in walk_pte_range. > Changes since v4: > - Fix pte pointer confusion in patch 2/8 > - Skip the pagewalk code conditional split patch for now, and update the > mapping_dirty_helper accordingly. That problem will be solved in a cleaner > way in a follow-up patchset. > Changes since v5: > - Fix tlb flushing when we have other pending tlb flushes. > > Cc: Andrew Morton > Cc: Matthew Wilcox > Cc: Will Deacon > Cc: Peter Zijlstra > Cc: Rik van Riel > Cc: Minchan Kim > Cc: Michal Hocko > Cc: Huang Ying > Cc: Jérôme Glisse > Cc: Kirill A. Shutemov > Kirill, Linus I have a formal Ack for two of the four mm patches. Is there a chance I can get an ack to merge the mm patches of this series through drm with the vmwgfx patches? Thanks, Thomas
Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present
Hi, On 10/9/19 7:17 PM, Linus Torvalds wrote: > On Wed, Oct 9, 2019 at 10:03 AM Thomas Hellström (VMware) > wrote: >> Nope, it handles the hugepages by ignoring them, since they should be >> read-only, but if pmd_entry() was called with something else than a >> hugepage, then it requests the fallback, but never a split. > But PAGE_WALK_FALLBACK _is_ a split. > > Oh, except you did this > > - split_huge_pmd(walk->vma, pmd, addr); > + if (!ops->pmd_entry) > + split_huge_pmd(walk->vma, pmd, addr); > > > so it avoids the split. > > No, that's unacceptable. And makes no sense anyway. If it doesn't > split the pmd, then it shouldn't walk the pte's - because they don't > exist. And if it's not a hugepmd, then the split is a no-op, so the > test makes no sense. > > I hadn't noticed that part of the patch. That simply can't be right. I > don't think you've tested this, because you never actually have > hugepages, do you? > > You didn't notice or realize that split_huge_pmd() just does that > > if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) \ > || pmd_devmap(*pmd))\ > > thing and doesn't do anythign at all if it's not huge. > > So no. That code makes no sense at all, and I didn't realize how > senseless it was, becasue I stupidly missed that "make the split > conditional" - which is insane and wrong - and I thought that you > wanted PAGE_WALK_FALLBACK to split a pmd and fall back to per-pte > entries, which is what the name implies. > > But that's not what you wanted at all. > > Just get rid of PAGE_WALK_FALLBACK entirely then, and make the rule be > that a zero return value just means "split and do ptes". Which is what > you want (see above why "split" simply is wrong, and isn't an issue > for you anyway. > > That won't change any existing cases, since even if they do have a > zero return value, they don't have a pte_entry() callback, so they > won't do that "split and do ptes" anyway. > > Linus > Hmm, so we have the following cases we need to handle when returning from the pmd_entry() handler. 1) Huge pmd was handled - Returns 0 and continues. 2) A pmd is otherwise unstable, typically someone just zapped a huge pmd. Returns PAGE_WALK_FALLBACK, gets caught in the pmd_trans_unstable() test and retries. 3) A pte directory - Returns PAGE_WALK_FALLBACK, falls through, avoids the split and continues to the next level. Yeah that split avoidance test is indeed made unnecessary by the preceding pmd_trans_unstable() test. - split_huge_pmd(walk->vma, pmd, addr); + if (!ops->pmd_entry) + split_huge_pmd(walk->vma, pmd, addr); But as the commit message says, PAGE_WALK_FALLBACK is necessary to have a virtual address range being handled once and only once. Therefore we must distinguish between 1) and 2) since 2) must be retried until it's handled correctly. So we need the PAGE_WALK_FALLBACK. And if we instead were to combine 1) and 3) in a single return value and use, for example PAGE_WALK_RETRY for 2) the following could happen. a) we handle the huge pmd and return 0 from pte_entry(). b) another process splits it. c) we fall through to the pte level and handle the same address range again... So to summarize, yes the test in the code you cite is unnecessary. But if we want to guarantee a virtual address range being handled once and only once we need the PAGE_WALK_FALLBACK, (perhaps renamed). If not, we can do without it similar to your original patch. Thanks, /Thomas
Re: [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges
On 10/8/19 7:07 PM, Linus Torvalds wrote: > On Tue, Oct 8, 2019 at 2:15 AM Thomas Hellström (VMware) > wrote: >> Add two utilities to 1) write-protect and 2) clean all ptes pointing into >> a range of an address space. > The code looks much simpler and easier to understand now, I think, but > that also makes some thing more obvious.. > >> +static int clean_record_pte(pte_t *pte, unsigned long addr, >> + unsigned long end, struct mm_walk *walk) >> +{ >> + struct wp_walk *wpwalk = walk->private; >> + struct clean_walk *cwalk = to_clean_walk(wpwalk); >> + pte_t ptent = *pte; >> + >> + if (pte_dirty(ptent)) { >> + pgoff_t pgoff = ((addr - walk->vma->vm_start) >> PAGE_SHIFT) >> + >> + walk->vma->vm_pgoff - cwalk->bitmap_pgoff; >> + pte_t old_pte = ptep_modify_prot_start(walk->vma, addr, pte); >> + >> + ptent = pte_mkclean(old_pte); >> + ptep_modify_prot_commit(walk->vma, addr, pte, old_pte, >> ptent); >> + >> + wpwalk->total++; >> + wpwalk->tlbflush_start = min(wpwalk->tlbflush_start, addr); >> + wpwalk->tlbflush_end = max(wpwalk->tlbflush_end, >> + addr + PAGE_SIZE); >> + >> + __set_bit(pgoff, cwalk->bitmap); > The above looks fundamentally racy. > > You clean the pte in memory, but it remains dirty and writable in the > TLB, and you only flush it much later. > > So now writes can continue to happen to that page, without it > necessarily being marked dirty again in the page tables, because all > the CPU TLB caches say "it's already dirty". > > This may be ok - you've moved the diry bit into that bitmap, and you > will flush the TLB before then taking action on the bitmap. So you > haven't really _lost_ any dirty bits, but it still may be worth a > comment. > > You do have comments about the _other_ issues (ie the whole "If a > caller needs to make sure all dirty ptes are picked up and none > additional are added..") but you don't actually have comments about > the TLB race basically potentially now causing "missing" dirty stuff. > > And this may actually be a big problem on some architectures. Not x86, > and maybe nothing else, but I have this dim memory of some > architectures being unhappy due to virtual caches, and writeback when > the page table entry says it's read-only and clean. > > Our normal "clean pages" is very anal about this, and does > > flush_cache_page(vma, address, pte_pfn(*pte)); > entry = ptep_clear_flush(vma, address, pte); > entry = pte_wrprotect(entry); > entry = pte_mkclean(entry); > set_pte_at(vma->vm_mm, address, pte, entry); > > when it cleans a page, and I note both the cache flush and the > "clear_flush()" (which invalidates the pte and does a synchronous TLB > flush) and we have magic semantics for that at least on s390 because > there are some low-level architecture details wrt flushing TLB entries > and modifying them. > > End result: I think the code is probably ok in practice because you > don't mind the slightly fuzzy dirty bit state, and it's _probably_ ok > on all architectures that use drm for the TLB invalidation side. But I > think it bears a bit of thinking about. Yes, there's been some considerable thinking behind this. We do have the cache flush in the pre_vma callback, and as you mention the TLB flush happens before we use the bitmap: any pte that may be subject to a race is recorded in the bitmap, and the guarantee we want to provide with the function is a) All ptes that are dirtied before the function starts executing will be recorded in the bitmap. b) All ptes dirtied after that will either be recorded in the bitmap, remain dirty or both. I probably should add that in the docs. And the actual writeback happens asynchronously a lot later *should* be OK, as long as caches are synced for the dma operation. Of course if this somehow fails on a particular architecture I guess we need to rethink and use ptep_clear_flush() at least on that architecture. Thanks, Thomas. > > Linus >
Re: [PATCH] x86/cpu/vmware: use the full form of inl in VMWARE_PORT
On 10/7/19 9:21 PM, Sami Tolvanen wrote: > LLVM's assembler doesn't accept the short form inl (%%dx) instruction, > but instead insists on the output register to be explicitly specified: > > :1:7: error: invalid operand for instruction > inl (%dx) > ^ > LLVM ERROR: Error parsing inline asm > > Use the full form of the instruction to fix the build. > > Signed-off-by: Sami Tolvanen Acked-by: Thomas Hellstrom > --- > arch/x86/kernel/cpu/vmware.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c > index 9735139cfdf8..46d732696c1c 100644 > --- a/arch/x86/kernel/cpu/vmware.c > +++ b/arch/x86/kernel/cpu/vmware.c > @@ -49,7 +49,7 @@ > #define VMWARE_CMD_VCPU_RESERVED 31 > > #define VMWARE_PORT(cmd, eax, ebx, ecx, edx) \ > - __asm__("inl (%%dx)" : \ > + __asm__("inl (%%dx), %%eax" : \ > "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\ > "a"(VMWARE_HYPERVISOR_MAGIC), \ > "c"(VMWARE_CMD_##cmd), \
Re: [PATCH v3 3/7] mm: Add write-protect and clean utilities for address space ranges
On 10/2/19 10:28 PM, Linus Torvalds wrote: > On Wed, Oct 2, 2019 at 12:09 PM Thomas Hellström (VMware) > wrote: >> Yes I typically tend towards using a "namespace_object_operation" naming >> scheme, with "as_dirty" being the namespace here, > We discourage that kind of mindless namespacing for core stuff. > > It makes sense in a driver or a filesystem: when there are 20+ > different filesystems all implementing the same operation, you can't > have descriptive and natural names that are unique. So having a > namespace prefix for the filesystem or driver makes sense. But even > then it tends ot be just a simple name, not the op it does. Understood. > >> Looking at Matthew's suggestion but lining up with >> "unmap_mapping_range()", perhaps we could use "clean_mapping_range" and >> "wp_mapping_range"? > Yes, I agree with Willy that "dirty" is kind of implicit when the > operation is to clean something, so the above sounds sane to me. > > The wp part I'm not entirely sure about: you're not actually > write-protecting the range. You're doing something different. You're > only doing it for shared writable mappings. Both the cleaning operation and the wp operation operate on shared writable mappings, and since they are also both restricted to entries that may take part in dirty-tracking (they're ignoring pmds and puds), so perhaps a "dirty" may make sense anyway, and to point out the similarity: clean_mapping_dirty_range() and wp_mapping_dirty_range()"? > And I'd rather see > separate 'struct mm_walk_ops' than shared ones that then have a flag > in a differfent structure to change behavior. > > Yes, yes, some of the levels share the same logic, but that just means > that you can use the same function pointer for that level in the > different "clean" vs "wp" mm_walk_op. I think that this comment and this last one: > Also, why do you loop inside the pmd_entry function, instead of just > having a pte_entry function?" are tied together. The pagewalk code is kind of awkward since if you provide a pte_entry function, then huge pmds are unconditionally split, even if handled in pmd_entry, forcing pmd-aware users to handle also ptes in pmd_entry(). I'm not entirely sure why, but it looks like it's to avoid a race where huge pmds could otherwise be unintentionally split if appearing *after* the pmd_entry() call. Other pagewalk users do the same here, see for example clear_refs_pte_range(); https://elixir.bootlin.com/linux/latest/source/fs/proc/task_mmu.c#L1040 Also the pagewalk walk_pte_range() for some reason doesn't take the page table lock, which means that a pte may well be cleared under us while we have it cached for modification. For these reasons we can't use the pte_entry, even internally and this means we have three choices: a) Selecting the pte function using a bool. Saves code and avoids extra indirect function call. b) Duplicating the pmd_entry with a different pte function, also duplicating the ops. Avoids extra indirect function call but some extra code. c) Instead of the bool, use another function pointer in yet another ops pointed to by the private structure. Saves some code. I opted for a) here, but I can of course change that if needed? > > Also, looking closer at this patch, this makes me go "Whaa?": > > + pte = (mm == _mm) ? > + pte_offset_kernel(pmd, addr) : > + pte_offset_map_lock(mm, pmd, addr, ); > > because I don't think that's sensible. When do you have a vma in kernel space? > > Also, why do you loop inside the pmd_entry function, instead of just > having a pte_entry function? Yes, that was a blind copy from v1 of the code that used "apply_to_page_range()". I'll fix that up. Thanks, /Thomas > >Linus >
Re: [PATCH v3 1/2] x86: Don't let pgprot_modify() change the page encryption bit
Hi, On 9/18/19 7:57 PM, Dave Hansen wrote: > On 9/17/19 6:01 AM, Thomas Hellström (VMware) wrote: >> diff --git a/arch/x86/include/asm/pgtable_types.h >> b/arch/x86/include/asm/pgtable_types.h >> index b5e49e6bac63..8267dd426b15 100644 >> --- a/arch/x86/include/asm/pgtable_types.h >> +++ b/arch/x86/include/asm/pgtable_types.h >> @@ -123,7 +123,7 @@ >> */ >> #define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | >> \ >> _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ >> - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP) >> + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC) >> #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) > My only nit with what remains is that it expands the infestation of > things that look like a simple macro but are not. > > I'm debating whether we want to go fix that now, though. > Any chance for an ack on this? It's really a small change that, as we've found out, fixes an existing problem. Thanks, Thomas
Re: [PATCH 7/8] drm/vmwgfx: switch to own vma manager
On Thu, 2019-09-05 at 09:05 +0200, Gerd Hoffmann wrote: > Add struct drm_vma_offset_manager to vma_private, initialize it and > pass it to ttm_bo_device_init(). > > With this in place the last user of ttm's embedded vma offset manager > is gone and we can remove it (in a separate patch). > > Signed-off-by: Gerd Hoffmann > --- > drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1 + > drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 +- > 2 files changed, 6 insertions(+), 1 deletion(-) > > Reviewed-by: Thomas Hellström I assume this will be merged through drm-misc? /Thomas
[tip: x86/vmware] x86/vmware: Add a header file for hypercall definitions
The following commit has been merged into the x86/vmware branch of tip: Commit-ID: b4dd4f6e3648dfd66576515fd885a9a765c0 Gitweb: https://git.kernel.org/tip/b4dd4f6e3648dfd66576515fd885a9a765c0 Author:Thomas Hellstrom AuthorDate:Wed, 28 Aug 2019 10:03:51 +02:00 Committer: Borislav Petkov CommitterDate: Wed, 28 Aug 2019 13:32:06 +02:00 x86/vmware: Add a header file for hypercall definitions The new header is intended to be used by drivers using the backdoor. Follow the KVM example using alternatives self-patching to choose between vmcall, vmmcall and io instructions. Also define two new CPU feature flags to indicate hypervisor support for vmcall- and vmmcall instructions. The new XF86_FEATURE_VMW_VMMCALL flag is needed because using XF86_FEATURE_VMMCALL might break QEMU/KVM setups using the vmmouse driver. They rely on XF86_FEATURE_VMMCALL on AMD to get the kvm_hypercall() right. But they do not yet implement vmmcall for the VMware hypercall used by the vmmouse driver. [ bp: reflow hypercall %edx usage explanation comment. ] Signed-off-by: Thomas Hellstrom Signed-off-by: Borislav Petkov Reviewed-by: Doug Covelli Cc: Aaron Lewis Cc: "David S. Miller" Cc: Fenghua Yu Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Josh Poimboeuf Cc: linux-graphics-maintai...@vmware.com Cc: Mauro Carvalho Chehab Cc: Nicolas Ferre Cc: Robert Hoo Cc: Thomas Gleixner Cc: virtualizat...@lists.linux-foundation.org Cc: Cc: x86-ml Link: https://lkml.kernel.org/r/20190828080353.12658-3-thomas...@shipmail.org --- MAINTAINERS| 1 +- arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/include/asm/vmware.h | 53 +- arch/x86/kernel/cpu/vmware.c | 6 ++- 4 files changed, 61 insertions(+), 1 deletion(-) create mode 100644 arch/x86/include/asm/vmware.h diff --git a/MAINTAINERS b/MAINTAINERS index 9cbcf16..47efc1b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17206,6 +17206,7 @@ M: "VMware, Inc." L: virtualizat...@lists.linux-foundation.org S: Supported F: arch/x86/kernel/cpu/vmware.c +F: arch/x86/include/asm/vmware.h VMWARE PVRDMA DRIVER M: Adit Ranadive diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index e880f24..aaeae2f 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -232,6 +232,8 @@ #define X86_FEATURE_VMMCALL( 8*32+15) /* Prefer VMMCALL to VMCALL */ #define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ #define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table access-dirty bit */ +#define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports the VMCALL instruction */ +#define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers VMMCALL hypercall instruction */ /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */ #define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h new file mode 100644 index 000..e00c9e8 --- /dev/null +++ b/arch/x86/include/asm/vmware.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 or MIT */ +#ifndef _ASM_X86_VMWARE_H +#define _ASM_X86_VMWARE_H + +#include +#include + +/* + * The hypercall definitions differ in the low word of the %edx argument + * in the following way: the old port base interface uses the port + * number to distinguish between high- and low bandwidth versions. + * + * The new vmcall interface instead uses a set of flags to select + * bandwidth mode and transfer direction. The flags should be loaded + * into %dx by any user and are automatically replaced by the port + * number if the VMWARE_HYPERVISOR_PORT method is used. + * + * In short, new driver code should strictly use the new definition of + * %dx content. + */ + +/* Old port-based version */ +#define VMWARE_HYPERVISOR_PORT"0x5658" +#define VMWARE_HYPERVISOR_PORT_HB "0x5659" + +/* Current vmcall / vmmcall version */ +#define VMWARE_HYPERVISOR_HB BIT(0) +#define VMWARE_HYPERVISOR_OUT BIT(1) + +/* The low bandwidth call. The low word of edx is presumed clear. */ +#define VMWARE_HYPERCALL \ + ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \ + "vmcall", X86_FEATURE_VMCALL, \ + "vmmcall", X86_FEATURE_VMW_VMMCALL) + +/* + * The high bandwidth out call. The low word of edx is presumed to have the + * HB and OUT bits set. + */ +#define VMWARE_HYPERCALL_HB_OUT \ + ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_
[tip: x86/vmware] x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls
The following commit has been merged into the x86/vmware branch of tip: Commit-ID: bac7b4e843232a3a49a042410cf743341eb0887e Gitweb: https://git.kernel.org/tip/bac7b4e843232a3a49a042410cf743341eb0887e Author:Thomas Hellstrom AuthorDate:Wed, 28 Aug 2019 10:03:50 +02:00 Committer: Borislav Petkov CommitterDate: Wed, 28 Aug 2019 10:48:30 +02:00 x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls Vmware has historically used an INL instruction for this, but recent hardware versions support using VMCALL/VMMCALL instead, so use this method if supported at platform detection time. Explicitly code separate macro versions since the alternatives self-patching has not been performed at platform detection time. Also put tighter constraints on the assembly input parameters. Co-developed-by: Doug Covelli Signed-off-by: Doug Covelli Signed-off-by: Thomas Hellstrom Signed-off-by: Borislav Petkov Reviewed-by: Doug Covelli Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-graphics-maintai...@vmware.com Cc: Thomas Gleixner Cc: virtualizat...@lists.linux-foundation.org Cc: Cc: x86-ml Link: https://lkml.kernel.org/r/20190828080353.12658-2-thomas...@shipmail.org --- arch/x86/kernel/cpu/vmware.c | 88 --- 1 file changed, 71 insertions(+), 17 deletions(-) diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c index 3c64847..757dded 100644 --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ -34,30 +34,65 @@ #undef pr_fmt #define pr_fmt(fmt)"vmware: " fmt -#define CPUID_VMWARE_INFO_LEAF 0x4000 +#define CPUID_VMWARE_INFO_LEAF 0x4000 +#define CPUID_VMWARE_FEATURES_LEAF 0x4010 +#define CPUID_VMWARE_FEATURES_ECX_VMMCALLBIT(0) +#define CPUID_VMWARE_FEATURES_ECX_VMCALL BIT(1) + #define VMWARE_HYPERVISOR_MAGIC0x564D5868 #define VMWARE_HYPERVISOR_PORT 0x5658 -#define VMWARE_PORT_CMD_GETVERSION 10 -#define VMWARE_PORT_CMD_GETHZ 45 -#define VMWARE_PORT_CMD_GETVCPU_INFO 68 -#define VMWARE_PORT_CMD_LEGACY_X2APIC 3 -#define VMWARE_PORT_CMD_VCPU_RESERVED 31 +#define VMWARE_CMD_GETVERSION10 +#define VMWARE_CMD_GETHZ 45 +#define VMWARE_CMD_GETVCPU_INFO 68 +#define VMWARE_CMD_LEGACY_X2APIC 3 +#define VMWARE_CMD_VCPU_RESERVED 31 #define VMWARE_PORT(cmd, eax, ebx, ecx, edx) \ __asm__("inl (%%dx)" : \ - "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\ - "0"(VMWARE_HYPERVISOR_MAGIC), \ - "1"(VMWARE_PORT_CMD_##cmd), \ - "2"(VMWARE_HYPERVISOR_PORT), "3"(UINT_MAX) :\ - "memory"); + "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\ + "a"(VMWARE_HYPERVISOR_MAGIC), \ + "c"(VMWARE_CMD_##cmd), \ + "d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :\ + "memory") + +#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx) \ + __asm__("vmcall" : \ + "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\ + "a"(VMWARE_HYPERVISOR_MAGIC), \ + "c"(VMWARE_CMD_##cmd), \ + "d"(0), "b"(UINT_MAX) : \ + "memory") + +#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx) \ + __asm__("vmmcall" : \ + "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\ + "a"(VMWARE_HYPERVISOR_MAGIC), \ + "c"(VMWARE_CMD_##cmd), \ + "d"(0), "b"(UINT_MAX) : \ + "memory") + +#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do { \ + switch (vmware_hypercall_mode) {\ + case CPUID_VMWARE_FEATURES_ECX_VMCALL: \ + VMWARE_VMCALL(cmd, eax, ebx, ecx, edx); \ + break; \ + case CPUID_VMWARE_FEATURES_ECX_VMMCALL: \ + VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);\ +
[tip: x86/vmware] drm/vmwgfx: Update the backdoor call with support for new instructions
The following commit has been merged into the x86/vmware branch of tip: Commit-ID: 6abe3778cf5abd59b23b9037796f3eab8b7f1d98 Gitweb: https://git.kernel.org/tip/6abe3778cf5abd59b23b9037796f3eab8b7f1d98 Author:Thomas Hellstrom AuthorDate:Wed, 28 Aug 2019 10:03:52 +02:00 Committer: Borislav Petkov CommitterDate: Wed, 28 Aug 2019 13:36:46 +02:00 drm/vmwgfx: Update the backdoor call with support for new instructions Use the definition provided by include/asm/vmware.h Signed-off-by: Thomas Hellstrom Signed-off-by: Borislav Petkov Reviewed-by: Doug Covelli Acked-by: Dave Airlie Cc: Daniel Vetter Cc: dri-de...@lists.freedesktop.org Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: pv-driv...@vmware.com Cc: Thomas Gleixner Cc: VMware Graphics Cc: x86-ml Link: https://lkml.kernel.org/r/20190828080353.12658-4-thomas...@shipmail.org --- drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 21 - drivers/gpu/drm/vmwgfx/vmwgfx_msg.h | 35 ++-- 2 files changed, 28 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c index 59e9d05..b1df3e3 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c @@ -46,8 +46,6 @@ #define RETRIES 3 #define VMW_HYPERVISOR_MAGIC0x564D5868 -#define VMW_HYPERVISOR_PORT 0x5658 -#define VMW_HYPERVISOR_HB_PORT 0x5659 #define VMW_PORT_CMD_MSG30 #define VMW_PORT_CMD_HB_MSG 0 @@ -93,7 +91,7 @@ static int vmw_open_channel(struct rpc_channel *channel, unsigned int protocol) VMW_PORT(VMW_PORT_CMD_OPEN_CHANNEL, (protocol | GUESTMSG_FLAG_COOKIE), si, di, - VMW_HYPERVISOR_PORT, + 0, VMW_HYPERVISOR_MAGIC, eax, ebx, ecx, edx, si, di); @@ -126,7 +124,7 @@ static int vmw_close_channel(struct rpc_channel *channel) VMW_PORT(VMW_PORT_CMD_CLOSE_CHANNEL, 0, si, di, - (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)), + channel->channel_id << 16, VMW_HYPERVISOR_MAGIC, eax, ebx, ecx, edx, si, di); @@ -160,7 +158,8 @@ static unsigned long vmw_port_hb_out(struct rpc_channel *channel, VMW_PORT_HB_OUT( (MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG, msg_len, si, di, - VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16), + VMWARE_HYPERVISOR_HB | (channel->channel_id << 16) | + VMWARE_HYPERVISOR_OUT, VMW_HYPERVISOR_MAGIC, bp, eax, ebx, ecx, edx, si, di); @@ -181,7 +180,7 @@ static unsigned long vmw_port_hb_out(struct rpc_channel *channel, VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_SENDPAYLOAD << 16), word, si, di, -VMW_HYPERVISOR_PORT | (channel->channel_id << 16), +channel->channel_id << 16, VMW_HYPERVISOR_MAGIC, eax, ebx, ecx, edx, si, di); } @@ -213,7 +212,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel *channel, char *reply, VMW_PORT_HB_IN( (MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG, reply_len, si, di, - VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16), + VMWARE_HYPERVISOR_HB | (channel->channel_id << 16), VMW_HYPERVISOR_MAGIC, bp, eax, ebx, ecx, edx, si, di); @@ -230,7 +229,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel *channel, char *reply, VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_RECVPAYLOAD << 16), MESSAGE_STATUS_SUCCESS, si, di, -VMW_HYPERVISOR_PORT | (channel->channel_id << 16), +channel->channel_id << 16, VMW_HYPERVISOR_MAGIC, eax, ebx, ecx, edx, si, di); @@ -269,7 +268,7 @@ static int vmw_send_msg(struct rpc_channel *channel, const char *msg) VMW_PORT(VMW_PORT_CMD_SENDSIZE, msg_len, si, di, - VMW_HYPERVISOR_PORT | (channel->channel_id << 16), + channel->channel_id << 16, VMW_HYPERVISOR_MAGIC, eax, ebx, ecx, edx, si, di); @@ -327,7 +326,7 @@ static int vmw_recv_msg(struct rpc_channel *channel, void **msg, VMW_PORT(VMW_PORT_CMD_RECVSIZE, 0, si, di, - (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)), +
[tip: x86/vmware] input/vmmouse: Update the backdoor call with support for new instructions
The following commit has been merged into the x86/vmware branch of tip: Commit-ID: f7b15c74cffd760ec9959078982d8268a38456c4 Gitweb: https://git.kernel.org/tip/f7b15c74cffd760ec9959078982d8268a38456c4 Author:Thomas Hellstrom AuthorDate:Wed, 28 Aug 2019 10:03:53 +02:00 Committer: Borislav Petkov CommitterDate: Wed, 28 Aug 2019 13:43:01 +02:00 input/vmmouse: Update the backdoor call with support for new instructions Use the definition provided by include/asm/vmware.h. Signed-off-by: Thomas Hellstrom Signed-off-by: Borislav Petkov Reviewed-by: Doug Covelli Acked-by: Dmitry Torokhov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: linux-in...@vger.kernel.org Cc: Thomas Gleixner Cc: VMware Graphics Cc: Cc: x86-ml Link: https://lkml.kernel.org/r/20190828080353.12658-5-thomas...@shipmail.org --- drivers/input/mouse/vmmouse.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c index 871e5b5..148245c 100644 --- a/drivers/input/mouse/vmmouse.c +++ b/drivers/input/mouse/vmmouse.c @@ -16,12 +16,12 @@ #include #include #include +#include #include "psmouse.h" #include "vmmouse.h" #define VMMOUSE_PROTO_MAGIC0x564D5868U -#define VMMOUSE_PROTO_PORT 0x5658 /* * Main commands supported by the vmmouse hypervisor port. @@ -84,7 +84,7 @@ struct vmmouse_data { #define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4) \ ({ \ unsigned long __dummy1, __dummy2; \ - __asm__ __volatile__ ("inl %%dx" : \ + __asm__ __volatile__ (VMWARE_HYPERCALL :\ "=a"(out1), \ "=b"(out2), \ "=c"(out3), \ @@ -94,7 +94,7 @@ struct vmmouse_data { "a"(VMMOUSE_PROTO_MAGIC), \ "b"(in1), \ "c"(VMMOUSE_PROTO_CMD_##cmd), \ - "d"(VMMOUSE_PROTO_PORT) : \ + "d"(0) :\ "memory"); \ })
Re: cleanup the walk_page_range interface
On 8/8/19 11:56 PM, Christoph Hellwig wrote: On Thu, Aug 08, 2019 at 10:50:37AM -0700, Linus Torvalds wrote: Note that both Thomas and Steven have series touching this area pending, and there are a couple consumer in flux too - the hmm tree already conflicts with this series, and I have potential dma changes on top of the consumers in Thomas and Steven's series, so we'll probably need a git tree similar to the hmm one to synchronize these updates. I'd be willing to just merge this now, if that helps. The conversion is mechanical, and my only slight worry would be that at least for my original patch I didn't build-test the (few) non-x86 architecture-specific cases. But I did end up looking at them fairly closely (basically using some grep/sed scripts to see that the conversions I did matched the same patterns). And your changes look like obvious improvements too where any mistake would have been caught by the compiler. I did cross compile the s390 and powerpc bits, but I do not have an openrisc compiler. So I'm not all that worried from a functionality standpoint, and if this will help the next merge window, I'll happily pull now. That would help with this series vs the others, but not with the other series vs each other. Although my series doesn't touch the pagewalk code, it rather borrowed some concepts from it and used for the apply_to_page_range() interface. The reason being that the pagewalk code requires the mmap_sem to be held (mainly for trans-huge pages and reading the vma->vm_flags if I understand the code correctly). That is fine when you scan the vmas of a process, but the helpers I wrote need to instead scan all vmas pointing into a struct address_space, and taking the mmap_sem for each vma will create lock inversion problems. /Thomas
Re: [PATCH 2/3] pagewalk: seperate function pointers from iterator data
On 8/8/19 5:42 PM, Christoph Hellwig wrote: The mm_walk structure currently mixed data and code. Split out the operations vectors into a new mm_walk_ops structure, and while we are changing the API also declare the mm_walk structure inside the walk_page_range and walk_page_vma functions. Based on patch from Linus Torvalds. Signed-off-by: Christoph Hellwig --- Typo: For the patch title s/seperate/separate/ Otherwise for the series Reviewed-by: Thomas Hellstrom /Thomas
[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
Commit-ID: 406de552c2be6ded524c75d14a73cf7f027f587e Gitweb: https://git.kernel.org/tip/406de552c2be6ded524c75d14a73cf7f027f587e Author: Thomas Hellstrom AuthorDate: Thu, 28 Mar 2019 12:06:37 + Committer: Thomas Gleixner CommitDate: Wed, 17 Jul 2019 00:42:27 +0200 MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE Alok Kataria will be handing over VMware's maintainership of these interfaces to Thomas Hellström, with pv-drivers as backup contact. Signed-off-by: Thomas Hellstrom Signed-off-by: Thomas Gleixner Acked-by: Alok Kataria Acked-by: Juergen Gross Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com --- MAINTAINERS | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index f5533d1bda2e..80fa7a4a0b56 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12074,7 +12074,8 @@ F: Documentation/parport*.txt PARAVIRT_OPS INTERFACE M: Juergen Gross -M: Alok Kataria +M: Thomas Hellstrom +M: "VMware, Inc." L: virtualizat...@lists.linux-foundation.org S: Supported F: Documentation/virtual/paravirt_ops.txt @@ -17087,7 +17088,8 @@ S: Maintained F: drivers/misc/vmw_balloon.c VMWARE HYPERVISOR INTERFACE -M: Alok Kataria +M: Thomas Hellstrom +M: "VMware, Inc." L: virtualizat...@lists.linux-foundation.org S: Supported F: arch/x86/kernel/cpu/vmware.c
[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
Commit-ID: 907cb11da7a725445dccc6c2ca2d428739f6cd71 Gitweb: https://git.kernel.org/tip/907cb11da7a725445dccc6c2ca2d428739f6cd71 Author: Thomas Hellstrom AuthorDate: Thu, 28 Mar 2019 12:06:37 + Committer: Thomas Gleixner CommitDate: Tue, 16 Jul 2019 23:13:50 +0200 MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE Alok Kataria will be handing over VMware's maintainership of these interfaces to Thomas Hellström, with pv-drivers as backup contact. Signed-off-by: Thomas Hellstrom Signed-off-by: Thomas Gleixner Acked-by: Alok Kataria Acked-by: Juergen Gross Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com --- MAINTAINERS | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index f5533d1bda2e..80fa7a4a0b56 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12074,7 +12074,8 @@ F: Documentation/parport*.txt PARAVIRT_OPS INTERFACE M: Juergen Gross -M: Alok Kataria +M: Thomas Hellstrom +M: "VMware, Inc." L: virtualizat...@lists.linux-foundation.org S: Supported F: Documentation/virtual/paravirt_ops.txt @@ -17087,7 +17088,8 @@ S: Maintained F: drivers/misc/vmw_balloon.c VMWARE HYPERVISOR INTERFACE -M: Alok Kataria +M: Thomas Hellstrom +M: "VMware, Inc." L: virtualizat...@lists.linux-foundation.org S: Supported F: arch/x86/kernel/cpu/vmware.c
Re: [PATCH] drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read
Thanks, Murray, I'll include in the next vmwgfx-fixes pull request. On Mon, 2019-05-20 at 21:57 +1200, Murray McAllister wrote: > If SVGA_3D_CMD_DX_SET_SHADER is called with a shader ID > of SVGA3D_INVALID_ID, and a shader type of > SVGA3D_SHADERTYPE_INVALID, the calculated binding.shader_slot > will be 4294967295, leading to an out-of-bounds read in > vmw_binding_loc() > when the offset is calculated. > > Signed-off-by: Murray McAllister > --- > drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > index 2ff7ba04d8c8..9aeb5448cfc1 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > @@ -2193,7 +2193,8 @@ static int vmw_cmd_dx_set_shader(struct > vmw_private *dev_priv, > > cmd = container_of(header, typeof(*cmd), header); > > - if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX) { > + if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX || > + cmd->body.type < SVGA3D_SHADERTYPE_MIN) { > VMW_DEBUG_USER("Illegal shader type %u.\n", > (unsigned int) cmd->body.type); > return -EINVAL;
[PATCH 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks
Add the callbacks necessary to implement emulated coherent memory for surfaces. Add a flag to the gb_surface_create ioctl to indicate that surface memory should be coherent. Also bump the drm minor version to signal the availability of coherent surfaces. Signed-off-by: Thomas Hellstrom --- .../device_include/svga3d_surfacedefs.h | 209 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 390 +- include/uapi/drm/vmwgfx_drm.h | 4 +- 4 files changed, 600 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h index f2bfd3d80598..d901206c04e3 100644 --- a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h +++ b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h @@ -1280,7 +1280,6 @@ svga3dsurface_get_pixel_offset(SVGA3dSurfaceFormat format, return offset; } - static inline u32 svga3dsurface_get_image_offset(SVGA3dSurfaceFormat format, surf_size_struct baseLevelSize, @@ -1375,4 +1374,212 @@ svga3dsurface_is_screen_target_format(SVGA3dSurfaceFormat format) return svga3dsurface_is_dx_screen_target_format(format); } +/** + * struct svga3dsurface_mip - Mimpmap level information + * @bytes: Bytes required in the backing store of this mipmap level. + * @img_stride: Byte stride per image. + * @row_stride: Byte stride per block row. + * @size: The size of the mipmap. + */ +struct svga3dsurface_mip { + size_t bytes; + size_t img_stride; + size_t row_stride; + struct drm_vmw_size size; + +}; + +/** + * struct svga3dsurface_cache - Cached surface information + * @desc: Pointer to the surface descriptor + * @mip: Array of mipmap level information. Valid size is @num_mip_levels. + * @mip_chain_bytes: Bytes required in the backing store for the whole chain + * of mip levels. + * @num_mip_levels: Valid size of the @mip array. Number of mipmap levels in + * a chain. + * @num_layers: Number of slices in an array texture or number of faces in + * a cubemap texture. + */ +struct svga3dsurface_cache { + const struct svga3d_surface_desc *desc; + struct svga3dsurface_mip mip[DRM_VMW_MAX_MIP_LEVELS]; + size_t mip_chain_bytes; + u32 num_mip_levels; + u32 num_layers; +}; + +/** + * struct svga3dsurface_loc - Surface location + * @sub_resource: Surface subresource. Defined as layer * num_mip_levels + + * mip_level. + * @x: X coordinate. + * @y: Y coordinate. + * @z: Z coordinate. + */ +struct svga3dsurface_loc { + u32 sub_resource; + u32 x, y, z; +}; + +/** + * svga3dsurface_subres - Compute the subresource from layer and mipmap. + * @cache: Surface layout data. + * @mip_level: The mipmap level. + * @layer: The surface layer (face or array slice). + * + * Return: The subresource. + */ +static inline u32 svga3dsurface_subres(const struct svga3dsurface_cache *cache, + u32 mip_level, u32 layer) +{ + return cache->num_mip_levels * layer + mip_level; +} + +/** + * svga3dsurface_setup_cache - Build a surface cache entry + * @size: The surface base level dimensions. + * @format: The surface format. + * @num_mip_levels: Number of mipmap levels. + * @num_layers: Number of layers. + * @cache: Pointer to a struct svga3dsurface_cach object to be filled in. + */ +static inline void svga3dsurface_setup_cache(const struct drm_vmw_size *size, +SVGA3dSurfaceFormat format, +u32 num_mip_levels, +u32 num_layers, +u32 num_samples, +struct svga3dsurface_cache *cache) +{ + const struct svga3d_surface_desc *desc; + u32 i; + + memset(cache, 0, sizeof(*cache)); + cache->desc = desc = svga3dsurface_get_desc(format); + cache->num_mip_levels = num_mip_levels; + cache->num_layers = num_layers; + for (i = 0; i < cache->num_mip_levels; i++) { + struct svga3dsurface_mip *mip = >mip[i]; + + mip->size = svga3dsurface_get_mip_size(*size, i); + mip->bytes = svga3dsurface_get_image_buffer_size + (desc, >size, 0) * num_samples; + mip->row_stride = + __KERNEL_DIV_ROUND_UP(mip->size.width, + desc->block_size.width) * + desc->bytes_per_block * num_samples; + mip->img_stride = + __KERNEL_DIV_ROUND_UP(mip->size.height, + desc->block_size.height) * + mip->row_stride; + cache->mip_chain_bytes += mip-&g
Re: [PATCH v2] lib/scatterlist: Provide a DMA page iterator
On Thu, 2019-02-07 at 22:26 +, Jason Gunthorpe wrote: > Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists > w/o > backing pages") introduced the sg_page_iter_dma_address() function > without > providing a way to use it in the general case. If the sg_dma_len() is > not > equal to the sg length callers cannot safely use the > for_each_sg_page/sg_page_iter_dma_address combination. > > Resolve this API mistake by providing a DMA specific iterator, > for_each_sg_dma_page(), that uses the right length so > sg_page_iter_dma_address() works as expected with all sglists. > > A new iterator type is introduced to provide compile-time safety > against > wrongly mixing accessors and iterators. > > Acked-by: Christoph Hellwig (for scatterlist) > Signed-off-by: Jason Gunthorpe > For the vmwgfx part, Acked-by: Thomas Hellstrom I'll take a deeper look to provide a vmwgfx fix as a follow up. Thanks, Thomas
Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs v2
On Mon, 2019-02-04 at 13:11 +0100, Thomas Hellström wrote: > On Mon, 2019-02-04 at 09:19 +0100, Christoph Hellwig wrote: > > On Fri, Jan 25, 2019 at 09:12:13AM +0100, Thomas Hellstrom wrote: > > > -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU) > > > - /* > > > - * No coherent page pool > > > - */ > > > - if (dev_priv->map_mode == vmw_dma_alloc_coherent) > > > + /* No TTM coherent page pool? FIXME: Ask TTM instead! */ > > > + if (!(IS_ENABLED(CONFIG_SWIOTLB) || > > > IS_ENABLED(CONFIG_INTEL_IOMMU)) && > > > + (dev_priv->map_mode == vmw_dma_alloc_coherent)) > > > return -EINVAL; > > > -#endif > > > + > > > > I don't think this edited in change makes any sense. The swiotlb > > vs > > dma-direct versions of dma_alloc_coherent are the same, so this > > check > > seems very obsfucating. > > So this part of code is identical in functionality to the previous > version. It checks whether the TTM module has the coherent page pool > enabled. (an identical test is present in TTM). What we *really* need > to do here instead is to ask TTM whether it has enabled its coherent > page pool instead of trying to mimic TTM's test, and I have a > changeset > under review for that. But as mentioned previously, I don't want to > change the TTM interface outside of a merge window, so we either have > to live with the above for 5.0 or keep the old defines. I'd prefer > the > former so I don't have to respin the patch series once more. > > Thanks, > Thoams > Hi, Christoph, I need to get this merged this week. Could you please ack or ack removing this hunk + updating the following patches for merge errors? If no response, I'll add a Cc: tag on the patch and a #v1 to your s-o- b. Thanks, Thomas
Re: [PATCH] lib/scatterlist: Provide a DMA page iterator
On Thu, 2019-01-17 at 10:30 +0100, h...@lst.de wrote: > On Wed, Jan 16, 2019 at 10:24:36AM -0700, Jason Gunthorpe wrote: > > The fact is there is 0 industry interest in using RDMA on platforms > > that can't do HW DMA cache coherency - the kernel syscalls required > > to > > do the cache flushing on the IO path would just destroy performance > > to > > the point of making RDMA pointless. Better to use netdev on those > > platforms. > > In general there is no syscall required for doing cache flushing, you > just issue the proper instructions directly from userspace. But what if there are other coherence issues? Like bounce-buffers? I'd like to +1 on what Jason says about industry interest: FWIW, vmwgfx is probably one of the graphics drivers that would lend itself best to do a fully-dma-interface compliant graphics stack experiment. But being a paravirtual driver, all platforms we can ever run on are fully coherent unless someone introduces a fake incoherency by forcing swiotlb. Putting many man-months of effort into supporting systems on which we would never run on and can never test on can never make more than academic sense. > > > > The reality is that *all* the subsytems doing DMA kernel bypass are > > ignoring the DMA mapping rules, I think we should support this > > better, > > and just accept that user space DMA will not be using syncing. > > Block > > access in cases when this is required, otherwise let it work as is > > today. > > In that case we just need to block userspace DMA access entirely. > Which given the amount of problems it creates sounds like a pretty > good idea anyway. I'm not sure I'm following you here. Are you suggesting scratching support for all (GP)GPU- and RDMA drivers? Thanks, Thomas
Re: [PATCH v4 2/3] locking: Implement an algorithm choice for Wound-Wait mutexes
Hi, On Wed, 2019-01-16 at 09:24 -0500, Rob Clark wrote: > So, I guess this is to do w/ the magic of merge commits, but it looks > like the hunk changing the crtc_ww_class got lost: So what happened here is that this commit changed it to DEFINE_WD_CLASS and the following commit changed it back again to DEFINE_WW_CLASS so that the algorithm change and only that came with the last commit. Apparently the history of that line got lost when merging since the merge didn't change it; git blame doesn't show it changed, but when looking at the commit history in gitk it's all clear. I guess that's a feature of git. The modeset locks now use true wound-wait rather than wait-die. /Thomas > > ~/src/linux master git show --pretty=short > 08295b3b5beec9aac0f7a9db86f0fc3792039da3 > drivers/gpu/drm/drm_modeset_lock.c > commit 08295b3b5beec9aac0f7a9db86f0fc3792039da3 > Author: Thomas Hellstrom > > locking: Implement an algorithm choice for Wound-Wait mutexes > > diff --git a/drivers/gpu/drm/drm_modeset_lock.c > b/drivers/gpu/drm/drm_modeset_lock.c > index 8a5100685875..638be2eb67b4 100644 > --- a/drivers/gpu/drm/drm_modeset_lock.c > +++ b/drivers/gpu/drm/drm_modeset_lock.c > @@ -70,7 +70,7 @@ > * lists and lookup data structures. > */ > > -static DEFINE_WW_CLASS(crtc_ww_class); > +static DEFINE_WD_CLASS(crtc_ww_class); > > /** > * drm_modeset_lock_all - take all modeset locks > ~/src/linux master git log --pretty=format:%H > drivers/gpu/drm/drm_modeset_lock.c | grep > 08295b3b5beec9aac0f7a9db86f0fc3792039da3 > ~/src/linux master 1 > > > BR, > -R > > > On Tue, Jun 19, 2018 at 4:29 AM Thomas Hellstrom < > thellst...@vmware.com> wrote: > > The current Wound-Wait mutex algorithm is actually not Wound-Wait > > but > > Wait-Die. Implement also Wound-Wait as a per-ww-class choice. > > Wound-Wait > > is, contrary to Wait-Die a preemptive algorithm and is known to > > generate > > fewer backoffs. Testing reveals that this is true if the > > number of simultaneous contending transactions is small. > > As the number of simultaneous contending threads increases, Wait- > > Wound > > becomes inferior to Wait-Die in terms of elapsed time. > > Possibly due to the larger number of held locks of sleeping > > transactions. > > > > Update documentation and callers. > > > > Timings using git://people.freedesktop.org/~thomash/ww_mutex_test > > tag patch-18-06-15 > > > > Each thread runs 10 batches of lock / unlock 800 ww mutexes > > randomly > > chosen out of 10. Four core Intel x86_64: > > > > Algorithm#threads Rollbacks time > > Wound-Wait 4 ~100 ~17s. > > Wait-Die 4 ~15~19s. > > Wound-Wait 16 ~36~109s. > > Wait-Die 16 ~45~82s. > > > > Cc: Ingo Molnar > > Cc: Jonathan Corbet > > Cc: Gustavo Padovan > > Cc: Maarten Lankhorst > > Cc: Sean Paul > > Cc: David Airlie > > Cc: Davidlohr Bueso > > Cc: "Paul E. McKenney" > > Cc: Josh Triplett > > Cc: Thomas Gleixner > > Cc: Kate Stewart > > Cc: Philippe Ombredanne > > Cc: Greg Kroah-Hartman > > Cc: linux-...@vger.kernel.org > > Cc: linux-me...@vger.kernel.org > > Cc: linaro-mm-...@lists.linaro.org > > Co-authored-by: Peter Zijlstra > > Signed-off-by: Thomas Hellstrom > > > > --- > > v2: > > * Update API according to review comment by Greg Kroah-Hartman. > > * Address review comments by Peter Zijlstra: > > - Avoid _Bool in composites > > - Fix typo > > - Use __mutex_owner() where applicable > > - Rely on built-in barriers for the main loop exit condition, > > struct ww_acquire_ctx::wounded. Update code comments. > > - Explain unlocked use of list_empty(). > > v3: > > * Adapt to and incorporate cleanup by Peter Zijlstra > > * Remove unlocked use of list_empty(). > > v4: > > * Move code related to adding a waiter to the lock waiter list to a > > separate function. > > --- > > Documentation/locking/ww-mutex-design.txt | 57 +-- > > drivers/dma-buf/reservation.c | 2 +- > > drivers/gpu/drm/drm_modeset_lock.c| 2 +- > > include/linux/ww_mutex.h | 17 ++- > > kernel/locking/locktorture.c | 2 +- > > kernel/locking/mutex.c| 165 > > +++--- > > kernel/locking/test-ww_mutex.c| 2 +- > > lib/locking-selftest.c
Re: [PATCH] lib/scatterlist: Provide a DMA page iterator
On Tue, 2019-01-15 at 21:58 +0100, h...@lst.de wrote: > On Tue, Jan 15, 2019 at 07:13:11PM +, Koenig, Christian wrote: > > Thomas is correct that the interface you propose here doesn't work > > at > > all for GPUs. > > > > The kernel driver is not informed of flush/sync, but rather just > > setups > > coherent mappings between system memory and devices. > > > > In other words you have an array of struct pages and need to map > > that to > > a specific device and so create dma_addresses for the mappings. > > If you want a coherent mapping you need to use dma_alloc_coherent > and dma_mmap_coherent and you are done, that is not the problem. > That actually is one of the vmgfx modes, so I don't understand what > problem we are trying to solve if you don't actually want a non- > coherent mapping. For vmwgfx, not making dma_alloc_coherent default has a couple of reasons: 1) Memory is associated with a struct device. It has not been clear that it is exportable to other devices. 2) There seems to be restrictions in the system pages allowable. GPUs generally prefer highmem pages but dma_alloc_coherent returns a virtual address implying GFP_KERNEL? While not used by vmwgfx, TTM typically prefers HIGHMEM pages to facilitate caching mode switching without having to touch the kernel map. 3) Historically we had APIs to allow coherent access to user-space defined pages. That has gone away not but the infrastructure was built around it. dma_mmap_coherent isn't use because as the data moves between system memory, swap and VRAM, PTEs of user-space mappings are adjusted accordingly, meaning user-space doesn't have to unmap when an operation is initiated that might mean the data is moved. > Although last time I had that discussion with Daniel Vetter > I was under the impressions that GPUs really wanted non-coherent > mappings. Intel historically has done things a bit differently. And it's also possible that embedded platforms and ARM prefer this mode of operation, but I haven't caught up on that discussion. > > But if you want a coherent mapping you can't go to a struct page, > because on many systems you can't just map arbitrary memory as > uncachable. It might either come from very special limited pools, > or might need other magic applied to it so that it is not visible > in the normal direct mapping, or at least not access through it. The TTM subsystem has been relied on to provide coherent memory with the option to switch caching mode of pages. But only on selected and well tested platforms. On other platforms we simply do not load, and that's fine for now. But as mentioned multiple times, to make GPU drivers more compliant, we'd really want that bool dma_streaming_is_coherent(const struct device *) API to help us decide when to load or not. Thanks, Thomas
Re: [PATCH -next] drm/vmwgfx: Remove set but not used variable 'srf'
Yue, Thanks, Reviewed-by: Thomas Hellstrom Will include in the next -next pull. /Thomas On Wed, 2019-01-16 at 01:55 +, YueHaibing wrote: > Fixes gcc '-Wunused-but-set-variable' warning: > > drivers/gpu/drm/vmwgfx/vmwgfx_surface.c: In function > 'vmw_hw_surface_destroy': > drivers/gpu/drm/vmwgfx/vmwgfx_surface.c:335:22: warning: > variable 'srf' set but not used [-Wunused-but-set-variable] > > It never used since commit 543831cfc976 ï¼^"drm/vmwgfx: Break out > surface and > context management to separate files") > > Signed-off-by: YueHaibing > --- > drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > index 80a01cd..3b6976e 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > @@ -332,7 +332,6 @@ static void vmw_hw_surface_destroy(struct > vmw_resource *res) > { > > struct vmw_private *dev_priv = res->dev_priv; > - struct vmw_surface *srf; > void *cmd; > > if (res->func->destroy == vmw_gb_surface_destroy) { > @@ -359,7 +358,6 @@ static void vmw_hw_surface_destroy(struct > vmw_resource *res) >*/ > > mutex_lock(_priv->cmdbuf_mutex); > - srf = vmw_res_to_srf(res); > dev_priv->used_memory_size -= res->backup_size; > mutex_unlock(_priv->cmdbuf_mutex); > } > > >
Re: [PATCH] lib/scatterlist: Provide a DMA page iterator
On Tue, 2019-01-15 at 16:20 +0100, h...@lst.de wrote: > On Tue, Jan 15, 2019 at 03:24:55PM +0100, Christian König wrote: > > Yeah, indeed. Bounce buffers are an absolute no-go for GPUs. > > > > If the DMA API finds that a piece of memory is not directly > > accessible by > > the GPU we need to return an error and not try to use bounce > > buffers behind > > the surface. > > > > That is something which always annoyed me with the DMA API, which > > is > > otherwise rather cleanly defined. > > That is exactly what I want to fix with my series to make > DMA_ATTR_NON_CONSISTENT more useful and always available: > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fiommu%2F2018-December%2F031985.htmldata=02%7C01%7Cthellstrom%40vmware.com%7Cb1799c4073024a824f9408d67afcfcea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636831624340834140sdata=JiRBfzZMvN3joJ4vKiErbzXAHaNzuBcLapRJDL%2Bt6Hc%3Dreserved=0 > > With that you allocate the memory using dma_alloc_attrs with > DMA_ATTR_NON_CONSISTENT, and use dma_sync_single_* to transfer > ownership to the cpu and back to the device, with a gurantee that > there won't be any bouncing. So far the interest by the parties that > requested the feature has been rather lacklustre, though. In the graphics case, it's probably because it doesn't fit the graphics use-cases: 1) Memory typically needs to be mappable by another device. (the "dma- buf" interface) 2) DMA buffers are exported to user-space and is sub-allocated by it. Mostly there are no GPU user-space kernel interfaces to sync / flush subregions and these syncs may happen on a smaller-than-cache-line granularity. So to help the graphics driver, that coherent flag would be much more useful. /Thomas
Re: [PATCH] lib/scatterlist: Provide a DMA page iterator
Hi, Christoph, On Mon, 2019-01-14 at 10:48 +0100, Christoph Hellwig wrote: > On Thu, Jan 10, 2019 at 04:42:18PM -0700, Jason Gunthorpe wrote: > > > Changes since the RFC: > > > - Rework vmwgfx too [CH] > > > - Use a distinct type for the DMA page iterator [CH] > > > - Do not have a #ifdef [CH] > > > > ChristophH: Will you ack? > > This looks generally fine. > > > Are you still OK with the vmwgfx reworking, or should we go back to > > the original version that didn't have the type safety so this > > driver > > can be left broken? > > I think the map method in vmgfx that just does virt_to_phys is > pretty broken. Thomas, can you check if you see any performance > difference with just doing the proper dma mapping, as that gets the > driver out of interface abuse land? The performance difference is not really the main problem here. The problem is that even though we utilize the streaming DMA interface, we use it only since we have to for DMA-Remapping and assume that the memory is coherent. To be able to be as compliant as possible and ditch the virt-to-phys mode, we *need* a DMA interface flag that tells us when the dma_sync_for_xxx are no-ops. If they aren't we'll refuse to load for now. I'm not sure, but I think also nouveau and radeon suffer from the same issue. > > While we're at it I think we need to merge my series in this area > for 5.0, because without that the driver is already broken. Where > should we merge it? I can merge it through vmwgfx/drm-fixes. There is an outstanding issue with patch 3. Do you want me to fix that up? Thanks, Thomas
Re: fix DMA ops layering violations in vmwgfx
On Tue, 2019-01-08 at 14:12 +0100, h...@lst.de wrote: > On Tue, Jan 08, 2019 at 09:51:45AM +0000, Thomas Hellstrom wrote: > > Hi, Christoph, > > > > On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote: > > > Hi Thomas, > > > > > > vmwgfx has been doing some odd checks based on DMA ops which rely > > > on deep DMA mapping layer internals, and I think the changes in > > > Linux 4.21 finally broke most of these implicit assumptions. > > > > Thanks. > > What we're really trying to do here is to try to detect the > > situation > > where DMA remapping using hardware IOMMUs is going on but memory is > > still coherent, since the driver can currently only work with > > coherent > > memory[1]. Currently we use intel_iommu_enabled to detect this > > situation, but it would be really helpful if there were a generic > > bool > > that advertizes this situation since we need to deal with other > > IOMMUs > > as well going forward. Any suggestion? > > I'm missing the link of the [1] reference above. Sorry. I meant to remove the reference since the explanation would be rather lengthy. > But if you need > coherent memory you should simply always use dma_alloc_coherent, that > is the only gurantee you get. If you use any other dma mapping > methods > you will otherwise need to explicitly transfer ownership by mapping/ > unmapping or using dma_sync* before and after every device access. The problem is that graphics applications typically allocate huge amounts of DMA memory. The GPU may want to map half of available system memory or more. Moreover this memory might need to be mappable by multiple devices, which would rule out dma_alloc_coherent() in many situations. Using SWIOTLB with bounce-buffers is also typically out of the question for performance- and memory usage reasons. Moreover the sync operations would often affect sub-regions of 2D and 3D textures, which makes the dma_sync API inefficient even if it maps to no-ops. vmwgfx would rather not load in these situations. Finally graphics applications, do not always provide conservative sync regions. > > And the whole DMA API bypass using the phys mode is something that > I'd really prefer not to see in any driver as it tends to cause > major problems sooner or later. I fully understand this. However, given the above, the DMA API is quite awkward for graphics operation. It would be easier to motivate a full removal of the phys mode if we could query the DMA API whether the dma_sync operations are no-ops or not. /Thomas
Re: [PATCH 4/4] drm/vmwgfx: unwind spaghetti code in vmw_dma_select_mode
On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote: > Just use a simple if/else chain to select the DMA mode. > > Signed-off-by: Christoph Hellwig Reviewed-by: Thomas Hellstrom > --- > drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 25 ++--- > 1 file changed, 6 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > index c2060f6cc9e8..86387735a90b 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > @@ -566,34 +566,21 @@ static int vmw_dma_select_mode(struct > vmw_private *dev_priv) > [vmw_dma_map_populate] = "Keeping DMA mappings.", > [vmw_dma_map_bind] = "Giving up DMA mappings early."}; > > - if (intel_iommu_enabled) { > + if (vmw_force_coherent) > + dev_priv->map_mode = vmw_dma_alloc_coherent; > + else if (intel_iommu_enabled) > dev_priv->map_mode = vmw_dma_map_populate; > - goto out_fixup; > - } > - > - if (!(vmw_force_iommu || vmw_force_coherent)) { > + else if (!vmw_force_iommu) > dev_priv->map_mode = vmw_dma_phys; > - DRM_INFO("DMA map mode: %s\n", names[dev_priv- > >map_mode]); > - return 0; > - } > - > -#ifdef CONFIG_SWIOTLB > - if (swiotlb_nr_tbl()) > + else if (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl()) > dev_priv->map_mode = vmw_dma_alloc_coherent; > else > -#endif > dev_priv->map_mode = vmw_dma_map_populate; > > -out_fixup: > - if (dev_priv->map_mode == vmw_dma_map_populate && > - vmw_restrict_iommu) > + if (dev_priv->map_mode == vmw_dma_map_populate && > vmw_restrict_iommu) > dev_priv->map_mode = vmw_dma_map_bind; > > - if (vmw_force_coherent) > - dev_priv->map_mode = vmw_dma_alloc_coherent; > - > DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]); > - > return 0; > } >
Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs
Hi, On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote: ... > intel_iommu_enabled is defined as always false for > !CONFIG_INTEL_IOMMU, > so remove the ifdefs around it. > > Signed-off-by: Christoph Hellwig > --- > > -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU) > - /* > - * No coherent page pool > - */ > - if (dev_priv->map_mode == vmw_dma_alloc_coherent) > - return -EINVAL; > -#endif > Actually this hunk is incorrect, it tries to determine whether the TTM subsystem maintains a coherent page pool or not. If not, we can't use vmw_dma_alloc_coherent. /Thomas
Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs
Reviewed-by: Thomas Hellstrom On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote: > intel_iommu_enabled is defined as always false for > !CONFIG_INTEL_IOMMU, > so remove the ifdefs around it. > > Signed-off-by: Christoph Hellwig > --- > drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 18 -- > 1 file changed, 18 deletions(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > index 69e325b2d954..236052ad233c 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c > @@ -567,12 +567,10 @@ static int vmw_dma_select_mode(struct > vmw_private *dev_priv) > [vmw_dma_map_bind] = "Giving up DMA mappings early."}; > const struct dma_map_ops *dma_ops = get_dma_ops(dev_priv->dev- > >dev); > > -#ifdef CONFIG_INTEL_IOMMU > if (intel_iommu_enabled) { > dev_priv->map_mode = vmw_dma_map_populate; > goto out_fixup; > } > -#endif > > if (!(vmw_force_iommu || vmw_force_coherent)) { > dev_priv->map_mode = vmw_dma_phys; > @@ -589,9 +587,7 @@ static int vmw_dma_select_mode(struct vmw_private > *dev_priv) > dev_priv->map_mode = vmw_dma_map_populate; > #endif > > -#ifdef CONFIG_INTEL_IOMMU > out_fixup: > -#endif > if (dev_priv->map_mode == vmw_dma_map_populate && > vmw_restrict_iommu) > dev_priv->map_mode = vmw_dma_map_bind; > @@ -599,13 +595,6 @@ static int vmw_dma_select_mode(struct > vmw_private *dev_priv) > if (vmw_force_coherent) > dev_priv->map_mode = vmw_dma_alloc_coherent; > > -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU) > - /* > - * No coherent page pool > - */ > - if (dev_priv->map_mode == vmw_dma_alloc_coherent) > - return -EINVAL; > -#endif > DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]); > > return 0; > @@ -619,7 +608,6 @@ static int vmw_dma_select_mode(struct vmw_private > *dev_priv) > * With 32-bit we can only handle 32 bit PFNs. Optionally set that > * restriction also for 64-bit systems. > */ > -#ifdef CONFIG_INTEL_IOMMU > static int vmw_dma_masks(struct vmw_private *dev_priv) > { > struct drm_device *dev = dev_priv->dev; > @@ -631,12 +619,6 @@ static int vmw_dma_masks(struct vmw_private > *dev_priv) > } > return 0; > } > -#else > -static int vmw_dma_masks(struct vmw_private *dev_priv) > -{ > - return 0; > -} > -#endif > > static int vmw_driver_load(struct drm_device *dev, unsigned long > chipset) > {
Re: fix DMA ops layering violations in vmwgfx
Hi, Christoph, On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote: > Hi Thomas, > > vmwgfx has been doing some odd checks based on DMA ops which rely > on deep DMA mapping layer internals, and I think the changes in > Linux 4.21 finally broke most of these implicit assumptions. Thanks. What we're really trying to do here is to try to detect the situation where DMA remapping using hardware IOMMUs is going on but memory is still coherent, since the driver can currently only work with coherent memory[1]. Currently we use intel_iommu_enabled to detect this situation, but it would be really helpful if there were a generic bool that advertizes this situation since we need to deal with other IOMMUs as well going forward. Any suggestion? Comments on the patches separately. Thanks, Thomas > > The real fix is in patch 3, but I think the others are important > to make it clear what is actually going on.
Re: [PATCH][drm-next] drm/selftest: fix spelling mistake "dimention" -> "dimenstion"
On Mon, 2018-12-10 at 09:26 +, Colin King wrote: Reviewed-by: Thomas Hellstrom I'll take this in the next pull request unless I'm told otherwise. /Thomas > From: Colin Ian King > > There is a spelling mistake in a pr_err message, fix this. > > Signed-off-by: Colin Ian King > --- > drivers/gpu/drm/selftests/test-drm_damage_helper.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/selftests/test-drm_damage_helper.c > b/drivers/gpu/drm/selftests/test-drm_damage_helper.c > index a2f753205a3e..9d2bcdf8bc29 100644 > --- a/drivers/gpu/drm/selftests/test-drm_damage_helper.c > +++ b/drivers/gpu/drm/selftests/test-drm_damage_helper.c > @@ -53,7 +53,7 @@ static bool check_damage_clip(struct > drm_plane_state *state, struct drm_rect *r, > int src_y2 = (state->src.y2 >> 16) + !!(state->src.y2 & > 0x); > > if (x1 >= x2 || y1 >= y2) { > - pr_err("Cannot have damage clip with no dimention.\n"); > + pr_err("Cannot have damage clip with no dimension.\n"); > return false; > } >
[tip:locking/core] locking/mutex: Fix mutex debug call and ww_mutex documentation
Commit-ID: e13e2366d8415e029fe96a62502955083e272cef Gitweb: https://git.kernel.org/tip/e13e2366d8415e029fe96a62502955083e272cef Author: Thomas Hellstrom AuthorDate: Mon, 3 Sep 2018 16:07:08 +0200 Committer: Ingo Molnar CommitDate: Mon, 10 Sep 2018 10:05:10 +0200 locking/mutex: Fix mutex debug call and ww_mutex documentation The following commit: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes") introduced a reference in the documentation to a function that was removed in an earlier commit. It also forgot to remove a call to debug_mutex_add_waiter() which is now unconditionally called by __mutex_add_waiter(). Fix those bugs. Signed-off-by: Thomas Hellstrom Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: dri-de...@lists.freedesktop.org Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes") Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellst...@vmware.com Signed-off-by: Ingo Molnar --- kernel/locking/mutex.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 1a81a1257b3f..3f8a35104285 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -389,7 +389,7 @@ static bool __ww_mutex_wound(struct mutex *lock, /* * wake_up_process() paired with set_current_state() * inserts sufficient barriers to make sure @owner either sees -* it's wounded in __ww_mutex_lock_check_stamp() or has a +* it's wounded in __ww_mutex_check_kill() or has a * wakeup pending to re-read the wounded state. */ if (owner != current) @@ -946,7 +946,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, } debug_mutex_lock_common(lock, ); - debug_mutex_add_waiter(lock, , current); lock_contended(>dep_map, ip);
[tip:locking/core] locking/mutex: Fix mutex debug call and ww_mutex documentation
Commit-ID: e13e2366d8415e029fe96a62502955083e272cef Gitweb: https://git.kernel.org/tip/e13e2366d8415e029fe96a62502955083e272cef Author: Thomas Hellstrom AuthorDate: Mon, 3 Sep 2018 16:07:08 +0200 Committer: Ingo Molnar CommitDate: Mon, 10 Sep 2018 10:05:10 +0200 locking/mutex: Fix mutex debug call and ww_mutex documentation The following commit: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes") introduced a reference in the documentation to a function that was removed in an earlier commit. It also forgot to remove a call to debug_mutex_add_waiter() which is now unconditionally called by __mutex_add_waiter(). Fix those bugs. Signed-off-by: Thomas Hellstrom Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: dri-de...@lists.freedesktop.org Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes") Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellst...@vmware.com Signed-off-by: Ingo Molnar --- kernel/locking/mutex.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 1a81a1257b3f..3f8a35104285 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -389,7 +389,7 @@ static bool __ww_mutex_wound(struct mutex *lock, /* * wake_up_process() paired with set_current_state() * inserts sufficient barriers to make sure @owner either sees -* it's wounded in __ww_mutex_lock_check_stamp() or has a +* it's wounded in __ww_mutex_check_kill() or has a * wakeup pending to re-read the wounded state. */ if (owner != current) @@ -946,7 +946,6 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, } debug_mutex_lock_common(lock, ); - debug_mutex_add_waiter(lock, , current); lock_contended(>dep_map, ip);
Re: [PATCH v4 0/4] locking,drm: Fix ww mutex naming / algorithm inconsistency
On 06/19/2018 11:45 AM, Peter Zijlstra wrote: I suspect you want this through the DRM tree? Ingo are you OK with that? Yes, I can ask Dave to pull this. Ingo? Thanks, Thomas
Re: [PATCH v4 0/4] locking,drm: Fix ww mutex naming / algorithm inconsistency
On 06/19/2018 11:45 AM, Peter Zijlstra wrote: I suspect you want this through the DRM tree? Ingo are you OK with that? Yes, I can ask Dave to pull this. Ingo? Thanks, Thomas
Re: [PATCH 1/3] locking: WW mutex cleanup
On 06/19/2018 11:44 AM, Peter Zijlstra wrote: On Tue, Jun 19, 2018 at 10:24:43AM +0200, Thomas Hellstrom wrote: From: Peter Ziljstra Make the WW mutex code more readable by adding comments, splitting up functions and pointing out that we're actually using the Wait-Die algorithm. Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Gustavo Padovan Cc: Maarten Lankhorst Cc: Sean Paul Cc: David Airlie Cc: Davidlohr Bueso Cc: "Paul E. McKenney" Cc: Josh Triplett Cc: Thomas Gleixner Cc: Kate Stewart Cc: Philippe Ombredanne Cc: Greg Kroah-Hartman Cc: linux-...@vger.kernel.org Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Co-authored-by: Thomas Hellstrom Signed-off-by: Thomas Hellstrom --- Documentation/locking/ww-mutex-design.txt | 12 +- include/linux/ww_mutex.h | 28 ++--- kernel/locking/mutex.c| 202 ++ 3 files changed, 145 insertions(+), 97 deletions(-) Acked-by: Peter Zijlstra (Intel) Hi Peter, Do you want to add a SOB, since you're the main author? Thomas
Re: [PATCH 1/3] locking: WW mutex cleanup
On 06/19/2018 11:44 AM, Peter Zijlstra wrote: On Tue, Jun 19, 2018 at 10:24:43AM +0200, Thomas Hellstrom wrote: From: Peter Ziljstra Make the WW mutex code more readable by adding comments, splitting up functions and pointing out that we're actually using the Wait-Die algorithm. Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Gustavo Padovan Cc: Maarten Lankhorst Cc: Sean Paul Cc: David Airlie Cc: Davidlohr Bueso Cc: "Paul E. McKenney" Cc: Josh Triplett Cc: Thomas Gleixner Cc: Kate Stewart Cc: Philippe Ombredanne Cc: Greg Kroah-Hartman Cc: linux-...@vger.kernel.org Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Co-authored-by: Thomas Hellstrom Signed-off-by: Thomas Hellstrom --- Documentation/locking/ww-mutex-design.txt | 12 +- include/linux/ww_mutex.h | 28 ++--- kernel/locking/mutex.c| 202 ++ 3 files changed, 145 insertions(+), 97 deletions(-) Acked-by: Peter Zijlstra (Intel) Hi Peter, Do you want to add a SOB, since you're the main author? Thomas
Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
On 06/14/2018 03:29 PM, Matthew Wilcox wrote: On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote: On 06/14/2018 01:36 PM, Peter Zijlstra wrote: Currently you don't allow mixing WD and WW contexts (which is not immediately obvious from the above code), and the above hard relies on that. Are there sensible use cases for mixing them? IOW will your current restriction stand without hassle? Contexts _must_ agree on the algorithm used to resolve deadlocks. With Wait-Die, for example, older transactions will wait if a lock is held by a younger transaction and with Wound-Wait, younger transactions will wait if a lock is held by an older transaction so there is no way of mixing them. Maybe the compiler should be enforcing that; ie make it a different type? It's intended to be enforced by storing the algorithm choice in the WW_MUTEX_CLASS which must be common for an acquire context and the ww_mutexes it acquires. However, I don't think there is a check that that holds. I guess we could add it as a DEBUG_MUTEX test in ww_mutex_lock(). Thanks, Thomas
Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
On 06/14/2018 03:29 PM, Matthew Wilcox wrote: On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote: On 06/14/2018 01:36 PM, Peter Zijlstra wrote: Currently you don't allow mixing WD and WW contexts (which is not immediately obvious from the above code), and the above hard relies on that. Are there sensible use cases for mixing them? IOW will your current restriction stand without hassle? Contexts _must_ agree on the algorithm used to resolve deadlocks. With Wait-Die, for example, older transactions will wait if a lock is held by a younger transaction and with Wound-Wait, younger transactions will wait if a lock is held by an older transaction so there is no way of mixing them. Maybe the compiler should be enforcing that; ie make it a different type? It's intended to be enforced by storing the algorithm choice in the WW_MUTEX_CLASS which must be common for an acquire context and the ww_mutexes it acquires. However, I don't think there is a check that that holds. I guess we could add it as a DEBUG_MUTEX test in ww_mutex_lock(). Thanks, Thomas
[PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
The current Wound-Wait mutex algorithm is actually not Wound-Wait but Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait is, contrary to Wait-Die a preemptive algorithm and is known to generate fewer backoffs. Testing reveals that this is true if the number of simultaneous contending transactions is small. As the number of simultaneous contending threads increases, Wait-Wound becomes inferior to Wait-Die in terms of elapsed time. Possibly due to the larger number of held locks of sleeping transactions. Update documentation and callers. Timings using git://people.freedesktop.org/~thomash/ww_mutex_test tag patch-18-06-04 Each thread runs 10 batches of lock / unlock 800 ww mutexes randomly chosen out of 10. Four core Intel x86_64: Algorithm#threads Rollbacks time Wound-Wait 4 ~100 ~17s. Wait-Die 4 ~15~19s. Wound-Wait 16 ~36~109s. Wait-Die 16 ~45~82s. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Gustavo Padovan Cc: Maarten Lankhorst Cc: Sean Paul Cc: David Airlie Cc: Davidlohr Bueso Cc: "Paul E. McKenney" Cc: Josh Triplett Cc: Thomas Gleixner Cc: Kate Stewart Cc: Philippe Ombredanne Cc: Greg Kroah-Hartman Cc: linux-...@vger.kernel.org Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Signed-off-by: Thomas Hellstrom --- Documentation/locking/ww-mutex-design.txt | 57 ++ drivers/dma-buf/reservation.c | 2 +- drivers/gpu/drm/drm_modeset_lock.c| 2 +- include/linux/ww_mutex.h | 19 -- kernel/locking/locktorture.c | 2 +- kernel/locking/mutex.c| 98 --- kernel/locking/test-ww_mutex.c| 2 +- lib/locking-selftest.c| 2 +- 8 files changed, 152 insertions(+), 32 deletions(-) diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt index 34c3a1b50b9a..29c85623b551 100644 --- a/Documentation/locking/ww-mutex-design.txt +++ b/Documentation/locking/ww-mutex-design.txt @@ -1,4 +1,4 @@ -Wait/Wound Deadlock-Proof Mutex Design +Wound/Wait Deadlock-Proof Mutex Design == Please read mutex-design.txt first, as it applies to wait/wound mutexes too. @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the younger task) unlocks all of the buffers that it has already locked, and then tries again. -In the RDBMS literature this deadlock handling approach is called wait/wound: -The older tasks waits until it can acquire the contended lock. The younger tasks -needs to back off and drop all the locks it is currently holding, i.e. the -younger task is wounded. +In the RDBMS literature, a reservation ticket is associated with a transaction. +and the deadlock handling approach is called Wait-Die. The name is based on +the actions of a locking thread when it encounters an already locked mutex. +If the transaction holding the lock is younger, the locking transaction waits. +If the transaction holding the lock is older, the locking transaction backs off +and dies. Hence Wait-Die. +There is also another algorithm called Wound-Wait: +If the transaction holding the lock is younger, the locking transaction +preempts the transaction holding the lock, requiring it to back off. It +Wounds the other transaction. +If the transaction holding the lock is older, it waits for the other +transaction. Hence Wound-Wait. +The two algorithms are both fair in that a transaction will eventually succeed. +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs +compared to Wait-Die, but is, on the other hand, associated with more work than +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive +algorithm which requires a reliable way to preempt another transaction. Concepts @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task trying to acquire locks doesn't grab a new reservation id, but keeps the one it acquired when starting the lock acquisition. This ticket is stored in the acquire context. Furthermore the acquire context keeps track of debugging state -to catch w/w mutex interface abuse. +to catch w/w mutex interface abuse. An acquire context is representing a +transaction. W/w class: In contrast to normal mutexes the lock class needs to be explicit for -w/w mutexes, since it is required to initialize the acquire context. +w/w mutexes, since it is required to initialize the acquire context. The lock +class also specifies what algorithm to use, Wound-Wait or Wait-Die. Furthermore there are three different class of w/w lock acquire functions: @@ -90,10 +105,15 @@ provided. Usage - +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
[PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
The current Wound-Wait mutex algorithm is actually not Wound-Wait but Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait is, contrary to Wait-Die a preemptive algorithm and is known to generate fewer backoffs. Testing reveals that this is true if the number of simultaneous contending transactions is small. As the number of simultaneous contending threads increases, Wait-Wound becomes inferior to Wait-Die in terms of elapsed time. Possibly due to the larger number of held locks of sleeping transactions. Update documentation and callers. Timings using git://people.freedesktop.org/~thomash/ww_mutex_test tag patch-18-06-04 Each thread runs 10 batches of lock / unlock 800 ww mutexes randomly chosen out of 10. Four core Intel x86_64: Algorithm#threads Rollbacks time Wound-Wait 4 ~100 ~17s. Wait-Die 4 ~15~19s. Wound-Wait 16 ~36~109s. Wait-Die 16 ~45~82s. Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Jonathan Corbet Cc: Gustavo Padovan Cc: Maarten Lankhorst Cc: Sean Paul Cc: David Airlie Cc: Davidlohr Bueso Cc: "Paul E. McKenney" Cc: Josh Triplett Cc: Thomas Gleixner Cc: Kate Stewart Cc: Philippe Ombredanne Cc: Greg Kroah-Hartman Cc: linux-...@vger.kernel.org Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org Signed-off-by: Thomas Hellstrom --- Documentation/locking/ww-mutex-design.txt | 57 ++ drivers/dma-buf/reservation.c | 2 +- drivers/gpu/drm/drm_modeset_lock.c| 2 +- include/linux/ww_mutex.h | 19 -- kernel/locking/locktorture.c | 2 +- kernel/locking/mutex.c| 98 --- kernel/locking/test-ww_mutex.c| 2 +- lib/locking-selftest.c| 2 +- 8 files changed, 152 insertions(+), 32 deletions(-) diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt index 34c3a1b50b9a..29c85623b551 100644 --- a/Documentation/locking/ww-mutex-design.txt +++ b/Documentation/locking/ww-mutex-design.txt @@ -1,4 +1,4 @@ -Wait/Wound Deadlock-Proof Mutex Design +Wound/Wait Deadlock-Proof Mutex Design == Please read mutex-design.txt first, as it applies to wait/wound mutexes too. @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the younger task) unlocks all of the buffers that it has already locked, and then tries again. -In the RDBMS literature this deadlock handling approach is called wait/wound: -The older tasks waits until it can acquire the contended lock. The younger tasks -needs to back off and drop all the locks it is currently holding, i.e. the -younger task is wounded. +In the RDBMS literature, a reservation ticket is associated with a transaction. +and the deadlock handling approach is called Wait-Die. The name is based on +the actions of a locking thread when it encounters an already locked mutex. +If the transaction holding the lock is younger, the locking transaction waits. +If the transaction holding the lock is older, the locking transaction backs off +and dies. Hence Wait-Die. +There is also another algorithm called Wound-Wait: +If the transaction holding the lock is younger, the locking transaction +preempts the transaction holding the lock, requiring it to back off. It +Wounds the other transaction. +If the transaction holding the lock is older, it waits for the other +transaction. Hence Wound-Wait. +The two algorithms are both fair in that a transaction will eventually succeed. +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs +compared to Wait-Die, but is, on the other hand, associated with more work than +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive +algorithm which requires a reliable way to preempt another transaction. Concepts @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task trying to acquire locks doesn't grab a new reservation id, but keeps the one it acquired when starting the lock acquisition. This ticket is stored in the acquire context. Furthermore the acquire context keeps track of debugging state -to catch w/w mutex interface abuse. +to catch w/w mutex interface abuse. An acquire context is representing a +transaction. W/w class: In contrast to normal mutexes the lock class needs to be explicit for -w/w mutexes, since it is required to initialize the acquire context. +w/w mutexes, since it is required to initialize the acquire context. The lock +class also specifies what algorithm to use, Wound-Wait or Wait-Die. Furthermore there are three different class of w/w lock acquire functions: @@ -90,10 +105,15 @@ provided. Usage - +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
[PATCH RFC 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
The current Wound-Wait mutex algorithm is actually not Wound-Wait but Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait is, contrary to Wait-Die a preemptive algorithm and is known to generate fewer backoffs in some cases. Testing reveals that this is true if the number of simultaneous contending transactions is small. As the number of simultaneous contending threads increases, Wait-Wound becomes inferior to Wait-Die, probably due to the larger number of held locks of sleeping transactions. Update documentation and callers. Signed-off-by: Thomas Hellstrom <thellst...@vmware.com> --- Documentation/locking/ww-mutex-design.txt | 57 + drivers/dma-buf/reservation.c | 2 +- drivers/gpu/drm/drm_modeset_lock.c| 2 +- include/linux/ww_mutex.h | 19 -- kernel/locking/locktorture.c | 2 +- kernel/locking/mutex.c| 101 +++--- kernel/locking/test-ww_mutex.c| 2 +- lib/locking-selftest.c| 2 +- 8 files changed, 154 insertions(+), 33 deletions(-) diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt index 34c3a1b50b9a..29c85623b551 100644 --- a/Documentation/locking/ww-mutex-design.txt +++ b/Documentation/locking/ww-mutex-design.txt @@ -1,4 +1,4 @@ -Wait/Wound Deadlock-Proof Mutex Design +Wound/Wait Deadlock-Proof Mutex Design == Please read mutex-design.txt first, as it applies to wait/wound mutexes too. @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the younger task) unlocks all of the buffers that it has already locked, and then tries again. -In the RDBMS literature this deadlock handling approach is called wait/wound: -The older tasks waits until it can acquire the contended lock. The younger tasks -needs to back off and drop all the locks it is currently holding, i.e. the -younger task is wounded. +In the RDBMS literature, a reservation ticket is associated with a transaction. +and the deadlock handling approach is called Wait-Die. The name is based on +the actions of a locking thread when it encounters an already locked mutex. +If the transaction holding the lock is younger, the locking transaction waits. +If the transaction holding the lock is older, the locking transaction backs off +and dies. Hence Wait-Die. +There is also another algorithm called Wound-Wait: +If the transaction holding the lock is younger, the locking transaction +preempts the transaction holding the lock, requiring it to back off. It +Wounds the other transaction. +If the transaction holding the lock is older, it waits for the other +transaction. Hence Wound-Wait. +The two algorithms are both fair in that a transaction will eventually succeed. +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs +compared to Wait-Die, but is, on the other hand, associated with more work than +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive +algorithm which requires a reliable way to preempt another transaction. Concepts @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task trying to acquire locks doesn't grab a new reservation id, but keeps the one it acquired when starting the lock acquisition. This ticket is stored in the acquire context. Furthermore the acquire context keeps track of debugging state -to catch w/w mutex interface abuse. +to catch w/w mutex interface abuse. An acquire context is representing a +transaction. W/w class: In contrast to normal mutexes the lock class needs to be explicit for -w/w mutexes, since it is required to initialize the acquire context. +w/w mutexes, since it is required to initialize the acquire context. The lock +class also specifies what algorithm to use, Wound-Wait or Wait-Die. Furthermore there are three different class of w/w lock acquire functions: @@ -90,10 +105,15 @@ provided. Usage - +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die +argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you +typically expect the number of simultaneous competing transactions to be small, +and the rollback cost can be substantial. + Three different ways to acquire locks within the same w/w class. Common definitions for methods #1 and #2: -static DEFINE_WW_CLASS(ww_class); +static DEFINE_WW_CLASS(ww_class, false); struct obj { struct ww_mutex lock; @@ -243,7 +263,7 @@ struct obj { struct list_head locked_list; }; -static DEFINE_WW_CLASS(ww_class); +static DEFINE_WW_CLASS(ww_class, false); void __unlock_objs(struct list_head *list) { @@ -312,12 +332,23 @@ Design: We maintain the following invariants for the wait list: (1) Waiters with an acquire context are sorted by stamp order; waiters w
[PATCH RFC 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes
The current Wound-Wait mutex algorithm is actually not Wound-Wait but Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait is, contrary to Wait-Die a preemptive algorithm and is known to generate fewer backoffs in some cases. Testing reveals that this is true if the number of simultaneous contending transactions is small. As the number of simultaneous contending threads increases, Wait-Wound becomes inferior to Wait-Die, probably due to the larger number of held locks of sleeping transactions. Update documentation and callers. Signed-off-by: Thomas Hellstrom --- Documentation/locking/ww-mutex-design.txt | 57 + drivers/dma-buf/reservation.c | 2 +- drivers/gpu/drm/drm_modeset_lock.c| 2 +- include/linux/ww_mutex.h | 19 -- kernel/locking/locktorture.c | 2 +- kernel/locking/mutex.c| 101 +++--- kernel/locking/test-ww_mutex.c| 2 +- lib/locking-selftest.c| 2 +- 8 files changed, 154 insertions(+), 33 deletions(-) diff --git a/Documentation/locking/ww-mutex-design.txt b/Documentation/locking/ww-mutex-design.txt index 34c3a1b50b9a..29c85623b551 100644 --- a/Documentation/locking/ww-mutex-design.txt +++ b/Documentation/locking/ww-mutex-design.txt @@ -1,4 +1,4 @@ -Wait/Wound Deadlock-Proof Mutex Design +Wound/Wait Deadlock-Proof Mutex Design == Please read mutex-design.txt first, as it applies to wait/wound mutexes too. @@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher reservation id (i.e. the younger task) unlocks all of the buffers that it has already locked, and then tries again. -In the RDBMS literature this deadlock handling approach is called wait/wound: -The older tasks waits until it can acquire the contended lock. The younger tasks -needs to back off and drop all the locks it is currently holding, i.e. the -younger task is wounded. +In the RDBMS literature, a reservation ticket is associated with a transaction. +and the deadlock handling approach is called Wait-Die. The name is based on +the actions of a locking thread when it encounters an already locked mutex. +If the transaction holding the lock is younger, the locking transaction waits. +If the transaction holding the lock is older, the locking transaction backs off +and dies. Hence Wait-Die. +There is also another algorithm called Wound-Wait: +If the transaction holding the lock is younger, the locking transaction +preempts the transaction holding the lock, requiring it to back off. It +Wounds the other transaction. +If the transaction holding the lock is older, it waits for the other +transaction. Hence Wound-Wait. +The two algorithms are both fair in that a transaction will eventually succeed. +However, the Wound-Wait algorithm is typically stated to generate fewer backoffs +compared to Wait-Die, but is, on the other hand, associated with more work than +Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive +algorithm which requires a reliable way to preempt another transaction. Concepts @@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is important the a task trying to acquire locks doesn't grab a new reservation id, but keeps the one it acquired when starting the lock acquisition. This ticket is stored in the acquire context. Furthermore the acquire context keeps track of debugging state -to catch w/w mutex interface abuse. +to catch w/w mutex interface abuse. An acquire context is representing a +transaction. W/w class: In contrast to normal mutexes the lock class needs to be explicit for -w/w mutexes, since it is required to initialize the acquire context. +w/w mutexes, since it is required to initialize the acquire context. The lock +class also specifies what algorithm to use, Wound-Wait or Wait-Die. Furthermore there are three different class of w/w lock acquire functions: @@ -90,10 +105,15 @@ provided. Usage - +The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die +argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you +typically expect the number of simultaneous competing transactions to be small, +and the rollback cost can be substantial. + Three different ways to acquire locks within the same w/w class. Common definitions for methods #1 and #2: -static DEFINE_WW_CLASS(ww_class); +static DEFINE_WW_CLASS(ww_class, false); struct obj { struct ww_mutex lock; @@ -243,7 +263,7 @@ struct obj { struct list_head locked_list; }; -static DEFINE_WW_CLASS(ww_class); +static DEFINE_WW_CLASS(ww_class, false); void __unlock_objs(struct list_head *list) { @@ -312,12 +332,23 @@ Design: We maintain the following invariants for the wait list: (1) Waiters with an acquire context are sorted by stamp order; waiters without an acquire context
[PATCH RFC 0/2] locking: Fix ww_mutex algorithm inconsistency.
The algorithm used for linux Wound/Wait mutexes, is actually not Wound/Wait but Wait/Die. See for example http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html Rather than renaming them across the tree to something like Wait/Die mutexes or Deadlock Avoidance mutexes, this patch set implements also the Wound/Wait algorithm It shouldn't touch the binary ordinary mutex paths when compiled with otpimization. We use Wound/Wait for the modeset locks which in theory may benefit slightly from Wound/Wait rather than Wait/Die, but Wait/Die is actually superior for larger number of simultaneous contending transactions so we keep the original implementation as a choice, and don't touch the algorithm used by the reservation objects. Performance- and functional testing has been done using git://people.freedesktop.org/~thomash/ww_mutex_test Using WW_BUILTIN and 4 and 16 competing threads. Another option is of course to ignore the the naming / algorithm inconsistency.
[PATCH RFC 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.
For modeset locks we don't expect a high number of contending transactions so change algorithm from Wait-Die to Wound-Wait. Signed-off-by: Thomas Hellstrom <thellst...@vmware.com> --- drivers/gpu/drm/drm_modeset_lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c index f22a7ef41de1..294997765a2c 100644 --- a/drivers/gpu/drm/drm_modeset_lock.c +++ b/drivers/gpu/drm/drm_modeset_lock.c @@ -70,7 +70,7 @@ * lists and lookup data structures. */ -static DEFINE_WW_CLASS(crtc_ww_class, true); +static DEFINE_WW_CLASS(crtc_ww_class, false); /** * drm_modeset_lock_all - take all modeset locks -- 2.14.3
[PATCH RFC 0/2] locking: Fix ww_mutex algorithm inconsistency.
The algorithm used for linux Wound/Wait mutexes, is actually not Wound/Wait but Wait/Die. See for example http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html Rather than renaming them across the tree to something like Wait/Die mutexes or Deadlock Avoidance mutexes, this patch set implements also the Wound/Wait algorithm It shouldn't touch the binary ordinary mutex paths when compiled with otpimization. We use Wound/Wait for the modeset locks which in theory may benefit slightly from Wound/Wait rather than Wait/Die, but Wait/Die is actually superior for larger number of simultaneous contending transactions so we keep the original implementation as a choice, and don't touch the algorithm used by the reservation objects. Performance- and functional testing has been done using git://people.freedesktop.org/~thomash/ww_mutex_test Using WW_BUILTIN and 4 and 16 competing threads. Another option is of course to ignore the the naming / algorithm inconsistency.
[PATCH RFC 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.
For modeset locks we don't expect a high number of contending transactions so change algorithm from Wait-Die to Wound-Wait. Signed-off-by: Thomas Hellstrom --- drivers/gpu/drm/drm_modeset_lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_modeset_lock.c b/drivers/gpu/drm/drm_modeset_lock.c index f22a7ef41de1..294997765a2c 100644 --- a/drivers/gpu/drm/drm_modeset_lock.c +++ b/drivers/gpu/drm/drm_modeset_lock.c @@ -70,7 +70,7 @@ * lists and lookup data structures. */ -static DEFINE_WW_CLASS(crtc_ww_class, true); +static DEFINE_WW_CLASS(crtc_ww_class, false); /** * drm_modeset_lock_all - take all modeset locks -- 2.14.3
Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping
On 04/27/2018 06:56 PM, Robin Murphy wrote: Hi Thomas, On 25/04/18 14:21, Thomas Hellstrom wrote: Hi, Robin, Thanks for the patch. It was some time since I put together that code, but I remember hitting something similar to https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linuxquestions.org_questions_linux-2Dkernel-2D70_-2527nents-2527-2Dargument-2Dof-2Ddma-5Funmap-5Fsg-2D4175621964_=DwIDaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=UACKfhMfw9wac0BNUWXnAiivjaBgY_jAEupre0zXoOQ=8NQNd-XBCViYHJH4fHk-RluFYd9CDjbYzXl_BWhC0ig= Even if it's clear from the documentation that orig_nents should be used. Hmmm, it's odd that you would see issues - it's always been something that CONFIG_DMA_API_DEBUG would have screamed about, and as far as I'm aware for x86, nents and orig_nents should always end up equal anyway. I would definitely be interested to see the specific fault details if it can be reproduced. I suppose one possibility is that there's some path where you inadvertently unmap something which was never mapped, but passing nents=0 means you manage to get away with it without the DMA API backend trying to interpret any bogus DMA addresses/lengths. FWIW, the rationale is that sync_sg/unmap_sg operate on sg->page (which can always be translated back to a meaningful CPU address for cache/write buffer maintenance), not sg->dma_address (which sometimes cannot), therefore passing a truncated list will have the effect of just not syncing the tail end of the buffer, which is clearly bad. Robin. I agree. I browsed the current software- and hardware iommu dma backends and all of them seem to set nents == orig_nents. Still, according to the docs sg_map() is free to erase any sg->page - related information and given that, it would be more natural if all operations after sg_map() would operate on sg->dma_address. I mean if sg_map would truncate the sg list length to 1, and erase all other information (which it clearly is free to do according to the docs), it would be pretty meaningless to supply orig_nents for the unmapping operation? /Thomas On 04/13/2018 05:14 PM, Robin Murphy wrote: dma_unmap_sg() should be called with the same number of entries originally passed to dma_map_sg(), not the number it returned, which may be fewer. Admittedly this driver probably never runs on non-coherent architectures where getting that wrong could lead to data loss, but it's always good to be correct, and it's trivially easy to fix by just restoring the SG table state before the call instead of afterwards. Signed-off-by: Robin Murphy <robin.mur...@arm.com> --- Found by inspection while poking around TTM users. drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c index 2fd091f9..971223d39469 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c @@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt) { struct device *dev = vmw_tt->dev_priv->dev->dev; + vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents, DMA_BIDIRECTIONAL); - vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; } /**
Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping
On 04/27/2018 06:56 PM, Robin Murphy wrote: Hi Thomas, On 25/04/18 14:21, Thomas Hellstrom wrote: Hi, Robin, Thanks for the patch. It was some time since I put together that code, but I remember hitting something similar to https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linuxquestions.org_questions_linux-2Dkernel-2D70_-2527nents-2527-2Dargument-2Dof-2Ddma-5Funmap-5Fsg-2D4175621964_=DwIDaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=UACKfhMfw9wac0BNUWXnAiivjaBgY_jAEupre0zXoOQ=8NQNd-XBCViYHJH4fHk-RluFYd9CDjbYzXl_BWhC0ig= Even if it's clear from the documentation that orig_nents should be used. Hmmm, it's odd that you would see issues - it's always been something that CONFIG_DMA_API_DEBUG would have screamed about, and as far as I'm aware for x86, nents and orig_nents should always end up equal anyway. I would definitely be interested to see the specific fault details if it can be reproduced. I suppose one possibility is that there's some path where you inadvertently unmap something which was never mapped, but passing nents=0 means you manage to get away with it without the DMA API backend trying to interpret any bogus DMA addresses/lengths. FWIW, the rationale is that sync_sg/unmap_sg operate on sg->page (which can always be translated back to a meaningful CPU address for cache/write buffer maintenance), not sg->dma_address (which sometimes cannot), therefore passing a truncated list will have the effect of just not syncing the tail end of the buffer, which is clearly bad. Robin. I agree. I browsed the current software- and hardware iommu dma backends and all of them seem to set nents == orig_nents. Still, according to the docs sg_map() is free to erase any sg->page - related information and given that, it would be more natural if all operations after sg_map() would operate on sg->dma_address. I mean if sg_map would truncate the sg list length to 1, and erase all other information (which it clearly is free to do according to the docs), it would be pretty meaningless to supply orig_nents for the unmapping operation? /Thomas On 04/13/2018 05:14 PM, Robin Murphy wrote: dma_unmap_sg() should be called with the same number of entries originally passed to dma_map_sg(), not the number it returned, which may be fewer. Admittedly this driver probably never runs on non-coherent architectures where getting that wrong could lead to data loss, but it's always good to be correct, and it's trivially easy to fix by just restoring the SG table state before the call instead of afterwards. Signed-off-by: Robin Murphy --- Found by inspection while poking around TTM users. drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c index 2fd091f9..971223d39469 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c @@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt) { struct device *dev = vmw_tt->dev_priv->dev->dev; + vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents, DMA_BIDIRECTIONAL); - vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; } /**
Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping
Hi, Robin, Thanks for the patch. It was some time since I put together that code, but I remember hitting something similar to https://www.linuxquestions.org/questions/linux-kernel-70/%27nents%27-argument-of-dma_unmap_sg-4175621964/ Even if it's clear from the documentation that orig_nents should be used. /Thomas On 04/13/2018 05:14 PM, Robin Murphy wrote: dma_unmap_sg() should be called with the same number of entries originally passed to dma_map_sg(), not the number it returned, which may be fewer. Admittedly this driver probably never runs on non-coherent architectures where getting that wrong could lead to data loss, but it's always good to be correct, and it's trivially easy to fix by just restoring the SG table state before the call instead of afterwards. Signed-off-by: Robin Murphy--- Found by inspection while poking around TTM users. drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c index 2fd091f9..971223d39469 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c @@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt) { struct device *dev = vmw_tt->dev_priv->dev->dev; + vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents, DMA_BIDIRECTIONAL); - vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; } /**
Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping
Hi, Robin, Thanks for the patch. It was some time since I put together that code, but I remember hitting something similar to https://www.linuxquestions.org/questions/linux-kernel-70/%27nents%27-argument-of-dma_unmap_sg-4175621964/ Even if it's clear from the documentation that orig_nents should be used. /Thomas On 04/13/2018 05:14 PM, Robin Murphy wrote: dma_unmap_sg() should be called with the same number of entries originally passed to dma_map_sg(), not the number it returned, which may be fewer. Admittedly this driver probably never runs on non-coherent architectures where getting that wrong could lead to data loss, but it's always good to be correct, and it's trivially easy to fix by just restoring the SG table state before the call instead of afterwards. Signed-off-by: Robin Murphy --- Found by inspection while poking around TTM users. drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c index 2fd091f9..971223d39469 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c @@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt *vmw_tt) { struct device *dev = vmw_tt->dev_priv->dev->dev; + vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents, DMA_BIDIRECTIONAL); - vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents; } /**
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 03:47 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 01:35:25PM +0200, Thomas Hellstrom wrote: On 04/05/2018 12:03 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz Spintzyk <lukasz.spint...@displaylink.com> Optional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk <lukasz.spint...@displaylink.com> Signed-off-by: Deepak Rawat <dra...@vmware.com> The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 03:47 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 01:35:25PM +0200, Thomas Hellstrom wrote: On 04/05/2018 12:03 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz Spintzyk Optional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk Signed-off-by: Deepak Rawat The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state,
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark <robdcl...@gmail.com> --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying b
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying buffer object. Are you saying that people are already
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 12:10 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 10:49:09AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:52 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote: With damage property in drm_plane_state, this patch adds helper iterator to traverse the damage clips. Iterator will return the damage rectangles in framebuffer, plane or crtc coordinates as need by driver implementation. Signed-off-by: Deepak Rawat <dra...@vmware.com> I'd really like selftest/unittests for this stuff. There's an awful lot of cornercases in this here (for any of the transformed iterators at least), and unit tests is the best way to make sure we handle them all correctly. Bonus points if you integrate the new selftests into igt so intel CI can run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also the framework I'd copy for this stuff. --- drivers/gpu/drm/drm_atomic_helper.c | 122 include/drm/drm_atomic_helper.h | 39 2 files changed, 161 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 55b44e3..355b514 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -3865,3 +3865,125 @@ void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj memcpy(state, obj->state, sizeof(*state)); } EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state); + +/** + * drm_atomic_helper_damage_iter_init - initialize the damage iterator + * @iter: The iterator to initialize. + * @type: Coordinate type caller is interested in. + * @state: plane_state from which to iterate the damage clips. + * @hdisplay: Width of crtc on which plane is scanned out. + * @vdisplay: Height of crtc on which plane is scanned out. + * + * Initialize an iterator that is used to translate and clip a set of damage + * rectangles in framebuffer coordinates to plane and crtc coordinates. The type + * argument specify which type of coordinate to iterate in. + * + * Returns: 0 on success and negative error code on error. If an error code is + * returned then it means the plane state should not update. + */ +int +drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter, + enum drm_atomic_helper_damage_clip_type type, + const struct drm_plane_state *state, + uint32_t hdisplay, uint32_t vdisplay) +{ + if (!state || !state->crtc || !state->fb) + return -EINVAL; + + memset(iter, 0, sizeof(*iter)); + iter->clips = (struct drm_rect *)state->damage_clips->data; + iter->num_clips = state->num_clips; + iter->type = type; + + /* +* Full update in case of scaling or rotation. In future support for +* scaling/rotating damage clips can be added +*/ + if (state->crtc_w != (state->src_w >> 16) || + state->crtc_h != state->src_h >> 16 || state->rotation != 0) { + iter->curr_clip = iter->num_clips; + return 0; Given there's no user of this I have no idea how this manages to provoke a full clip rect. selftest code would be perfect for stuff like this. Also, I think we should provide a full clip for the case of num_clips == 0, so that driver code can simply iterate over all clips and doesn't ever have to handle the "no clip rects provided" case as a special case itself. + } + + iter->fb_src.x1 = 0; + iter->fb_src.y1 = 0; + iter->fb_src.x2 = state->fb->width; + iter->fb_src.y2 = state->fb->height; + + iter->plane_src.x1 = state->src_x >> 16; + iter->plane_src.y1 = state->src_y >> 16; + iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16); + iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16); + iter->translate_plane_x = -iter->plane_src.x1; + iter->translate_plane_y = -iter->plane_src.y1; + + /* Clip plane src rect to fb dimensions */ + drm_rect_intersect(>plane_src, >fb_src); This smells like driver bug. Also, see Ville's recent efforts to improve the atomic plane clipping, I think drm_plane_state already has all the clip rects you want. + + iter->crtc_src.x1 = 0; + iter->crtc_src.y1 = 0; + iter->crtc_src.x2 = hdisplay; + iter->crtc_src.y2 = vdisplay; + iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x); + iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y); + + /* Clip crtc src rect to plane dimensions */ + drm_rect_translate(>crtc_src, -iter->translate_crtc_x, + -i
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 12:10 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 10:49:09AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:52 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote: With damage property in drm_plane_state, this patch adds helper iterator to traverse the damage clips. Iterator will return the damage rectangles in framebuffer, plane or crtc coordinates as need by driver implementation. Signed-off-by: Deepak Rawat I'd really like selftest/unittests for this stuff. There's an awful lot of cornercases in this here (for any of the transformed iterators at least), and unit tests is the best way to make sure we handle them all correctly. Bonus points if you integrate the new selftests into igt so intel CI can run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also the framework I'd copy for this stuff. --- drivers/gpu/drm/drm_atomic_helper.c | 122 include/drm/drm_atomic_helper.h | 39 2 files changed, 161 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 55b44e3..355b514 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -3865,3 +3865,125 @@ void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj memcpy(state, obj->state, sizeof(*state)); } EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state); + +/** + * drm_atomic_helper_damage_iter_init - initialize the damage iterator + * @iter: The iterator to initialize. + * @type: Coordinate type caller is interested in. + * @state: plane_state from which to iterate the damage clips. + * @hdisplay: Width of crtc on which plane is scanned out. + * @vdisplay: Height of crtc on which plane is scanned out. + * + * Initialize an iterator that is used to translate and clip a set of damage + * rectangles in framebuffer coordinates to plane and crtc coordinates. The type + * argument specify which type of coordinate to iterate in. + * + * Returns: 0 on success and negative error code on error. If an error code is + * returned then it means the plane state should not update. + */ +int +drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter, + enum drm_atomic_helper_damage_clip_type type, + const struct drm_plane_state *state, + uint32_t hdisplay, uint32_t vdisplay) +{ + if (!state || !state->crtc || !state->fb) + return -EINVAL; + + memset(iter, 0, sizeof(*iter)); + iter->clips = (struct drm_rect *)state->damage_clips->data; + iter->num_clips = state->num_clips; + iter->type = type; + + /* +* Full update in case of scaling or rotation. In future support for +* scaling/rotating damage clips can be added +*/ + if (state->crtc_w != (state->src_w >> 16) || + state->crtc_h != state->src_h >> 16 || state->rotation != 0) { + iter->curr_clip = iter->num_clips; + return 0; Given there's no user of this I have no idea how this manages to provoke a full clip rect. selftest code would be perfect for stuff like this. Also, I think we should provide a full clip for the case of num_clips == 0, so that driver code can simply iterate over all clips and doesn't ever have to handle the "no clip rects provided" case as a special case itself. + } + + iter->fb_src.x1 = 0; + iter->fb_src.y1 = 0; + iter->fb_src.x2 = state->fb->width; + iter->fb_src.y2 = state->fb->height; + + iter->plane_src.x1 = state->src_x >> 16; + iter->plane_src.y1 = state->src_y >> 16; + iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16); + iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16); + iter->translate_plane_x = -iter->plane_src.x1; + iter->translate_plane_y = -iter->plane_src.y1; + + /* Clip plane src rect to fb dimensions */ + drm_rect_intersect(>plane_src, >fb_src); This smells like driver bug. Also, see Ville's recent efforts to improve the atomic plane clipping, I think drm_plane_state already has all the clip rects you want. + + iter->crtc_src.x1 = 0; + iter->crtc_src.y1 = 0; + iter->crtc_src.x2 = hdisplay; + iter->crtc_src.y2 = vdisplay; + iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x); + iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y); + + /* Clip crtc src rect to plane dimensions */ + drm_rect_translate(>crtc_src, -iter->translate_crtc_x, + -iter->translate_crtc_x); We can als
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 12:03 PM, Daniel Vetter wrote: On the topic of input validation: Should the kernel check in shared code that all the clip rects are sensible, i.e. within the dimensions of the fb? Relying on drivers for that seems like a bad idea. I guess we could do that. There seems to be different needs for different kinds of iterators, but otherwise I was thinking that an iterator could just skip invalid rects if found. Then we don't need a separate validation step, but OTOH user-space won't get notified if that was intended. I'm not 100% sure user-space would do anything sensible with any error information, though. /Thomas That could be done in core code in drm_atomic_plane_check(). -Daniel /Thomas ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_dri-2Ddevel=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=JV7fhQ4zTiyiKsY8M0Zlf81l7szYePUke8kwSvQv1T8=HEcbr-8aoWqRvWGY6RcL1QeAtEyxMRLHbsOtdJFk78I=
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 12:03 PM, Daniel Vetter wrote: On the topic of input validation: Should the kernel check in shared code that all the clip rects are sensible, i.e. within the dimensions of the fb? Relying on drivers for that seems like a bad idea. I guess we could do that. There seems to be different needs for different kinds of iterators, but otherwise I was thinking that an iterator could just skip invalid rects if found. Then we don't need a separate validation step, but OTOH user-space won't get notified if that was intended. I'm not 100% sure user-space would do anything sensible with any error information, though. /Thomas That could be done in core code in drm_atomic_plane_check(). -Daniel /Thomas ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_dri-2Ddevel=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=JV7fhQ4zTiyiKsY8M0Zlf81l7szYePUke8kwSvQv1T8=HEcbr-8aoWqRvWGY6RcL1QeAtEyxMRLHbsOtdJFk78I=
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 12:03 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz Spintzyk <lukasz.spint...@displaylink.com> Optional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk <lukasz.spint...@displaylink.com> Signed-off-by: Deepak Rawat <dra...@vmware.com> The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state,
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 12:03 PM, Daniel Vetter wrote: On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote: On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz Spintzyk Optional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk Signed-off-by: Deepak Rawat The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state, property, val); @@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz SpintzykOptional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk Signed-off-by: Deepak Rawat The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state, property, val); @@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane, *val = state->color_encoding; } else if (property == plane->color_range_property) {
Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane
On 04/05/2018 09:35 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote: From: Lukasz Spintzyk Optional plane property to mark damaged regions on the plane in framebuffer coordinates of the framebuffer attached to the plane. The layout of blob data is simply an array of drm_mode_rect with maximum array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src coordinates, damage clips are not in 16.16 fixed point. Damage clips are a hint to kernel as which area of framebuffer has changed since last page-flip. This should be helpful for some drivers especially for virtual devices where each framebuffer change needs to be transmitted over network, usb, etc. Driver which are interested in enabling DAMAGE_CLIPS property for a plane should enable this property using drm_plane_enable_damage_clips. Signed-off-by: Lukasz Spintzyk Signed-off-by: Deepak Rawat The property uapi section is missing, see: https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ= Plane composition feels like the best place to put this. Please use that section to tie all the various bits together, including the helpers you're adding in the following patches for drivers to use. Bunch of nitpicks below, but overall I'm agreeing now with just going with fb coordinate damage rects. Like you say, the thing needed here now is userspace + driver actually implementing this. I'd also say the compat helper to map the legacy fb->dirty to this new atomic way of doing things should be included here (gives us a lot more testing for these new paths). Icing on the cake would be an igt to make sure kernel rejects invalid clip rects correctly. --- drivers/gpu/drm/drm_atomic.c| 42 + drivers/gpu/drm/drm_atomic_helper.c | 4 drivers/gpu/drm/drm_mode_config.c | 5 + drivers/gpu/drm/drm_plane.c | 12 +++ include/drm/drm_mode_config.h | 15 + include/drm/drm_plane.h | 16 ++ include/uapi/drm/drm_mode.h | 15 + 7 files changed, 109 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c index 7d25c42..9226d24 100644 --- a/drivers/gpu/drm/drm_atomic.c +++ b/drivers/gpu/drm/drm_atomic.c @@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer *p, } /** + * drm_atomic_set_damage_for_plane - sets the damage clips property to plane + * @state: plane state + * @blob: damage clips in framebuffer coordinates + * + * Returns: + * + * Zero on success, error code on failure. + */ +static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state, + struct drm_property_blob *blob) +{ + if (blob == state->damage_clips) + return 0; + + drm_property_blob_put(state->damage_clips); + state->damage_clips = NULL; + + if (blob) { + uint32_t count = blob->length/sizeof(struct drm_rect); + + if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS) + return -EINVAL; + + state->damage_clips = drm_property_blob_get(blob); + state->num_clips = count; + } else { + state->damage_clips = NULL; + state->num_clips = 0; + } + + return 0; +} + +/** * drm_atomic_get_plane_state - get plane state * @state: global atomic state object * @plane: plane to get state object for @@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane *plane, state->color_encoding = val; } else if (property == plane->color_range_property) { state->color_range = val; + } else if (property == config->prop_damage_clips) { + struct drm_property_blob *blob = + drm_property_lookup_blob(dev, val); + int ret = drm_atomic_set_damage_for_plane(state, blob); There's already a helper with size-checking built-in, see drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd just provide a little inline helper that does the blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast). + drm_property_blob_put(blob); + return ret; } else if (plane->funcs->atomic_set_property) { return plane->funcs->atomic_set_property(plane, state, property, val); @@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane, *val = state->color_encoding; } else if (property == plane->color_range_property) { *val = state->color_range; + } else if (property ==
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 09:52 AM, Daniel Vetter wrote: TYPE_PLANE I have no idea who needs that. I suggest we just drop it. I'm assuming CRTC plane coordinates here. They are used for uploading contents of hardware planes. Like, in the simplest case, cursor images. /Thomas
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 09:52 AM, Daniel Vetter wrote: TYPE_PLANE I have no idea who needs that. I suggest we just drop it. I'm assuming CRTC plane coordinates here. They are used for uploading contents of hardware planes. Like, in the simplest case, cursor images. /Thomas
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 09:52 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote: With damage property in drm_plane_state, this patch adds helper iterator to traverse the damage clips. Iterator will return the damage rectangles in framebuffer, plane or crtc coordinates as need by driver implementation. Signed-off-by: Deepak RawatI'd really like selftest/unittests for this stuff. There's an awful lot of cornercases in this here (for any of the transformed iterators at least), and unit tests is the best way to make sure we handle them all correctly. Bonus points if you integrate the new selftests into igt so intel CI can run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also the framework I'd copy for this stuff. --- drivers/gpu/drm/drm_atomic_helper.c | 122 include/drm/drm_atomic_helper.h | 39 2 files changed, 161 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 55b44e3..355b514 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -3865,3 +3865,125 @@ void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj memcpy(state, obj->state, sizeof(*state)); } EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state); + +/** + * drm_atomic_helper_damage_iter_init - initialize the damage iterator + * @iter: The iterator to initialize. + * @type: Coordinate type caller is interested in. + * @state: plane_state from which to iterate the damage clips. + * @hdisplay: Width of crtc on which plane is scanned out. + * @vdisplay: Height of crtc on which plane is scanned out. + * + * Initialize an iterator that is used to translate and clip a set of damage + * rectangles in framebuffer coordinates to plane and crtc coordinates. The type + * argument specify which type of coordinate to iterate in. + * + * Returns: 0 on success and negative error code on error. If an error code is + * returned then it means the plane state should not update. + */ +int +drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter, + enum drm_atomic_helper_damage_clip_type type, + const struct drm_plane_state *state, + uint32_t hdisplay, uint32_t vdisplay) +{ + if (!state || !state->crtc || !state->fb) + return -EINVAL; + + memset(iter, 0, sizeof(*iter)); + iter->clips = (struct drm_rect *)state->damage_clips->data; + iter->num_clips = state->num_clips; + iter->type = type; + + /* +* Full update in case of scaling or rotation. In future support for +* scaling/rotating damage clips can be added +*/ + if (state->crtc_w != (state->src_w >> 16) || + state->crtc_h != state->src_h >> 16 || state->rotation != 0) { + iter->curr_clip = iter->num_clips; + return 0; Given there's no user of this I have no idea how this manages to provoke a full clip rect. selftest code would be perfect for stuff like this. Also, I think we should provide a full clip for the case of num_clips == 0, so that driver code can simply iterate over all clips and doesn't ever have to handle the "no clip rects provided" case as a special case itself. + } + + iter->fb_src.x1 = 0; + iter->fb_src.y1 = 0; + iter->fb_src.x2 = state->fb->width; + iter->fb_src.y2 = state->fb->height; + + iter->plane_src.x1 = state->src_x >> 16; + iter->plane_src.y1 = state->src_y >> 16; + iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16); + iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16); + iter->translate_plane_x = -iter->plane_src.x1; + iter->translate_plane_y = -iter->plane_src.y1; + + /* Clip plane src rect to fb dimensions */ + drm_rect_intersect(>plane_src, >fb_src); This smells like driver bug. Also, see Ville's recent efforts to improve the atomic plane clipping, I think drm_plane_state already has all the clip rects you want. + + iter->crtc_src.x1 = 0; + iter->crtc_src.y1 = 0; + iter->crtc_src.x2 = hdisplay; + iter->crtc_src.y2 = vdisplay; + iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x); + iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y); + + /* Clip crtc src rect to plane dimensions */ + drm_rect_translate(>crtc_src, -iter->translate_crtc_x, + -iter->translate_crtc_x); We can also scale. I suggest we leave out scaling for now until someone actually needs it. + drm_rect_intersect(>crtc_src, >plane_src); + + return 0; +} +EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init); + +/** + * drm_atomic_helper_damage_iter_next - advance the damage iterator + * @iter: The
Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.
On 04/05/2018 09:52 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote: With damage property in drm_plane_state, this patch adds helper iterator to traverse the damage clips. Iterator will return the damage rectangles in framebuffer, plane or crtc coordinates as need by driver implementation. Signed-off-by: Deepak Rawat I'd really like selftest/unittests for this stuff. There's an awful lot of cornercases in this here (for any of the transformed iterators at least), and unit tests is the best way to make sure we handle them all correctly. Bonus points if you integrate the new selftests into igt so intel CI can run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also the framework I'd copy for this stuff. --- drivers/gpu/drm/drm_atomic_helper.c | 122 include/drm/drm_atomic_helper.h | 39 2 files changed, 161 insertions(+) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index 55b44e3..355b514 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -3865,3 +3865,125 @@ void __drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj memcpy(state, obj->state, sizeof(*state)); } EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state); + +/** + * drm_atomic_helper_damage_iter_init - initialize the damage iterator + * @iter: The iterator to initialize. + * @type: Coordinate type caller is interested in. + * @state: plane_state from which to iterate the damage clips. + * @hdisplay: Width of crtc on which plane is scanned out. + * @vdisplay: Height of crtc on which plane is scanned out. + * + * Initialize an iterator that is used to translate and clip a set of damage + * rectangles in framebuffer coordinates to plane and crtc coordinates. The type + * argument specify which type of coordinate to iterate in. + * + * Returns: 0 on success and negative error code on error. If an error code is + * returned then it means the plane state should not update. + */ +int +drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter, + enum drm_atomic_helper_damage_clip_type type, + const struct drm_plane_state *state, + uint32_t hdisplay, uint32_t vdisplay) +{ + if (!state || !state->crtc || !state->fb) + return -EINVAL; + + memset(iter, 0, sizeof(*iter)); + iter->clips = (struct drm_rect *)state->damage_clips->data; + iter->num_clips = state->num_clips; + iter->type = type; + + /* +* Full update in case of scaling or rotation. In future support for +* scaling/rotating damage clips can be added +*/ + if (state->crtc_w != (state->src_w >> 16) || + state->crtc_h != state->src_h >> 16 || state->rotation != 0) { + iter->curr_clip = iter->num_clips; + return 0; Given there's no user of this I have no idea how this manages to provoke a full clip rect. selftest code would be perfect for stuff like this. Also, I think we should provide a full clip for the case of num_clips == 0, so that driver code can simply iterate over all clips and doesn't ever have to handle the "no clip rects provided" case as a special case itself. + } + + iter->fb_src.x1 = 0; + iter->fb_src.y1 = 0; + iter->fb_src.x2 = state->fb->width; + iter->fb_src.y2 = state->fb->height; + + iter->plane_src.x1 = state->src_x >> 16; + iter->plane_src.y1 = state->src_y >> 16; + iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16); + iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16); + iter->translate_plane_x = -iter->plane_src.x1; + iter->translate_plane_y = -iter->plane_src.y1; + + /* Clip plane src rect to fb dimensions */ + drm_rect_intersect(>plane_src, >fb_src); This smells like driver bug. Also, see Ville's recent efforts to improve the atomic plane clipping, I think drm_plane_state already has all the clip rects you want. + + iter->crtc_src.x1 = 0; + iter->crtc_src.y1 = 0; + iter->crtc_src.x2 = hdisplay; + iter->crtc_src.y2 = vdisplay; + iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x); + iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y); + + /* Clip crtc src rect to plane dimensions */ + drm_rect_translate(>crtc_src, -iter->translate_crtc_x, + -iter->translate_crtc_x); We can also scale. I suggest we leave out scaling for now until someone actually needs it. + drm_rect_intersect(>crtc_src, >plane_src); + + return 0; +} +EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init); + +/** + * drm_atomic_helper_damage_iter_next - advance the damage iterator + * @iter: The iterator to advance. +
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 12:28 PM, Thomas Hellstrom wrote: On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark <robdcl...@gmail.com> --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3)
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 12:28 PM, Thomas Hellstrom wrote: On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark <robdcl...@gmail.com> --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying b
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 11:56 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote: On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip (atomic or not) should result in the entire buffer getting uploaded. The dirty flag is kinda redundant, a flip with the same buffer works the same way as a dirtyfb with the entire buffer as the dirty rectangle. In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying buffer object. Are you saying that people are already
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark <robdcl...@gmail.com> --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying buffer object. Are you saying that people are already using 3) and we should keep using that? /Thomas
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
On 04/04/2018 10:43 AM, Daniel Vetter wrote: On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote: Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. I'm not entirely following here - I thought for frontbuffer rendering the dirty rects have always just been a hint, and that the driver was always free to re-upload the entire buffer to the screen. And through a helper like Rob's proposing here (and have floated around in different versions already) we'd essentially map a frontbuffer dirtyfb call to a fake flip with dirty rect. Manual upload drivers already need to upload the entire screen if they get a flip, since some userspace uses that to flush out frontbuffer rendering (instead of calling dirtyfb). So from that pov the new dirty flag is kinda not necessary imo. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. Or I'm not following you here, because I don't quite see the difference between dirtyfb and a flip. -Daniel OK, let me rephrase: From the driver's point-of-view, in the atomic world, new content and the need for manual upload is indicated by a change in fb attached to the plane. With Rob's patch here, (correct me if I'm wrong) in addition new content and the need for manual upload is identified by the dirty flag, (since the fb stays the same and the driver thus never identifies a page-flip). In both these situations, clip rects can provide a hint to restrict the dirty region. Now I was thinking about the preferred way for user-space to communicate front buffer rendering through the atomic ioctl: 1) Expose a dirty (or content_age property) 2) Attach a clip-rect blob property. 3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same underlying buffer object. Are you saying that people are already using 3) and we should keep using that? /Thomas
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clarkwrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. /Thomas
Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb
Hi, On 04/04/2018 08:58 AM, Daniel Vetter wrote: On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark wrote: Add an atomic helper to implement dirtyfb support. This is needed to support DSI command-mode panels with x11 userspace (ie. when we can't rely on pageflips to trigger a flush to the panel). To signal to the driver that the async atomic update needs to synchronize with fences, even though the fb didn't change, the drm_atomic_state::dirty flag is added. Signed-off-by: Rob Clark --- Background: there are a number of different folks working on getting upstream kernel working on various different phones/tablets with qcom SoC's.. many of them have command mode panels, so we kind of need a way to support the legacy dirtyfb ioctl for x11 support. I know there is work on a proprer non-legacy atomic property for userspace to communicate dirty-rect(s) to the kernel, so this can be improved from triggering a full-frame flush once that is in place. But we kinda needa a stop-gap solution. I had considered an in-driver solution for this, but things get a bit tricky if userspace ands up combining dirtyfb ioctls with page- flips, because we need to synchronize setting various CTL.FLUSH bits with setting the CTL.START bit. (ie. really all we need to do for cmd mode panels is bang CTL.START, but is this ends up racing with pageflips setting FLUSH bits, then bad things.) The easiest soln is to wrap this up as an atomic commit and rely on the worker to serialize things. Hence adding an atomic dirtyfb helper. I guess at least the helper, with some small addition to translate and pass-thru the dirty rect(s) is useful to the final atomic dirty- rect property solution. Depending on how far off that is, a stop- gap solution could be useful. Adding Noralf, who iirc already posted the full dirty helpers already somewhere. -Daniel I've asked Deepak to RFC the core changes suggested for the full dirty blob on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to get to the desired coordinates. One thing to perhaps discuss is how we would like to fit this with front-buffer rendering and the dirty ioctl. In the page-flip context, the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the damage region that can be fully ignored by the driver, new content is indicated by a new framebuffer. We could do the same for frontbuffer rendering: Either set a dirty flag like you do here, or provide a content_age state member. Since we clear the dirty flag on state copies, I guess that would be sufficient. The blob rectangles would then become a hint to restrict the damage region. Another approach would be to have the presence of dirty rects without framebuffer change to indicate frontbuffer rendering. I think I like the first approach best, although it may be tempting for user-space apps to just set the dirty bit instead of providing the full damage region. /Thomas
Re: Signal handling in a page fault handler
On 04/03/2018 02:33 PM, Chris Wilson wrote: Quoting Matthew Wilcox (2018-04-02 15:10:58) Souptick and I have been auditing the various page fault handler routines and we've noticed that graphics drivers assume that a signal should be able to interrupt a page fault. In contrast, the page cache takes great care to allow only fatal signals to interrupt a page fault. I believe (but have not verified) that a non-fatal signal being delivered to a task which is in the middle of a page fault may well end up in an infinite loop, attempting to handle the page fault and failing forever. Here's one of the simpler ones: ret = mutex_lock_interruptible(_obj->lock); if (ret) return VM_FAULT_NOPAGE; (many other drivers do essentially the same thing including i915) On seeing NOPAGE, the fault handler believes the PTE is in the page table, so does nothing before it returns to arch code at which point I get lost in the magic assembler macros. I believe it will end up returning to userspace if the signal is non-fatal, at which point it'll go right back into the page fault handler, and mutex_lock_interruptible() will immediately fail. So we've converted a sleeping lock into the most expensive spinlock. I'll ask the obvious question: why isn't the signal handled on return to userspace? +1 I don't think the graphics drivers really want to be interrupted by any signal. Assume the worst case and we may block for 10s. Even a 10ms delay may be unacceptable to some signal handlers (one presumes). For the number one ^C usecase, yes that may be reduced to only bother if it's killable, but I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19) that want to be able to interrupt random blockages. -Chris I think the TTM page fault handler originally set the standard for this. First, IMO any critical section that waits for the GPU (like typically the page fault handler does), should be locked at least killable. The need for interruptible locks came from the X server's silken mouse relying on signals for smooth mouse operations: You didn't want the X server to be stuck in the kernel waiting for GPU completion when it should handle the cursor move request.. Now that doesn't seem to be the case anymore but to reiterate Chris' question, why would the signal persist once returned to user-space? /Thomas ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: Signal handling in a page fault handler
On 04/03/2018 02:33 PM, Chris Wilson wrote: Quoting Matthew Wilcox (2018-04-02 15:10:58) Souptick and I have been auditing the various page fault handler routines and we've noticed that graphics drivers assume that a signal should be able to interrupt a page fault. In contrast, the page cache takes great care to allow only fatal signals to interrupt a page fault. I believe (but have not verified) that a non-fatal signal being delivered to a task which is in the middle of a page fault may well end up in an infinite loop, attempting to handle the page fault and failing forever. Here's one of the simpler ones: ret = mutex_lock_interruptible(_obj->lock); if (ret) return VM_FAULT_NOPAGE; (many other drivers do essentially the same thing including i915) On seeing NOPAGE, the fault handler believes the PTE is in the page table, so does nothing before it returns to arch code at which point I get lost in the magic assembler macros. I believe it will end up returning to userspace if the signal is non-fatal, at which point it'll go right back into the page fault handler, and mutex_lock_interruptible() will immediately fail. So we've converted a sleeping lock into the most expensive spinlock. I'll ask the obvious question: why isn't the signal handled on return to userspace? +1 I don't think the graphics drivers really want to be interrupted by any signal. Assume the worst case and we may block for 10s. Even a 10ms delay may be unacceptable to some signal handlers (one presumes). For the number one ^C usecase, yes that may be reduced to only bother if it's killable, but I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19) that want to be able to interrupt random blockages. -Chris I think the TTM page fault handler originally set the standard for this. First, IMO any critical section that waits for the GPU (like typically the page fault handler does), should be locked at least killable. The need for interruptible locks came from the X server's silken mouse relying on signals for smooth mouse operations: You didn't want the X server to be stuck in the kernel waiting for GPU completion when it should handle the cursor move request.. Now that doesn't seem to be the case anymore but to reiterate Chris' question, why would the signal persist once returned to user-space? /Thomas ___ dri-devel mailing list dri-de...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH] drm/vmwgfx: Return boolean instead of integer in vmw_fence_obj_signaled
On 01/19/2018 01:02 AM, Gustavo A. R. Silva wrote: Return statements in functions returning bool should use true/false instead of 1/0. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. SilvaReviewed-by: Thomas Hellström I'll queue this for 4.17. Thanks, Thomas --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 6c5c75c..35fd6f9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -496,7 +496,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence) struct vmw_fence_manager *fman = fman_from_fence(fence); if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, >base.flags)) - return 1; + return true; vmw_fences_update(fman);
Re: [PATCH] drm/vmwgfx: Return boolean instead of integer in vmw_fence_obj_signaled
On 01/19/2018 01:02 AM, Gustavo A. R. Silva wrote: Return statements in functions returning bool should use true/false instead of 1/0. This issue was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva Reviewed-by: Thomas Hellström I'll queue this for 4.17. Thanks, Thomas --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 6c5c75c..35fd6f9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -496,7 +496,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence) struct vmw_fence_manager *fman = fman_from_fence(fence); if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, >base.flags)) - return 1; + return true; vmw_fences_update(fman);
Re: linux-next: Signed-off-by missing for commit in the drm-misc-fixes tree
Hi, Stephen, That would be me. I've historically only added my signed-off-by:s on commits I've (co-) authored myself, as the committer info is already registered, But I take it that's not the correct approach? /Thomas On 01/18/2018 09:23 PM, Stephen Rothwell wrote: Hi all, Commit 8a510a5c7526 ("drm/vmwgfx: fix memory corruption with legacy/sou connectors") is missing a Signed-off-by from its committer.
Re: linux-next: Signed-off-by missing for commit in the drm-misc-fixes tree
Hi, Stephen, That would be me. I've historically only added my signed-off-by:s on commits I've (co-) authored myself, as the committer info is already registered, But I take it that's not the correct approach? /Thomas On 01/18/2018 09:23 PM, Stephen Rothwell wrote: Hi all, Commit 8a510a5c7526 ("drm/vmwgfx: fix memory corruption with legacy/sou connectors") is missing a Signed-off-by from its committer.
Re: [PATCH] drm/vmwgfx: fix memory corruption with legacy/sou connectors
On 01/17/2018 04:16 PM, Rob Clark wrote: From: Rob Clark <rcl...@redhat.com> It looks like in all cases 'struct vmw_connector_state' is used. But only in stdu connectors, was atomic_{duplicate,destroy}_state() properly subclassed. Leading to writes beyond the end of the allocated connector state block and all sorts of fun memory corruption related crashes. Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state" Cc: <sta...@vger.kernel.org> Signed-off-by: Rob Clark <rcl...@redhat.com> Reviewed-by: Thomas Hellstrom <thellst...@vmware.com> Thanks, Rob! /Thomas --- drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c index b8a09807c5de..3824595fece1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c @@ -266,8 +266,8 @@ static const struct drm_connector_funcs vmw_legacy_connector_funcs = { .set_property = vmw_du_connector_set_property, .destroy = vmw_ldu_connector_destroy, .reset = vmw_du_connector_reset, - .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, - .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, + .atomic_duplicate_state = vmw_du_connector_duplicate_state, + .atomic_destroy_state = vmw_du_connector_destroy_state, .atomic_set_property = vmw_du_connector_atomic_set_property, .atomic_get_property = vmw_du_connector_atomic_get_property, }; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c index bc5f6026573d..63a4cd794b73 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c @@ -420,8 +420,8 @@ static const struct drm_connector_funcs vmw_sou_connector_funcs = { .set_property = vmw_du_connector_set_property, .destroy = vmw_sou_connector_destroy, .reset = vmw_du_connector_reset, - .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, - .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, + .atomic_duplicate_state = vmw_du_connector_duplicate_state, + .atomic_destroy_state = vmw_du_connector_destroy_state, .atomic_set_property = vmw_du_connector_atomic_set_property, .atomic_get_property = vmw_du_connector_atomic_get_property, };
Re: [PATCH] drm/vmwgfx: fix memory corruption with legacy/sou connectors
On 01/17/2018 04:16 PM, Rob Clark wrote: From: Rob Clark It looks like in all cases 'struct vmw_connector_state' is used. But only in stdu connectors, was atomic_{duplicate,destroy}_state() properly subclassed. Leading to writes beyond the end of the allocated connector state block and all sorts of fun memory corruption related crashes. Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state" Cc: Signed-off-by: Rob Clark Reviewed-by: Thomas Hellstrom Thanks, Rob! /Thomas --- drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c index b8a09807c5de..3824595fece1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c @@ -266,8 +266,8 @@ static const struct drm_connector_funcs vmw_legacy_connector_funcs = { .set_property = vmw_du_connector_set_property, .destroy = vmw_ldu_connector_destroy, .reset = vmw_du_connector_reset, - .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, - .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, + .atomic_duplicate_state = vmw_du_connector_duplicate_state, + .atomic_destroy_state = vmw_du_connector_destroy_state, .atomic_set_property = vmw_du_connector_atomic_set_property, .atomic_get_property = vmw_du_connector_atomic_get_property, }; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c index bc5f6026573d..63a4cd794b73 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c @@ -420,8 +420,8 @@ static const struct drm_connector_funcs vmw_sou_connector_funcs = { .set_property = vmw_du_connector_set_property, .destroy = vmw_sou_connector_destroy, .reset = vmw_du_connector_reset, - .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state, - .atomic_destroy_state = drm_atomic_helper_connector_destroy_state, + .atomic_duplicate_state = vmw_du_connector_duplicate_state, + .atomic_destroy_state = vmw_du_connector_destroy_state, .atomic_set_property = vmw_du_connector_atomic_set_property, .atomic_get_property = vmw_du_connector_atomic_get_property, };
Re: drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
On 01/17/2018 07:33 AM, Thomas Hellstrom wrote: Hi, Woody, On 01/16/2018 10:39 PM, Woody Suwalski wrote: Thomas, the same way my DRM patch has disappeared: Date Tue, 19 Dec 2017 11:50:57 -0800 From Sinclair Yeh <> Subject Re: [PATCH v.2] 4.15 vmgfx boot warning This looks okay to me. Since this is a core DRM patch I think Sinclair was expecting it to be pulled in using the drm-misc tree. Did you get a comment from Daniel on this patch? Thanks, Thomas Actually, checkpatch.pl complains about -ENOSYS, so I'm sending a pull request for v1 of this patch instead, to make everybody happy. I've rephrased the commit log somewhat. Hope that's OK. /Thomas
Re: drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
On 01/17/2018 07:33 AM, Thomas Hellstrom wrote: Hi, Woody, On 01/16/2018 10:39 PM, Woody Suwalski wrote: Thomas, the same way my DRM patch has disappeared: Date Tue, 19 Dec 2017 11:50:57 -0800 From Sinclair Yeh <> Subject Re: [PATCH v.2] 4.15 vmgfx boot warning This looks okay to me. Since this is a core DRM patch I think Sinclair was expecting it to be pulled in using the drm-misc tree. Did you get a comment from Daniel on this patch? Thanks, Thomas Actually, checkpatch.pl complains about -ENOSYS, so I'm sending a pull request for v1 of this patch instead, to make everybody happy. I've rephrased the commit log somewhat. Hope that's OK. /Thomas
drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
Hi, Woody, On 01/16/2018 10:39 PM, Woody Suwalski wrote: Thomas, the same way my DRM patch has disappeared: Date Tue, 19 Dec 2017 11:50:57 -0800 From Sinclair Yeh <> Subject Re: [PATCH v.2] 4.15 vmgfx boot warning This looks okay to me. Since this is a core DRM patch I think Sinclair was expecting it to be pulled in using the drm-misc tree. Did you get a comment from Daniel on this patch? Thanks, Thomas
drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
Hi, Woody, On 01/16/2018 10:39 PM, Woody Suwalski wrote: Thomas, the same way my DRM patch has disappeared: Date Tue, 19 Dec 2017 11:50:57 -0800 From Sinclair Yeh <> Subject Re: [PATCH v.2] 4.15 vmgfx boot warning This looks okay to me. Since this is a core DRM patch I think Sinclair was expecting it to be pulled in using the drm-misc tree. Did you get a comment from Daniel on this patch? Thanks, Thomas
Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
Hi, Arnd, Sinclair's on paternal leave and I thought this patch was already in drm-next. My bad. Dave, is it too late to pull this in for the next merge window? /Thomas On 01/16/2018 06:18 PM, Arnd Bergmann wrote: DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit seconds/microseconds format. As of commit c61eef726a78 ("drm: add support for monotonic vblank timestamps"), other DRM drivers use monotonic times for drm_event_vblank, but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from the y2038/y2106 overflow as well as time jumps. For consistency, this changes vmwgfx to use ktime_get_ts64 as well, which solves those problems and avoids the deprecated do_gettimeofday() function. This should be transparent to to user space, as long as it doesn't compare the time against the result of gettimeofday(). Link: https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10076599_=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=8M6vawBo0zDsjbqhzV0xpOwAzX7Zm-uyGlIDnQ7-Gms=sHGuz0aoE9aLjVp5GALo8mYrN1bwOHW6mGpJIZmhwAc= Signed-off-by: Arnd Bergmann--- Originally sent on Nov 27. Sinclair Yeh said he'd pick it up for the next pull request, but it's not in linux-next yet. Resending the unchanged patch, please pick it up when you have time, or feel free to ignore this email in case it's already in some tree that just isn't part of linux-next but will be sent during the next merge window. --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 6c5c75cf5e6c..9ed544f8958f 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -901,11 +901,12 @@ static void vmw_event_fence_action_seq_passed(struct vmw_fence_action *action) spin_lock_irq(>event_lock); if (likely(eaction->tv_sec != NULL)) { - struct timeval tv; + struct timespec64 ts; - do_gettimeofday(); - *eaction->tv_sec = tv.tv_sec; - *eaction->tv_usec = tv.tv_usec; + ktime_get_ts64(); + /* monotonic time, so no y2038 overflow */ + *eaction->tv_sec = ts.tv_sec; + *eaction->tv_usec = ts.tv_nsec / NSEC_PER_USEC; } drm_send_event_locked(dev, eaction->event);
Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps
Hi, Arnd, Sinclair's on paternal leave and I thought this patch was already in drm-next. My bad. Dave, is it too late to pull this in for the next merge window? /Thomas On 01/16/2018 06:18 PM, Arnd Bergmann wrote: DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit seconds/microseconds format. As of commit c61eef726a78 ("drm: add support for monotonic vblank timestamps"), other DRM drivers use monotonic times for drm_event_vblank, but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from the y2038/y2106 overflow as well as time jumps. For consistency, this changes vmwgfx to use ktime_get_ts64 as well, which solves those problems and avoids the deprecated do_gettimeofday() function. This should be transparent to to user space, as long as it doesn't compare the time against the result of gettimeofday(). Link: https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10076599_=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=8M6vawBo0zDsjbqhzV0xpOwAzX7Zm-uyGlIDnQ7-Gms=sHGuz0aoE9aLjVp5GALo8mYrN1bwOHW6mGpJIZmhwAc= Signed-off-by: Arnd Bergmann --- Originally sent on Nov 27. Sinclair Yeh said he'd pick it up for the next pull request, but it's not in linux-next yet. Resending the unchanged patch, please pick it up when you have time, or feel free to ignore this email in case it's already in some tree that just isn't part of linux-next but will be sent during the next merge window. --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index 6c5c75cf5e6c..9ed544f8958f 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -901,11 +901,12 @@ static void vmw_event_fence_action_seq_passed(struct vmw_fence_action *action) spin_lock_irq(>event_lock); if (likely(eaction->tv_sec != NULL)) { - struct timeval tv; + struct timespec64 ts; - do_gettimeofday(); - *eaction->tv_sec = tv.tv_sec; - *eaction->tv_usec = tv.tv_usec; + ktime_get_ts64(); + /* monotonic time, so no y2038 overflow */ + *eaction->tv_sec = ts.tv_sec; + *eaction->tv_usec = ts.tv_nsec / NSEC_PER_USEC; } drm_send_event_locked(dev, eaction->event);
Re: [PATCH] drm/vmwgfx: remove DRM_ERROR message, stops log spamming
Hi, Colin, On 09/12/2017 07:35 PM, Colin King wrote: From: Colin Ian Kingmmap'ing the device multiple times will spam the kernel log with the DRM_ERROR message about illegal mmap'ing the old fifo space. How are you hitting this? Multiple mappings should be fine as long as mapping offsets are correct, so hitting this message should indicate that the user-space app is doing something seriously wrong, and having it present in the log should probably help more than it hurts. /Thomas Since the mmap'ing will fail with an -EINVAL there is no need to emit this message, so just remove it. Signed-off-by: Colin Ian King --- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c index e771091d2cd3..1e633c602fb1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c @@ -33,10 +33,8 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma) struct drm_file *file_priv; struct vmw_private *dev_priv; - if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) { - DRM_ERROR("Illegal attempt to mmap old fifo space.\n"); + if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) return -EINVAL; - } file_priv = filp->private_data; dev_priv = vmw_priv(file_priv->minor->dev);
Re: [PATCH] drm/vmwgfx: remove DRM_ERROR message, stops log spamming
Hi, Colin, On 09/12/2017 07:35 PM, Colin King wrote: From: Colin Ian King mmap'ing the device multiple times will spam the kernel log with the DRM_ERROR message about illegal mmap'ing the old fifo space. How are you hitting this? Multiple mappings should be fine as long as mapping offsets are correct, so hitting this message should indicate that the user-space app is doing something seriously wrong, and having it present in the log should probably help more than it hurts. /Thomas Since the mmap'ing will fail with an -EINVAL there is no need to emit this message, so just remove it. Signed-off-by: Colin Ian King --- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c index e771091d2cd3..1e633c602fb1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c @@ -33,10 +33,8 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma) struct drm_file *file_priv; struct vmw_private *dev_priv; - if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) { - DRM_ERROR("Illegal attempt to mmap old fifo space.\n"); + if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) return -EINVAL; - } file_priv = filp->private_data; dev_priv = vmw_priv(file_priv->minor->dev);
Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops
On 08/30/2017 10:30 AM, Daniel Vetter wrote: On Wed, Aug 30, 2017 at 08:21:46AM +0200, Thomas Hellstrom wrote: On 08/30/2017 07:47 AM, Arvind Yadav wrote: vmw_fence_ops are not supposed to change at runtime. Functions "dma_fence_init" working with const vmw_fence_ops provided by . So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav...@gmail.com> --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index b8bc5bc..abc5f03 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout) return ret; } -static struct dma_fence_ops vmw_fence_ops = { +static const struct dma_fence_ops vmw_fence_ops = { .get_driver_name = vmw_fence_get_driver_name, .get_timeline_name = vmw_fence_get_timeline_name, .enable_signaling = vmw_fence_enable_signaling, Reviewed-by: Thomas Hellstrom <thellst...@vmware.com> Does this mean you'll merge it, or does this mean you'll expect someone else to merge this? I'm always confused when maintainers reply with an r-b/ack for a patch only touching their driver, and no further information at all. -Daniel For patches only touching our driver, I'd say we're always responsible for sorting out how it's going to be merged. Since Sinclair is maintaining the vmwgfx trees I thought I'd give him a chance to comment on how he wanted it merged. /Thomas
Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops
On 08/30/2017 10:30 AM, Daniel Vetter wrote: On Wed, Aug 30, 2017 at 08:21:46AM +0200, Thomas Hellstrom wrote: On 08/30/2017 07:47 AM, Arvind Yadav wrote: vmw_fence_ops are not supposed to change at runtime. Functions "dma_fence_init" working with const vmw_fence_ops provided by . So mark the non-const structs as const. Signed-off-by: Arvind Yadav --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index b8bc5bc..abc5f03 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout) return ret; } -static struct dma_fence_ops vmw_fence_ops = { +static const struct dma_fence_ops vmw_fence_ops = { .get_driver_name = vmw_fence_get_driver_name, .get_timeline_name = vmw_fence_get_timeline_name, .enable_signaling = vmw_fence_enable_signaling, Reviewed-by: Thomas Hellstrom Does this mean you'll merge it, or does this mean you'll expect someone else to merge this? I'm always confused when maintainers reply with an r-b/ack for a patch only touching their driver, and no further information at all. -Daniel For patches only touching our driver, I'd say we're always responsible for sorting out how it's going to be merged. Since Sinclair is maintaining the vmwgfx trees I thought I'd give him a chance to comment on how he wanted it merged. /Thomas
Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops
On 08/30/2017 07:47 AM, Arvind Yadav wrote: vmw_fence_ops are not supposed to change at runtime. Functions "dma_fence_init" working with const vmw_fence_ops provided by . So mark the non-const structs as const. Signed-off-by: Arvind Yadav <arvind.yadav...@gmail.com> --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index b8bc5bc..abc5f03 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout) return ret; } -static struct dma_fence_ops vmw_fence_ops = { +static const struct dma_fence_ops vmw_fence_ops = { .get_driver_name = vmw_fence_get_driver_name, .get_timeline_name = vmw_fence_get_timeline_name, .enable_signaling = vmw_fence_enable_signaling, Reviewed-by: Thomas Hellstrom <thellst...@vmware.com>
Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops
On 08/30/2017 07:47 AM, Arvind Yadav wrote: vmw_fence_ops are not supposed to change at runtime. Functions "dma_fence_init" working with const vmw_fence_ops provided by . So mark the non-const structs as const. Signed-off-by: Arvind Yadav --- drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c index b8bc5bc..abc5f03 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c @@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, signed long timeout) return ret; } -static struct dma_fence_ops vmw_fence_ops = { +static const struct dma_fence_ops vmw_fence_ops = { .get_driver_name = vmw_fence_get_driver_name, .get_timeline_name = vmw_fence_get_timeline_name, .enable_signaling = vmw_fence_enable_signaling, Reviewed-by: Thomas Hellstrom
Re: [PATCH 28/29] drm/vmwgfx: switch to drm_*{get,put} helpers
On 08/03/2017 01:58 PM, Cihangir Akturk wrote: drm_*_reference() and drm_*_unreference() functions are just compatibility alias for drm_*_get() and drm_*_put() adn should not be s/adn/and/ used by new code. So convert all users of compatibility functions to use the new APIs. Signed-off-by: Cihangir Akturk <cakt...@gmail.com> --- drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c index 6f4cb46..d43dce9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c @@ -500,7 +500,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par, } if (cur_fb) { - drm_framebuffer_unreference(cur_fb); + drm_framebuffer_put(cur_fb); par->set_fb = NULL; } diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c index 5ec24fd..fd4a988 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c @@ -316,7 +316,7 @@ int vmw_present_ioctl(struct drm_device *dev, void *data, out_no_surface: ttm_read_unlock(_priv->reservation_sem); out_no_ttm_lock: - drm_framebuffer_unreference(fb); + drm_framebuffer_put(fb); out_no_fb: drm_modeset_unlock_all(dev); out_no_copy: @@ -393,7 +393,7 @@ int vmw_present_readback_ioctl(struct drm_device *dev, void *data, ttm_read_unlock(_priv->reservation_sem); out_no_ttm_lock: - drm_framebuffer_unreference(fb); + drm_framebuffer_put(fb); out_no_fb: drm_modeset_unlock_all(dev); out_no_copy: Apart from the above, Reviewed-by: Thomas Hellstrom <thellst...@vmware.com> (Assuming this gets pulled together with the whole series, not vmwgfx-next) Thanks, Thomas
Re: [PATCH 28/29] drm/vmwgfx: switch to drm_*{get,put} helpers
On 08/03/2017 01:58 PM, Cihangir Akturk wrote: drm_*_reference() and drm_*_unreference() functions are just compatibility alias for drm_*_get() and drm_*_put() adn should not be s/adn/and/ used by new code. So convert all users of compatibility functions to use the new APIs. Signed-off-by: Cihangir Akturk --- drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c index 6f4cb46..d43dce9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c @@ -500,7 +500,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par, } if (cur_fb) { - drm_framebuffer_unreference(cur_fb); + drm_framebuffer_put(cur_fb); par->set_fb = NULL; } diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c index 5ec24fd..fd4a988 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c @@ -316,7 +316,7 @@ int vmw_present_ioctl(struct drm_device *dev, void *data, out_no_surface: ttm_read_unlock(_priv->reservation_sem); out_no_ttm_lock: - drm_framebuffer_unreference(fb); + drm_framebuffer_put(fb); out_no_fb: drm_modeset_unlock_all(dev); out_no_copy: @@ -393,7 +393,7 @@ int vmw_present_readback_ioctl(struct drm_device *dev, void *data, ttm_read_unlock(_priv->reservation_sem); out_no_ttm_lock: - drm_framebuffer_unreference(fb); + drm_framebuffer_put(fb); out_no_fb: drm_modeset_unlock_all(dev); out_no_copy: Apart from the above, Reviewed-by: Thomas Hellstrom (Assuming this gets pulled together with the whole series, not vmwgfx-next) Thanks, Thomas