Re: linux-next: Fixes tags need some work in the tip tree

2019-10-22 Thread Thomas Hellstrom
On 10/22/19 10:17 PM, Stephen Rothwell wrote:
> Hi all,
>
> n commit
>
>   6fee2a0be0ec ("x86/cpu/vmware: Fix platform detection VMWARE_PORT macro")
>
> Fixes tag
>
>   Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")

The cited subject is missing a leading "x86/vmware:". Ingo, please let
me know if you want me to respin those.

Thanks,

Thomas




[tip: x86/urgent] x86/cpu/vmware: Fix platform detection VMWARE_PORT macro

2019-10-21 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 6fee2a0be0ecae939d4b6cd8297d88b5cbb61654
Gitweb:
https://git.kernel.org/tip/6fee2a0be0ecae939d4b6cd8297d88b5cbb61654
Author:Thomas Hellstrom 
AuthorDate:Mon, 21 Oct 2019 19:24:03 +02:00
Committer: Ingo Molnar 
CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00

x86/cpu/vmware: Fix platform detection VMWARE_PORT macro

The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.

Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.

Signed-off-by: Thomas Hellstrom 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sean Christopherson 
Cc: Thomas Gleixner 
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas...@shipmail.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/vmware.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index 3caac90..ac9fc51 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * The hypercall definitions differ in the low word of the %edx argument
@@ -20,8 +21,8 @@
  */
 
 /* Old port-based version */
-#define VMWARE_HYPERVISOR_PORT"0x5658"
-#define VMWARE_HYPERVISOR_PORT_HB "0x5659"
+#define VMWARE_HYPERVISOR_PORT0x5658
+#define VMWARE_HYPERVISOR_PORT_HB 0x5659
 
 /* Current vmcall / vmmcall version */
 #define VMWARE_HYPERVISOR_HB   BIT(0)
@@ -29,7 +30,7 @@
 
 /* The low bandwidth call. The low word of edx is presumed clear. */
 #define VMWARE_HYPERCALL   \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \
  "inl (%%dx), %%eax",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
@@ -39,7 +40,8 @@
  * HB and OUT bits set.
  */
 #define VMWARE_HYPERCALL_HB_OUT
\
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep outsb", \
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; 
" \
+ "rep outsb",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
 
@@ -48,7 +50,8 @@
  * HB bit set.
  */
 #define VMWARE_HYPERCALL_HB_IN \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep insb", \
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; 
" \
+ "rep insb",   \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
 #endif


[tip: x86/urgent] x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm

2019-10-21 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: db633a4e0e6eda69b6065e3e106f9ea13a0676c3
Gitweb:
https://git.kernel.org/tip/db633a4e0e6eda69b6065e3e106f9ea13a0676c3
Author:Thomas Hellstrom 
AuthorDate:Mon, 21 Oct 2019 19:24:02 +02:00
Committer: Ingo Molnar 
CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00

x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm

LLVM's assembler doesn't accept the short form INL instruction:

  inl (%%dx)

but instead insists on the output register to be explicitly specified.

This was previously fixed for the VMWARE_PORT macro. Fix it also for
the VMWARE_HYPERCALL macro.

Suggested-by: Sami Tolvanen 
Signed-off-by: Thomas Hellstrom 
Reviewed-by: Nick Desaulniers 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sean Christopherson 
Cc: Thomas Gleixner 
Cc: clang-built-li...@googlegroups.com
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas...@shipmail.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/vmware.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index e00c9e8..3caac90 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -29,7 +29,8 @@
 
 /* The low bandwidth call. The low word of edx is presumed clear. */
 #define VMWARE_HYPERCALL   \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\
+ "inl (%%dx), %%eax",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
 


Re: [PATCH v6 0/8] Emulated coherent graphics memory take 2

2019-10-21 Thread Thomas Hellstrom
On 10/14/19 3:22 PM, Thomas Hellström (VMware) wrote:
> From: Thomas Hellström 
>
> Graphics APIs like OpenGL 4.4 and Vulkan require the graphics driver
> to provide coherent graphics memory, meaning that the GPU sees any
> content written to the coherent memory on the next GPU operation that
> touches that memory, and the CPU sees any content written by the GPU
> to that memory immediately after any fence object trailing the GPU
> operation is signaled.
>
> Paravirtual drivers that otherwise require explicit synchronization
> needs to do this by hooking up dirty tracking to pagefault handlers
> and buffer object validation.
>
> Provide mm helpers needed for this and that also allow for huge pmd-
> and pud entries (patch 1-3), and the associated vmwgfx code (patch 4-7).
>
> The code has been tested and exercised by a tailored version of mesa
> where we disable all explicit synchronization and assume graphics memory
> is coherent. The performance loss varies of course; a typical number is
> around 5%.
>
> I would like to merge this code through the DRM tree, so an ack to include
> the new mm helpers in that merge would be greatly appreciated.
>
> Changes since RFC:
> - Merge conflict changes moved to the correct patch. Fixes intra-patchset
>   compile errors.
> - Be more aggressive when turning ttm vm code into helpers. This makes sure
>   we can use a const qualifier on the vmwgfx vm_ops.
> - Reinstate a lost comment an fix an error path that was broken when turning
>   the ttm vm code into helpers.
> - Remove explicit type-casts of struct vm_area_struct::vm_private_data
> - Clarify the locking inversion that makes us not being able to use the mm
>   pagewalk code.
>
> Changes since v1:
> - Removed the vmwgfx maintainer entry for as_dirty_helpers.c, updated
>   commit message accordingly
> - Removed the TTM patches from the series as they are merged separately
>   through DRM.
> Changes since v2:
> - Split out the pagewalk code from as_dirty_helpers.c and document locking.
> - Add pre_vma and post_vma callbacks to the pagewalk code.
> - Remove huge pmd and -pud asserts that would trip when we protect vmas with
>   struct address_space::i_mmap_rwsem rather than with
>   struct vm_area_struct::mmap_sem.
> - Do some naming cleanup in as_dirty_helpers.c
> Changes since v3:
> - Extensive renaming of the dirty helpers including the filename.
> - Update walk_page_mapping() doc.
> - Update the pagewalk code to not unconditionally split pmds if a pte_entry()
>   callback is present. Update the dirty helper pmd_entry accordingly.
> - Use separate walk ops for the dirty helpers.
> - Update the pagewalk code to take the pagetable lock in walk_pte_range.
> Changes since v4:
> - Fix pte pointer confusion in patch 2/8
> - Skip the pagewalk code conditional split patch for now, and update the
>   mapping_dirty_helper accordingly. That problem will be solved in a cleaner
>   way in a follow-up patchset.
> Changes since v5:
> - Fix tlb flushing when we have other pending tlb flushes.
>   
> Cc: Andrew Morton 
> Cc: Matthew Wilcox 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Rik van Riel 
> Cc: Minchan Kim 
> Cc: Michal Hocko 
> Cc: Huang Ying 
> Cc: Jérôme Glisse 
> Cc: Kirill A. Shutemov 
>
Kirill, Linus

I have a formal Ack for two of the four mm patches. Is there a chance I
can get an ack to merge the mm patches of this series through drm with
the vmwgfx patches?

Thanks,

Thomas




Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

2019-10-09 Thread Thomas Hellstrom
Hi,

On 10/9/19 7:17 PM, Linus Torvalds wrote:
> On Wed, Oct 9, 2019 at 10:03 AM Thomas Hellström (VMware)
>  wrote:
>> Nope, it handles the hugepages by ignoring them, since they should be
>> read-only, but if pmd_entry() was called with something else than a
>> hugepage, then it requests the fallback, but never a split.
> But  PAGE_WALK_FALLBACK _is_ a split.
>
> Oh, except you did this
>
> -   split_huge_pmd(walk->vma, pmd, addr);
> +   if (!ops->pmd_entry)
> +   split_huge_pmd(walk->vma, pmd, addr);
>
>
> so it avoids the split.
>
> No, that's unacceptable. And makes no sense anyway. If it doesn't
> split the pmd, then it shouldn't walk the pte's - because they don't
> exist. And if it's not a hugepmd, then the split is a no-op, so the
> test makes no sense.
>
> I hadn't noticed that part of the patch. That simply can't be right. I
> don't think you've tested this, because you never actually have
> hugepages, do you?
>
> You didn't notice or realize that split_huge_pmd() just does that
>
> if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)   \
> || pmd_devmap(*pmd))\
>
> thing and doesn't do anythign at all if it's not huge.
>
> So no. That code makes no sense at all, and I didn't realize how
> senseless it was, becasue I stupidly missed that "make the split
> conditional" - which is insane and wrong - and I thought that you
> wanted PAGE_WALK_FALLBACK to split a pmd and fall back to per-pte
> entries, which is what the name implies.
>
> But that's not what you wanted at all.
>
> Just get rid of PAGE_WALK_FALLBACK entirely then, and make the rule be
> that a zero return value just means "split and do ptes". Which is what
> you want (see above why "split" simply is wrong, and isn't an issue
> for you anyway.
>
> That won't change any existing cases, since even if they do have a
> zero return value, they don't have a pte_entry() callback, so they
> won't do that "split and do ptes" anyway.
>
>  Linus
>
Hmm, so we have the following cases we need to handle when returning
from the pmd_entry() handler.

1) Huge pmd was handled - Returns 0 and continues.
2) A pmd is otherwise unstable, typically someone just zapped a huge
pmd. Returns PAGE_WALK_FALLBACK, gets caught in the pmd_trans_unstable()
test and retries.
3) A pte directory - Returns PAGE_WALK_FALLBACK, falls through, avoids
the split and continues to the next level. Yeah that split avoidance
test is indeed made unnecessary by the preceding pmd_trans_unstable() test.

-   split_huge_pmd(walk->vma, pmd, addr);
+   if (!ops->pmd_entry)
+   split_huge_pmd(walk->vma, pmd, addr);

But as the commit message says, PAGE_WALK_FALLBACK is necessary to have
a virtual address range being handled once and only once. Therefore we
must distinguish between 1) and 2) since 2) must be retried until it's
handled correctly.

So we need the PAGE_WALK_FALLBACK. And if we instead were to combine 1)
and 3) in a single return value and use, for example PAGE_WALK_RETRY for
2)  the following could happen.

a) we handle the huge pmd and return 0 from pte_entry().
b) another process splits it.
c) we fall through to the pte level and handle the same address range
again...

So to summarize, yes the test in the code you cite is unnecessary. But
if we want to guarantee a virtual address range being handled once and
only once we need the PAGE_WALK_FALLBACK, (perhaps renamed). If not, we
can do without it similar to your original patch.

Thanks,

/Thomas





Re: [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges

2019-10-08 Thread Thomas Hellstrom
On 10/8/19 7:07 PM, Linus Torvalds wrote:
> On Tue, Oct 8, 2019 at 2:15 AM Thomas Hellström (VMware)
>  wrote:
>> Add two utilities to 1) write-protect and 2) clean all ptes pointing into
>> a range of an address space.
> The code looks much simpler and easier to understand now, I think, but
> that also makes some thing more obvious..
>
>> +static int clean_record_pte(pte_t *pte, unsigned long addr,
>> +   unsigned long end, struct mm_walk *walk)
>> +{
>> +   struct wp_walk *wpwalk = walk->private;
>> +   struct clean_walk *cwalk = to_clean_walk(wpwalk);
>> +   pte_t ptent = *pte;
>> +
>> +   if (pte_dirty(ptent)) {
>> +   pgoff_t pgoff = ((addr - walk->vma->vm_start) >> PAGE_SHIFT) 
>> +
>> +   walk->vma->vm_pgoff - cwalk->bitmap_pgoff;
>> +   pte_t old_pte = ptep_modify_prot_start(walk->vma, addr, pte);
>> +
>> +   ptent = pte_mkclean(old_pte);
>> +   ptep_modify_prot_commit(walk->vma, addr, pte, old_pte, 
>> ptent);
>> +
>> +   wpwalk->total++;
>> +   wpwalk->tlbflush_start = min(wpwalk->tlbflush_start, addr);
>> +   wpwalk->tlbflush_end = max(wpwalk->tlbflush_end,
>> +  addr + PAGE_SIZE);
>> +
>> +   __set_bit(pgoff, cwalk->bitmap);
> The above looks fundamentally racy.
>
> You clean the pte in memory, but it remains dirty and writable in the
> TLB, and you only flush it much later.
>
> So now writes can continue to happen to that page, without it
> necessarily being marked dirty again in the page tables, because all
> the CPU TLB caches say "it's already dirty".
>
> This may be ok - you've moved the diry bit into that bitmap, and you
> will flush the TLB before then taking action on the bitmap. So you
> haven't really _lost_ any dirty bits, but it still may be worth a
> comment.
>
> You do have comments about the _other_ issues (ie the whole "If a
> caller needs to make sure all dirty ptes are picked up and none
> additional are added..") but you don't actually have comments about
> the TLB race basically potentially now causing "missing" dirty stuff.
>
> And this may actually be a big problem on some architectures. Not x86,
> and maybe nothing else, but I have this dim memory of some
> architectures being unhappy due to virtual caches, and writeback when
> the page table entry says it's read-only and clean.
>
> Our normal "clean pages" is very anal about this, and does
>
> flush_cache_page(vma, address, pte_pfn(*pte));
> entry = ptep_clear_flush(vma, address, pte);
> entry = pte_wrprotect(entry);
> entry = pte_mkclean(entry);
> set_pte_at(vma->vm_mm, address, pte, entry);
>
> when it cleans a page, and I note both the cache flush and the
> "clear_flush()" (which invalidates the pte and does a synchronous TLB
> flush) and we have magic semantics for that at least on s390 because
> there are some low-level architecture details wrt flushing TLB entries
> and modifying them.
>
> End result: I think the code is probably ok in practice because you
> don't mind the slightly fuzzy dirty bit state, and it's _probably_ ok
> on all architectures that use drm for the TLB invalidation side. But I
> think it bears a bit of thinking about.

Yes, there's been some considerable thinking behind this. We do have the
cache flush in the pre_vma callback, and as you mention the TLB flush
happens before we use the bitmap: any pte that may be subject to a race
is recorded in the bitmap, and the guarantee we want to provide with the
function is

a) All ptes that are dirtied before the function starts executing will
be recorded in the bitmap.
b) All ptes dirtied after that will either be recorded in the bitmap,
remain dirty or both.

I probably should add that in the docs.

And the actual writeback happens asynchronously a lot later *should* be
OK, as long as caches are synced for the dma operation. Of course if
this somehow fails on a particular architecture I guess we need to
rethink and use ptep_clear_flush() at least on that architecture.

Thanks,
Thomas.





>
>   Linus
>



Re: [PATCH] x86/cpu/vmware: use the full form of inl in VMWARE_PORT

2019-10-07 Thread Thomas Hellstrom
On 10/7/19 9:21 PM, Sami Tolvanen wrote:
> LLVM's assembler doesn't accept the short form inl (%%dx) instruction,
> but instead insists on the output register to be explicitly specified:
>
>   :1:7: error: invalid operand for instruction
>   inl (%dx)
>  ^
>   LLVM ERROR: Error parsing inline asm
>
> Use the full form of the instruction to fix the build.
>
> Signed-off-by: Sami Tolvanen 

Acked-by: Thomas Hellstrom 

> ---
>  arch/x86/kernel/cpu/vmware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
> index 9735139cfdf8..46d732696c1c 100644
> --- a/arch/x86/kernel/cpu/vmware.c
> +++ b/arch/x86/kernel/cpu/vmware.c
> @@ -49,7 +49,7 @@
>  #define VMWARE_CMD_VCPU_RESERVED 31
>  
>  #define VMWARE_PORT(cmd, eax, ebx, ecx, edx) \
> - __asm__("inl (%%dx)" :  \
> + __asm__("inl (%%dx), %%eax" :   \
>   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
>   "a"(VMWARE_HYPERVISOR_MAGIC),   \
>   "c"(VMWARE_CMD_##cmd),  \




Re: [PATCH v3 3/7] mm: Add write-protect and clean utilities for address space ranges

2019-10-03 Thread Thomas Hellstrom
On 10/2/19 10:28 PM, Linus Torvalds wrote:
> On Wed, Oct 2, 2019 at 12:09 PM Thomas Hellström (VMware)
>  wrote:
>> Yes I typically tend towards using a "namespace_object_operation" naming
>> scheme, with "as_dirty" being the namespace here,
> We discourage that kind of mindless namespacing for core stuff.
>
> It makes sense in a driver or a filesystem: when there are 20+
> different filesystems all implementing the same operation, you can't
> have descriptive and natural names that are unique. So having a
> namespace prefix for the filesystem or driver makes sense. But even
> then it tends ot be just a simple name, not the op it does.

Understood.

>
>> Looking at Matthew's suggestion but lining up with
>> "unmap_mapping_range()", perhaps we could use "clean_mapping_range" and
>> "wp_mapping_range"?
> Yes, I agree with Willy that "dirty" is kind of implicit when the
> operation is to clean something, so the above sounds sane to me.
>
> The wp part I'm not entirely sure about: you're not actually
> write-protecting the range. You're doing something different. You're
> only doing it for shared writable mappings.

Both the cleaning operation and the wp operation operate on shared
writable mappings, and since they are also both restricted to entries
that may take part in dirty-tracking (they're ignoring pmds and puds),
so perhaps a "dirty" may make sense anyway, and to point out the similarity:

clean_mapping_dirty_range() and wp_mapping_dirty_range()"?

>  And I'd rather see
> separate 'struct mm_walk_ops' than shared ones that then have a flag
> in a differfent structure to change behavior.
>
> Yes, yes, some of the levels share the same logic, but that just means
> that you can use the same function pointer for that level in the
> different "clean" vs "wp" mm_walk_op.

I think that this comment and this last one:

> Also, why do you loop inside the pmd_entry function, instead of just
> having a pte_entry function?"

are tied together. The pagewalk code is kind of awkward since if you
provide a pte_entry function,
then huge pmds are unconditionally split, even if handled in pmd_entry,
forcing pmd-aware users to handle also ptes in pmd_entry(). I'm not
entirely sure why, but it looks like it's to avoid a race where huge
pmds could otherwise be unintentionally split if appearing *after* the
pmd_entry() call. Other pagewalk users do the same here,  see for
example clear_refs_pte_range();

https://elixir.bootlin.com/linux/latest/source/fs/proc/task_mmu.c#L1040

Also the pagewalk walk_pte_range() for some reason doesn't take the page
table lock, which means that a pte may well be cleared under us while we
have it cached for modification.

For these reasons we can't use the pte_entry, even internally and this
means we have three choices:

a) Selecting the pte function using a bool. Saves code and avoids extra
indirect function call.
b) Duplicating the pmd_entry with a different pte function, also
duplicating the ops. Avoids extra indirect function call but some extra
code.
c) Instead of the bool, use another function pointer in yet another ops
pointed to by the private structure. Saves some code.

I opted for a) here, but I can of course change that if needed?

>
> Also, looking closer at this patch, this makes me go "Whaa?":
>
> +   pte = (mm == _mm) ?
> +   pte_offset_kernel(pmd, addr) :
> +   pte_offset_map_lock(mm, pmd, addr, );
>
> because I don't think that's sensible. When do you have a vma in kernel space?
>
> Also, why do you loop inside the pmd_entry function, instead of just
> having a pte_entry function?

Yes, that was a blind copy from v1 of the code that used
"apply_to_page_range()". I'll fix that up.

Thanks,

/Thomas

>
>Linus
>



Re: [PATCH v3 1/2] x86: Don't let pgprot_modify() change the page encryption bit

2019-09-30 Thread Thomas Hellstrom
Hi,

On 9/18/19 7:57 PM, Dave Hansen wrote:
> On 9/17/19 6:01 AM, Thomas Hellström (VMware) wrote:
>> diff --git a/arch/x86/include/asm/pgtable_types.h 
>> b/arch/x86/include/asm/pgtable_types.h
>> index b5e49e6bac63..8267dd426b15 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -123,7 +123,7 @@
>>   */
>>  #define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | 
>> \
>>   _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
>> - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
>> + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC)
>>  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
> My only nit with what remains is that it expands the infestation of
> things that look like a simple macro but are not.
>
> I'm debating whether we want to go fix that now, though.
>
Any chance for an ack on this? It's really a small change that, as we've
found out, fixes an existing problem.

Thanks,

Thomas





Re: [PATCH 7/8] drm/vmwgfx: switch to own vma manager

2019-09-05 Thread Thomas Hellstrom
On Thu, 2019-09-05 at 09:05 +0200, Gerd Hoffmann wrote:
> Add struct drm_vma_offset_manager to vma_private, initialize it and
> pass it to ttm_bo_device_init().
> 
> With this in place the last user of ttm's embedded vma offset manager
> is gone and we can remove it (in a separate patch).
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1 +
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> 

Reviewed-by: Thomas Hellström 

I assume this will be merged through drm-misc?

/Thomas



[tip: x86/vmware] x86/vmware: Add a header file for hypercall definitions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: b4dd4f6e3648dfd66576515fd885a9a765c0
Gitweb:
https://git.kernel.org/tip/b4dd4f6e3648dfd66576515fd885a9a765c0
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:51 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:32:06 +02:00

x86/vmware: Add a header file for hypercall definitions

The new header is intended to be used by drivers using the backdoor.
Follow the KVM example using alternatives self-patching to choose
between vmcall, vmmcall and io instructions.

Also define two new CPU feature flags to indicate hypervisor support
for vmcall- and vmmcall instructions. The new XF86_FEATURE_VMW_VMMCALL
flag is needed because using XF86_FEATURE_VMMCALL might break QEMU/KVM
setups using the vmmouse driver. They rely on XF86_FEATURE_VMMCALL
on AMD to get the kvm_hypercall() right. But they do not yet implement
vmmcall for the VMware hypercall used by the vmmouse driver.

 [ bp: reflow hypercall %edx usage explanation comment. ]

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Cc: Aaron Lewis 
Cc: "David S. Miller" 
Cc: Fenghua Yu 
Cc: Greg Kroah-Hartman 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Josh Poimboeuf 
Cc: linux-graphics-maintai...@vmware.com
Cc: Mauro Carvalho Chehab 
Cc: Nicolas Ferre 
Cc: Robert Hoo 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-3-thomas...@shipmail.org
---
 MAINTAINERS|  1 +-
 arch/x86/include/asm/cpufeatures.h |  2 +-
 arch/x86/include/asm/vmware.h  | 53 +-
 arch/x86/kernel/cpu/vmware.c   |  6 ++-
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/vmware.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9cbcf16..47efc1b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17206,6 +17206,7 @@ M:  "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c
+F: arch/x86/include/asm/vmware.h
 
 VMWARE PVRDMA DRIVER
 M: Adit Ranadive 
diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index e880f24..aaeae2f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -232,6 +232,8 @@
 #define X86_FEATURE_VMMCALL( 8*32+15) /* Prefer VMMCALL to VMCALL 
*/
 #define X86_FEATURE_XENPV  ( 8*32+16) /* "" Xen paravirtual guest 
*/
 #define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table 
access-dirty bit */
+#define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports 
the VMCALL instruction */
+#define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
new file mode 100644
index 000..e00c9e8
--- /dev/null
+++ b/arch/x86/include/asm/vmware.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+#ifndef _ASM_X86_VMWARE_H
+#define _ASM_X86_VMWARE_H
+
+#include 
+#include 
+
+/*
+ * The hypercall definitions differ in the low word of the %edx argument
+ * in the following way: the old port base interface uses the port
+ * number to distinguish between high- and low bandwidth versions.
+ *
+ * The new vmcall interface instead uses a set of flags to select
+ * bandwidth mode and transfer direction. The flags should be loaded
+ * into %dx by any user and are automatically replaced by the port
+ * number if the VMWARE_HYPERVISOR_PORT method is used.
+ *
+ * In short, new driver code should strictly use the new definition of
+ * %dx content.
+ */
+
+/* Old port-based version */
+#define VMWARE_HYPERVISOR_PORT"0x5658"
+#define VMWARE_HYPERVISOR_PORT_HB "0x5659"
+
+/* Current vmcall / vmmcall version */
+#define VMWARE_HYPERVISOR_HB   BIT(0)
+#define VMWARE_HYPERVISOR_OUT  BIT(1)
+
+/* The low bandwidth call. The low word of edx is presumed clear. */
+#define VMWARE_HYPERCALL   \
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \
+ "vmcall", X86_FEATURE_VMCALL, \
+ "vmmcall", X86_FEATURE_VMW_VMMCALL)
+
+/*
+ * The high bandwidth out call. The low word of edx is presumed to have the
+ * HB and OUT bits set.
+ */
+#define VMWARE_HYPERCALL_HB_OUT
\
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_

[tip: x86/vmware] x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls

2019-08-28 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: bac7b4e843232a3a49a042410cf743341eb0887e
Gitweb:
https://git.kernel.org/tip/bac7b4e843232a3a49a042410cf743341eb0887e
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:50 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 10:48:30 +02:00

x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls

Vmware has historically used an INL instruction for this, but recent
hardware versions support using VMCALL/VMMCALL instead, so use this
method if supported at platform detection time. Explicitly code separate
macro versions since the alternatives self-patching has not been
performed at platform detection time.

Also put tighter constraints on the assembly input parameters.

Co-developed-by: Doug Covelli 
Signed-off-by: Doug Covelli 
Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: linux-graphics-maintai...@vmware.com
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-2-thomas...@shipmail.org
---
 arch/x86/kernel/cpu/vmware.c | 88 ---
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 3c64847..757dded 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -34,30 +34,65 @@
 #undef pr_fmt
 #define pr_fmt(fmt)"vmware: " fmt
 
-#define CPUID_VMWARE_INFO_LEAF 0x4000
+#define CPUID_VMWARE_INFO_LEAF   0x4000
+#define CPUID_VMWARE_FEATURES_LEAF   0x4010
+#define CPUID_VMWARE_FEATURES_ECX_VMMCALLBIT(0)
+#define CPUID_VMWARE_FEATURES_ECX_VMCALL BIT(1)
+
 #define VMWARE_HYPERVISOR_MAGIC0x564D5868
 #define VMWARE_HYPERVISOR_PORT 0x5658
 
-#define VMWARE_PORT_CMD_GETVERSION 10
-#define VMWARE_PORT_CMD_GETHZ  45
-#define VMWARE_PORT_CMD_GETVCPU_INFO   68
-#define VMWARE_PORT_CMD_LEGACY_X2APIC  3
-#define VMWARE_PORT_CMD_VCPU_RESERVED  31
+#define VMWARE_CMD_GETVERSION10
+#define VMWARE_CMD_GETHZ 45
+#define VMWARE_CMD_GETVCPU_INFO  68
+#define VMWARE_CMD_LEGACY_X2APIC  3
+#define VMWARE_CMD_VCPU_RESERVED 31
 
 #define VMWARE_PORT(cmd, eax, ebx, ecx, edx)   \
__asm__("inl (%%dx)" :  \
-   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
-   "0"(VMWARE_HYPERVISOR_MAGIC),   \
-   "1"(VMWARE_PORT_CMD_##cmd), \
-   "2"(VMWARE_HYPERVISOR_PORT), "3"(UINT_MAX) :\
-   "memory");
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :\
+   "memory")
+
+#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx) \
+   __asm__("vmcall" :  \
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(0), "b"(UINT_MAX) : \
+   "memory")
+
+#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx) \
+   __asm__("vmmcall" : \
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(0), "b"(UINT_MAX) : \
+   "memory")
+
+#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do {   \
+   switch (vmware_hypercall_mode) {\
+   case CPUID_VMWARE_FEATURES_ECX_VMCALL:  \
+   VMWARE_VMCALL(cmd, eax, ebx, ecx, edx); \
+   break;  \
+   case CPUID_VMWARE_FEATURES_ECX_VMMCALL: \
+   VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);\
+ 

[tip: x86/vmware] drm/vmwgfx: Update the backdoor call with support for new instructions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: 6abe3778cf5abd59b23b9037796f3eab8b7f1d98
Gitweb:
https://git.kernel.org/tip/6abe3778cf5abd59b23b9037796f3eab8b7f1d98
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:52 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:36:46 +02:00

drm/vmwgfx: Update the backdoor call with support for new instructions

Use the definition provided by include/asm/vmware.h

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Acked-by: Dave Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: pv-driv...@vmware.com
Cc: Thomas Gleixner 
Cc: VMware Graphics 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-4-thomas...@shipmail.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 21 -
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.h | 35 ++--
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
index 59e9d05..b1df3e3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
@@ -46,8 +46,6 @@
 #define RETRIES 3
 
 #define VMW_HYPERVISOR_MAGIC0x564D5868
-#define VMW_HYPERVISOR_PORT 0x5658
-#define VMW_HYPERVISOR_HB_PORT  0x5659
 
 #define VMW_PORT_CMD_MSG30
 #define VMW_PORT_CMD_HB_MSG 0
@@ -93,7 +91,7 @@ static int vmw_open_channel(struct rpc_channel *channel, 
unsigned int protocol)
 
VMW_PORT(VMW_PORT_CMD_OPEN_CHANNEL,
(protocol | GUESTMSG_FLAG_COOKIE), si, di,
-   VMW_HYPERVISOR_PORT,
+   0,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -126,7 +124,7 @@ static int vmw_close_channel(struct rpc_channel *channel)
 
VMW_PORT(VMW_PORT_CMD_CLOSE_CHANNEL,
0, si, di,
-   (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)),
+   channel->channel_id << 16,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -160,7 +158,8 @@ static unsigned long vmw_port_hb_out(struct rpc_channel 
*channel,
VMW_PORT_HB_OUT(
(MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG,
msg_len, si, di,
-   VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16),
+   VMWARE_HYPERVISOR_HB | (channel->channel_id << 16) |
+   VMWARE_HYPERVISOR_OUT,
VMW_HYPERVISOR_MAGIC, bp,
eax, ebx, ecx, edx, si, di);
 
@@ -181,7 +180,7 @@ static unsigned long vmw_port_hb_out(struct rpc_channel 
*channel,
 
VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_SENDPAYLOAD << 16),
 word, si, di,
-VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+channel->channel_id << 16,
 VMW_HYPERVISOR_MAGIC,
 eax, ebx, ecx, edx, si, di);
}
@@ -213,7 +212,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel 
*channel, char *reply,
VMW_PORT_HB_IN(
(MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG,
reply_len, si, di,
-   VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16),
+   VMWARE_HYPERVISOR_HB | (channel->channel_id << 16),
VMW_HYPERVISOR_MAGIC, bp,
eax, ebx, ecx, edx, si, di);
 
@@ -230,7 +229,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel 
*channel, char *reply,
 
VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_RECVPAYLOAD << 16),
 MESSAGE_STATUS_SUCCESS, si, di,
-VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+channel->channel_id << 16,
 VMW_HYPERVISOR_MAGIC,
 eax, ebx, ecx, edx, si, di);
 
@@ -269,7 +268,7 @@ static int vmw_send_msg(struct rpc_channel *channel, const 
char *msg)
 
VMW_PORT(VMW_PORT_CMD_SENDSIZE,
msg_len, si, di,
-   VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+   channel->channel_id << 16,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -327,7 +326,7 @@ static int vmw_recv_msg(struct rpc_channel *channel, void 
**msg,
 
VMW_PORT(VMW_PORT_CMD_RECVSIZE,
0, si, di,
-   (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)),
+

[tip: x86/vmware] input/vmmouse: Update the backdoor call with support for new instructions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom
The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: f7b15c74cffd760ec9959078982d8268a38456c4
Gitweb:
https://git.kernel.org/tip/f7b15c74cffd760ec9959078982d8268a38456c4
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:53 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:43:01 +02:00

input/vmmouse: Update the backdoor call with support for new instructions

Use the definition provided by include/asm/vmware.h.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Acked-by: Dmitry Torokhov 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: linux-in...@vger.kernel.org
Cc: Thomas Gleixner 
Cc: VMware Graphics 
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-5-thomas...@shipmail.org
---
 drivers/input/mouse/vmmouse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c
index 871e5b5..148245c 100644
--- a/drivers/input/mouse/vmmouse.c
+++ b/drivers/input/mouse/vmmouse.c
@@ -16,12 +16,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "psmouse.h"
 #include "vmmouse.h"
 
 #define VMMOUSE_PROTO_MAGIC0x564D5868U
-#define VMMOUSE_PROTO_PORT 0x5658
 
 /*
  * Main commands supported by the vmmouse hypervisor port.
@@ -84,7 +84,7 @@ struct vmmouse_data {
 #define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4)  \
 ({ \
unsigned long __dummy1, __dummy2;   \
-   __asm__ __volatile__ ("inl %%dx" :  \
+   __asm__ __volatile__ (VMWARE_HYPERCALL :\
"=a"(out1), \
"=b"(out2), \
"=c"(out3), \
@@ -94,7 +94,7 @@ struct vmmouse_data {
"a"(VMMOUSE_PROTO_MAGIC),   \
"b"(in1),   \
"c"(VMMOUSE_PROTO_CMD_##cmd),   \
-   "d"(VMMOUSE_PROTO_PORT) :   \
+   "d"(0) :\
"memory");  \
 })
 


Re: cleanup the walk_page_range interface

2019-08-08 Thread Thomas Hellstrom

On 8/8/19 11:56 PM, Christoph Hellwig wrote:

On Thu, Aug 08, 2019 at 10:50:37AM -0700, Linus Torvalds wrote:

Note that both Thomas and Steven have series touching this area pending,
and there are a couple consumer in flux too - the hmm tree already
conflicts with this series, and I have potential dma changes on top of
the consumers in Thomas and Steven's series, so we'll probably need a
git tree similar to the hmm one to synchronize these updates.

I'd be willing to just merge this now, if that helps. The conversion
is mechanical, and my only slight worry would be that at least for my
original patch I didn't build-test the (few) non-x86
architecture-specific cases. But I did end up looking at them fairly
closely  (basically using some grep/sed scripts to see that the
conversions I did matched the same patterns). And your changes look
like obvious improvements too where any mistake would have been caught
by the compiler.

I did cross compile the s390 and powerpc bits, but I do not have an
openrisc compiler.


So I'm not all that worried from a functionality standpoint, and if
this will help the next merge window, I'll happily pull now.

That would help with this series vs the others, but not with the other
series vs each other.


Although my series doesn't touch the pagewalk code, it rather borrowed 
some concepts from it and used for the apply_to_page_range() interface.


The reason being that the pagewalk code requires the mmap_sem to be held 
(mainly for trans-huge pages and reading the vma->vm_flags if I 
understand the code correctly). That is fine when you scan the vmas of a 
process, but the helpers I wrote need to instead scan all vmas pointing 
into a struct address_space, and taking the mmap_sem for each vma will 
create lock inversion problems.


/Thomas




Re: [PATCH 2/3] pagewalk: seperate function pointers from iterator data

2019-08-08 Thread Thomas Hellstrom

On 8/8/19 5:42 PM, Christoph Hellwig wrote:

The mm_walk structure currently mixed data and code.  Split out the
operations vectors into a new mm_walk_ops structure, and while we
are changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Based on patch from Linus Torvalds.

Signed-off-by: Christoph Hellwig 
---


Typo: For the patch title s/seperate/separate/

Otherwise for the series

Reviewed-by: Thomas Hellstrom 

/Thomas




[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

2019-07-16 Thread tip-bot for Thomas Hellstrom
Commit-ID:  406de552c2be6ded524c75d14a73cf7f027f587e
Gitweb: https://git.kernel.org/tip/406de552c2be6ded524c75d14a73cf7f027f587e
Author: Thomas Hellstrom 
AuthorDate: Thu, 28 Mar 2019 12:06:37 +
Committer:  Thomas Gleixner 
CommitDate: Wed, 17 Jul 2019 00:42:27 +0200

MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

Alok Kataria will be handing over VMware's maintainership of these
interfaces to Thomas Hellström, with pv-drivers as backup contact.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Thomas Gleixner 
Acked-by: Alok Kataria 
Acked-by: Juergen Gross 
Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com

---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..80fa7a4a0b56 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12074,7 +12074,8 @@ F:  Documentation/parport*.txt
 
 PARAVIRT_OPS INTERFACE
 M: Juergen Gross 
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
@@ -17087,7 +17088,8 @@ S:  Maintained
 F: drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M: Alok Kataria 
+M:     Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c


[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

2019-07-16 Thread tip-bot for Thomas Hellstrom
Commit-ID:  907cb11da7a725445dccc6c2ca2d428739f6cd71
Gitweb: https://git.kernel.org/tip/907cb11da7a725445dccc6c2ca2d428739f6cd71
Author: Thomas Hellstrom 
AuthorDate: Thu, 28 Mar 2019 12:06:37 +
Committer:  Thomas Gleixner 
CommitDate: Tue, 16 Jul 2019 23:13:50 +0200

MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

Alok Kataria will be handing over VMware's maintainership of these
interfaces to Thomas Hellström, with pv-drivers as backup contact.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Thomas Gleixner 
Acked-by: Alok Kataria 
Acked-by: Juergen Gross 
Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com
---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..80fa7a4a0b56 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12074,7 +12074,8 @@ F:  Documentation/parport*.txt
 
 PARAVIRT_OPS INTERFACE
 M: Juergen Gross 
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
@@ -17087,7 +17088,8 @@ S:  Maintained
 F: drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M: Alok Kataria 
+M:     Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c


Re: [PATCH] drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read

2019-05-20 Thread Thomas Hellstrom
Thanks, Murray,

I'll include in the next vmwgfx-fixes pull request.

On Mon, 2019-05-20 at 21:57 +1200, Murray McAllister wrote:
> If SVGA_3D_CMD_DX_SET_SHADER is called with a shader ID
> of SVGA3D_INVALID_ID, and a shader type of
> SVGA3D_SHADERTYPE_INVALID, the calculated binding.shader_slot
> will be 4294967295, leading to an out-of-bounds read in
> vmw_binding_loc()
> when the offset is calculated.
> 
> Signed-off-by: Murray McAllister 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> index 2ff7ba04d8c8..9aeb5448cfc1 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> @@ -2193,7 +2193,8 @@ static int vmw_cmd_dx_set_shader(struct
> vmw_private *dev_priv,
>  
>   cmd = container_of(header, typeof(*cmd), header);
>  
> - if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX) {
> + if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX ||
> + cmd->body.type < SVGA3D_SHADERTYPE_MIN) {
>   VMW_DEBUG_USER("Illegal shader type %u.\n",
>  (unsigned int) cmd->body.type);
>   return -EINVAL;


[PATCH 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks

2019-04-12 Thread Thomas Hellstrom
Add the callbacks necessary to implement emulated coherent memory for
surfaces. Add a flag to the gb_surface_create ioctl to indicate that
surface memory should be coherent.
Also bump the drm minor version to signal the availability of coherent
surfaces.

Signed-off-by: Thomas Hellstrom 
---
 .../device_include/svga3d_surfacedefs.h   | 209 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c   | 390 +-
 include/uapi/drm/vmwgfx_drm.h |   4 +-
 4 files changed, 600 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h 
b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
index f2bfd3d80598..d901206c04e3 100644
--- a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
+++ b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
@@ -1280,7 +1280,6 @@ svga3dsurface_get_pixel_offset(SVGA3dSurfaceFormat format,
return offset;
 }
 
-
 static inline u32
 svga3dsurface_get_image_offset(SVGA3dSurfaceFormat format,
   surf_size_struct baseLevelSize,
@@ -1375,4 +1374,212 @@ 
svga3dsurface_is_screen_target_format(SVGA3dSurfaceFormat format)
return svga3dsurface_is_dx_screen_target_format(format);
 }
 
+/**
+ * struct svga3dsurface_mip - Mimpmap level information
+ * @bytes: Bytes required in the backing store of this mipmap level.
+ * @img_stride: Byte stride per image.
+ * @row_stride: Byte stride per block row.
+ * @size: The size of the mipmap.
+ */
+struct svga3dsurface_mip {
+   size_t bytes;
+   size_t img_stride;
+   size_t row_stride;
+   struct drm_vmw_size size;
+
+};
+
+/**
+ * struct svga3dsurface_cache - Cached surface information
+ * @desc: Pointer to the surface descriptor
+ * @mip: Array of mipmap level information. Valid size is @num_mip_levels.
+ * @mip_chain_bytes: Bytes required in the backing store for the whole chain
+ * of mip levels.
+ * @num_mip_levels: Valid size of the @mip array. Number of mipmap levels in
+ * a chain.
+ * @num_layers: Number of slices in an array texture or number of faces in
+ * a cubemap texture.
+ */
+struct svga3dsurface_cache {
+   const struct svga3d_surface_desc *desc;
+   struct svga3dsurface_mip mip[DRM_VMW_MAX_MIP_LEVELS];
+   size_t mip_chain_bytes;
+   u32 num_mip_levels;
+   u32 num_layers;
+};
+
+/**
+ * struct svga3dsurface_loc - Surface location
+ * @sub_resource: Surface subresource. Defined as layer * num_mip_levels +
+ * mip_level.
+ * @x: X coordinate.
+ * @y: Y coordinate.
+ * @z: Z coordinate.
+ */
+struct svga3dsurface_loc {
+   u32 sub_resource;
+   u32 x, y, z;
+};
+
+/**
+ * svga3dsurface_subres - Compute the subresource from layer and mipmap.
+ * @cache: Surface layout data.
+ * @mip_level: The mipmap level.
+ * @layer: The surface layer (face or array slice).
+ *
+ * Return: The subresource.
+ */
+static inline u32 svga3dsurface_subres(const struct svga3dsurface_cache *cache,
+  u32 mip_level, u32 layer)
+{
+   return cache->num_mip_levels * layer + mip_level;
+}
+
+/**
+ * svga3dsurface_setup_cache - Build a surface cache entry
+ * @size: The surface base level dimensions.
+ * @format: The surface format.
+ * @num_mip_levels: Number of mipmap levels.
+ * @num_layers: Number of layers.
+ * @cache: Pointer to a struct svga3dsurface_cach object to be filled in.
+ */
+static inline void svga3dsurface_setup_cache(const struct drm_vmw_size *size,
+SVGA3dSurfaceFormat format,
+u32 num_mip_levels,
+u32 num_layers,
+u32 num_samples,
+struct svga3dsurface_cache *cache)
+{
+   const struct svga3d_surface_desc *desc;
+   u32 i;
+
+   memset(cache, 0, sizeof(*cache));
+   cache->desc = desc = svga3dsurface_get_desc(format);
+   cache->num_mip_levels = num_mip_levels;
+   cache->num_layers = num_layers;
+   for (i = 0; i < cache->num_mip_levels; i++) {
+   struct svga3dsurface_mip *mip = >mip[i];
+
+   mip->size = svga3dsurface_get_mip_size(*size, i);
+   mip->bytes = svga3dsurface_get_image_buffer_size
+   (desc, >size, 0) * num_samples;
+   mip->row_stride =
+   __KERNEL_DIV_ROUND_UP(mip->size.width,
+ desc->block_size.width) *
+   desc->bytes_per_block * num_samples;
+   mip->img_stride =
+   __KERNEL_DIV_ROUND_UP(mip->size.height,
+ desc->block_size.height) *
+   mip->row_stride;
+   cache->mip_chain_bytes += mip-&g

Re: [PATCH v2] lib/scatterlist: Provide a DMA page iterator

2019-02-07 Thread Thomas Hellstrom
On Thu, 2019-02-07 at 22:26 +, Jason Gunthorpe wrote:
> Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists
> w/o
> backing pages") introduced the sg_page_iter_dma_address() function
> without
> providing a way to use it in the general case. If the sg_dma_len() is
> not
> equal to the sg length callers cannot safely use the
> for_each_sg_page/sg_page_iter_dma_address combination.
> 
> Resolve this API mistake by providing a DMA specific iterator,
> for_each_sg_dma_page(), that uses the right length so
> sg_page_iter_dma_address() works as expected with all sglists.
> 
> A new iterator type is introduced to provide compile-time safety
> against
> wrongly mixing accessors and iterators.
> 
> Acked-by: Christoph Hellwig  (for scatterlist)
> Signed-off-by: Jason Gunthorpe 
> 

For the vmwgfx part, 
Acked-by: Thomas Hellstrom 

I'll take a deeper look to provide a vmwgfx fix as a follow up.

Thanks,
Thomas



Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs v2

2019-02-04 Thread Thomas Hellstrom
On Mon, 2019-02-04 at 13:11 +0100, Thomas Hellström wrote:
> On Mon, 2019-02-04 at 09:19 +0100, Christoph Hellwig wrote:
> > On Fri, Jan 25, 2019 at 09:12:13AM +0100, Thomas Hellstrom wrote:
> > > -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> > > - /*
> > > -  * No coherent page pool
> > > -  */
> > > - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> > > + /* No TTM coherent page pool? FIXME: Ask TTM instead! */
> > > + if (!(IS_ENABLED(CONFIG_SWIOTLB) ||
> > > IS_ENABLED(CONFIG_INTEL_IOMMU)) &&
> > > + (dev_priv->map_mode == vmw_dma_alloc_coherent))
> > >   return -EINVAL;
> > > -#endif
> > > +
> > 
> > I don't think this edited in change makes any sense.  The swiotlb
> > vs
> > dma-direct versions of dma_alloc_coherent are the same, so this
> > check
> > seems very obsfucating.
> 
> So this part of code is identical in functionality to the previous
> version. It checks whether the TTM module has the coherent page pool
> enabled. (an identical test is present in TTM). What we *really* need
> to do here instead is to ask TTM whether it has enabled its coherent
> page pool instead of trying to mimic TTM's test, and I have a
> changeset
> under review for that. But as mentioned previously, I don't want to
> change the TTM interface outside of a merge window, so we either have
> to live with the above for 5.0 or keep the old defines. I'd prefer
> the
> former so I don't have to respin the patch series once more.
> 
> Thanks,
> Thoams
> 

Hi, Christoph,

I need to get this merged this week. Could you please ack or ack
removing this hunk + updating the following patches for merge errors?

If no response, I'll add a Cc: tag on the patch and a #v1 to your s-o-
b.

Thanks,
Thomas




Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-17 Thread Thomas Hellstrom
On Thu, 2019-01-17 at 10:30 +0100, h...@lst.de wrote:
> On Wed, Jan 16, 2019 at 10:24:36AM -0700, Jason Gunthorpe wrote:
> > The fact is there is 0 industry interest in using RDMA on platforms
> > that can't do HW DMA cache coherency - the kernel syscalls required
> > to
> > do the cache flushing on the IO path would just destroy performance
> > to
> > the point of making RDMA pointless. Better to use netdev on those
> > platforms.
> 
> In general there is no syscall required for doing cache flushing, you
> just issue the proper instructions directly from userspace.

But what if there are other coherence issues? Like bounce-buffers?
I'd like to +1 on what Jason says about industry interest: FWIW, vmwgfx
is probably one of the graphics drivers that would lend itself best to
do a fully-dma-interface compliant graphics stack experiment. But being
a paravirtual driver, all platforms we can ever run on are fully
coherent unless someone introduces a fake incoherency by forcing
swiotlb. Putting many man-months of effort into supporting systems on
which we would never run on and can never test on can never make more
than academic sense.

>  
> 
> > The reality is that *all* the subsytems doing DMA kernel bypass are
> > ignoring the DMA mapping rules, I think we should support this
> > better,
> > and just accept that user space DMA will not be using syncing.
> > Block
> > access in cases when this is required, otherwise let it work as is
> > today.
> 
> In that case we just need to block userspace DMA access entirely.
> Which given the amount of problems it creates sounds like a pretty
> good idea anyway.

I'm not sure I'm following you here. Are you suggesting scratching
support for all (GP)GPU- and RDMA drivers?

Thanks,
Thomas




Re: [PATCH v4 2/3] locking: Implement an algorithm choice for Wound-Wait mutexes

2019-01-16 Thread Thomas Hellstrom
Hi,

On Wed, 2019-01-16 at 09:24 -0500, Rob Clark wrote:
> So, I guess this is to do w/ the magic of merge commits, but it looks
> like the hunk changing the crtc_ww_class got lost:

So what happened here is that this commit changed it to 
DEFINE_WD_CLASS
and the following commit changed it back again to
DEFINE_WW_CLASS
so that the algorithm change and only that came with the last commit. 
Apparently the history of that line got lost when merging since the
merge didn't change it; git blame doesn't show it changed, but when
looking at the commit history in gitk it's all clear. I guess that's a
feature of git.

The modeset locks now use true wound-wait rather than wait-die.

/Thomas

> 
>  ~/src/linux   master  git show --pretty=short
> 08295b3b5beec9aac0f7a9db86f0fc3792039da3
> drivers/gpu/drm/drm_modeset_lock.c
> commit 08295b3b5beec9aac0f7a9db86f0fc3792039da3
> Author: Thomas Hellstrom 
> 
> locking: Implement an algorithm choice for Wound-Wait mutexes
> 
> diff --git a/drivers/gpu/drm/drm_modeset_lock.c
> b/drivers/gpu/drm/drm_modeset_lock.c
> index 8a5100685875..638be2eb67b4 100644
> --- a/drivers/gpu/drm/drm_modeset_lock.c
> +++ b/drivers/gpu/drm/drm_modeset_lock.c
> @@ -70,7 +70,7 @@
>   * lists and lookup data structures.
>   */
> 
> -static DEFINE_WW_CLASS(crtc_ww_class);
> +static DEFINE_WD_CLASS(crtc_ww_class);
> 
>  /**
>   * drm_modeset_lock_all - take all modeset locks
>  ~/src/linux   master  git log --pretty=format:%H
> drivers/gpu/drm/drm_modeset_lock.c  | grep
> 08295b3b5beec9aac0f7a9db86f0fc3792039da3
>  ~/src/linux   master  1 
> 
> 
> BR,
> -R
> 
> 
> On Tue, Jun 19, 2018 at 4:29 AM Thomas Hellstrom <
> thellst...@vmware.com> wrote:
> > The current Wound-Wait mutex algorithm is actually not Wound-Wait
> > but
> > Wait-Die. Implement also Wound-Wait as a per-ww-class choice.
> > Wound-Wait
> > is, contrary to Wait-Die a preemptive algorithm and is known to
> > generate
> > fewer backoffs. Testing reveals that this is true if the
> > number of simultaneous contending transactions is small.
> > As the number of simultaneous contending threads increases, Wait-
> > Wound
> > becomes inferior to Wait-Die in terms of elapsed time.
> > Possibly due to the larger number of held locks of sleeping
> > transactions.
> > 
> > Update documentation and callers.
> > 
> > Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
> > tag patch-18-06-15
> > 
> > Each thread runs 10 batches of lock / unlock 800 ww mutexes
> > randomly
> > chosen out of 10. Four core Intel x86_64:
> > 
> > Algorithm#threads   Rollbacks  time
> > Wound-Wait   4  ~100   ~17s.
> > Wait-Die 4  ~15~19s.
> > Wound-Wait   16 ~36~109s.
> > Wait-Die 16 ~45~82s.
> > 
> > Cc: Ingo Molnar 
> > Cc: Jonathan Corbet 
> > Cc: Gustavo Padovan 
> > Cc: Maarten Lankhorst 
> > Cc: Sean Paul 
> > Cc: David Airlie 
> > Cc: Davidlohr Bueso 
> > Cc: "Paul E. McKenney" 
> > Cc: Josh Triplett 
> > Cc: Thomas Gleixner 
> > Cc: Kate Stewart 
> > Cc: Philippe Ombredanne 
> > Cc: Greg Kroah-Hartman 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Co-authored-by: Peter Zijlstra 
> > Signed-off-by: Thomas Hellstrom 
> > 
> > ---
> > v2:
> > * Update API according to review comment by Greg Kroah-Hartman.
> > * Address review comments by Peter Zijlstra:
> >   - Avoid _Bool in composites
> >   - Fix typo
> >   - Use __mutex_owner() where applicable
> >   - Rely on built-in barriers for the main loop exit condition,
> > struct ww_acquire_ctx::wounded. Update code comments.
> >   - Explain unlocked use of list_empty().
> > v3:
> > * Adapt to and incorporate cleanup by Peter Zijlstra
> > * Remove unlocked use of list_empty().
> > v4:
> > * Move code related to adding a waiter to the lock waiter list to a
> >   separate function.
> > ---
> >  Documentation/locking/ww-mutex-design.txt |  57 +--
> >  drivers/dma-buf/reservation.c |   2 +-
> >  drivers/gpu/drm/drm_modeset_lock.c|   2 +-
> >  include/linux/ww_mutex.h  |  17 ++-
> >  kernel/locking/locktorture.c  |   2 +-
> >  kernel/locking/mutex.c| 165
> > +++---
> >  kernel/locking/test-ww_mutex.c|   2 +-
> >  lib/locking-selftest.c

Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom
On Tue, 2019-01-15 at 21:58 +0100, h...@lst.de wrote:
> On Tue, Jan 15, 2019 at 07:13:11PM +, Koenig, Christian wrote:
> > Thomas is correct that the interface you propose here doesn't work
> > at 
> > all for GPUs.
> > 
> > The kernel driver is not informed of flush/sync, but rather just
> > setups 
> > coherent mappings between system memory and devices.
> > 
> > In other words you have an array of struct pages and need to map
> > that to 
> > a specific device and so create dma_addresses for the mappings.
> 
> If you want a coherent mapping you need to use dma_alloc_coherent
> and dma_mmap_coherent and you are done, that is not the problem.
> That actually is one of the vmgfx modes, so I don't understand what
> problem we are trying to solve if you don't actually want a non-
> coherent mapping. 

For vmwgfx, not making dma_alloc_coherent default has a couple of
reasons:
1) Memory is associated with a struct device. It has not been clear
that it is exportable to other devices.
2) There seems to be restrictions in the system pages allowable. GPUs
generally prefer highmem pages but dma_alloc_coherent returns a virtual
address implying GFP_KERNEL? While not used by vmwgfx, TTM typically
prefers HIGHMEM pages to facilitate caching mode switching without
having to touch the kernel map. 
3) Historically we had APIs to allow coherent access to user-space
defined pages. That has gone away not but the infrastructure was built
around it.

dma_mmap_coherent isn't use because as the data moves between system
memory, swap and VRAM, PTEs of user-space mappings are adjusted
accordingly, meaning user-space doesn't have to unmap when an operation
is initiated that might mean the data is moved.


> Although last time I had that discussion with Daniel Vetter
> I was under the impressions that GPUs really wanted non-coherent
> mappings.

Intel historically has done things a bit differently. And it's also
possible that embedded platforms and ARM prefer this mode of operation,
but I haven't caught up on that discussion.

> 
> But if you want a coherent mapping you can't go to a struct page,
> because on many systems you can't just map arbitrary memory as
> uncachable.  It might either come from very special limited pools,
> or might need other magic applied to it so that it is not visible
> in the normal direct mapping, or at least not access through it.


The TTM subsystem has been relied on to provide coherent memory with
the option to switch caching mode of pages. But only on selected and
well tested platforms. On other platforms we simply do not load, and
that's fine for now.

But as mentioned multiple times, to make GPU drivers more compliant,
we'd really want that

bool dma_streaming_is_coherent(const struct device *)

API to help us decide when to load or not.

Thanks,
Thomas









Re: [PATCH -next] drm/vmwgfx: Remove set but not used variable 'srf'

2019-01-15 Thread Thomas Hellstrom
Yue,

Thanks,
Reviewed-by: Thomas Hellstrom 

Will include in the next -next pull.
/Thomas


On Wed, 2019-01-16 at 01:55 +, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/gpu/drm/vmwgfx/vmwgfx_surface.c: In function
> 'vmw_hw_surface_destroy':
> drivers/gpu/drm/vmwgfx/vmwgfx_surface.c:335:22: warning:
>  variable 'srf' set but not used [-Wunused-but-set-variable]
> 
> It never used since commit 543831cfc976 ï¼^"drm/vmwgfx: Break out
> surface and
> context management to separate files")
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> index 80a01cd..3b6976e 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> @@ -332,7 +332,6 @@ static void vmw_hw_surface_destroy(struct
> vmw_resource *res)
>  {
>  
>   struct vmw_private *dev_priv = res->dev_priv;
> - struct vmw_surface *srf;
>   void *cmd;
>  
>   if (res->func->destroy == vmw_gb_surface_destroy) {
> @@ -359,7 +358,6 @@ static void vmw_hw_surface_destroy(struct
> vmw_resource *res)
>*/
>  
>   mutex_lock(_priv->cmdbuf_mutex);
> - srf = vmw_res_to_srf(res);
>   dev_priv->used_memory_size -= res->backup_size;
>   mutex_unlock(_priv->cmdbuf_mutex);
>   }
> 
> 
> 


Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom
On Tue, 2019-01-15 at 16:20 +0100, h...@lst.de wrote:
> On Tue, Jan 15, 2019 at 03:24:55PM +0100, Christian König wrote:
> > Yeah, indeed. Bounce buffers are an absolute no-go for GPUs.
> > 
> > If the DMA API finds that a piece of memory is not directly
> > accessible by 
> > the GPU we need to return an error and not try to use bounce
> > buffers behind 
> > the surface.
> > 
> > That is something which always annoyed me with the DMA API, which
> > is 
> > otherwise rather cleanly defined.
> 
> That is exactly what I want to fix with my series to make
> DMA_ATTR_NON_CONSISTENT more useful and always available:
> 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fiommu%2F2018-December%2F031985.htmldata=02%7C01%7Cthellstrom%40vmware.com%7Cb1799c4073024a824f9408d67afcfcea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636831624340834140sdata=JiRBfzZMvN3joJ4vKiErbzXAHaNzuBcLapRJDL%2Bt6Hc%3Dreserved=0
> 
> With that you allocate the memory using dma_alloc_attrs with
> DMA_ATTR_NON_CONSISTENT, and use dma_sync_single_* to transfer
> ownership to the cpu and back to the device, with a gurantee that
> there won't be any bouncing.  So far the interest by the parties that
> requested the feature has been rather lacklustre, though.

In the graphics case, it's probably because it doesn't fit the graphics
use-cases:

1) Memory typically needs to be mappable by another device. (the "dma-
buf" interface)
2) DMA buffers are exported to user-space and is sub-allocated by it.
Mostly there are no GPU user-space kernel interfaces to sync / flush
subregions and these syncs may happen on a smaller-than-cache-line
granularity.

So to help the graphics driver, that coherent flag would be much more
useful.

/Thomas



Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom
Hi, Christoph,

On Mon, 2019-01-14 at 10:48 +0100, Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 04:42:18PM -0700, Jason Gunthorpe wrote:
> > > Changes since the RFC:
> > > - Rework vmwgfx too [CH]
> > > - Use a distinct type for the DMA page iterator [CH]
> > > - Do not have a #ifdef [CH]
> > 
> > ChristophH: Will you ack?
> 
> This looks generally fine.
> 
> > Are you still OK with the vmwgfx reworking, or should we go back to
> > the original version that didn't have the type safety so this
> > driver
> > can be left broken?
> 
> I think the map method in vmgfx that just does virt_to_phys is
> pretty broken.  Thomas, can you check if you see any performance
> difference with just doing the proper dma mapping, as that gets the
> driver out of interface abuse land?

The performance difference is not really the main problem here. The
problem is that even though we utilize the streaming DMA interface, we
use it only since we have to for DMA-Remapping and assume that the
memory is coherent. To be able to be as compliant as possible and ditch
the virt-to-phys mode, we *need* a DMA interface flag that tells us
when the dma_sync_for_xxx are no-ops. If they aren't we'll refuse to
load for now. I'm not sure, but I think also nouveau and radeon suffer
from the same issue.

> 
> While we're at it I think we need to merge my series in this area
> for 5.0, because without that the driver is already broken.  Where
> should we merge it?

I can merge it through vmwgfx/drm-fixes. There is an outstanding issue
with patch 3. Do you want me to fix that up?

Thanks,
Thomas




Re: fix DMA ops layering violations in vmwgfx

2019-01-08 Thread Thomas Hellstrom
On Tue, 2019-01-08 at 14:12 +0100, h...@lst.de wrote:
> On Tue, Jan 08, 2019 at 09:51:45AM +0000, Thomas Hellstrom wrote:
> > Hi, Christoph,
> > 
> > On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> > > Hi Thomas,
> > > 
> > > vmwgfx has been doing some odd checks based on DMA ops which rely
> > > on deep DMA mapping layer internals, and I think the changes in
> > > Linux 4.21 finally broke most of these implicit assumptions.
> > 
> > Thanks. 
> > What we're really trying to do here is to try to detect the
> > situation
> > where DMA remapping using hardware IOMMUs is going on but memory is
> > still coherent, since the driver can currently only work with
> > coherent
> > memory[1]. Currently we use intel_iommu_enabled to detect this
> > situation, but it would be really helpful if there were a generic
> > bool
> > that advertizes this situation since we need to deal with other
> > IOMMUs
> > as well going forward. Any suggestion?
> 
> I'm missing the link of the [1] reference above. 

Sorry. I meant to remove the reference since the explanation would be
rather lengthy.

> But if you need
> coherent memory you should simply always use dma_alloc_coherent, that
> is the only gurantee you get.  If you use any other dma mapping
> methods
> you will otherwise need to explicitly transfer ownership by mapping/
> unmapping or using dma_sync* before and after every device access.

The problem is that graphics applications typically allocate huge
amounts of DMA memory. The GPU may want to map half of available system
memory or more. Moreover this memory might need to be mappable by
multiple devices, which would rule out dma_alloc_coherent() in many
situations.

Using SWIOTLB with bounce-buffers is also typically out of the question
for performance- and memory usage reasons. Moreover the sync operations
would often affect sub-regions of 2D and 3D textures, which makes the
dma_sync API inefficient even if it maps to no-ops. vmwgfx would rather
not load in these situations.

Finally graphics applications, do not always provide conservative sync
regions.

> 
> And the whole DMA API bypass using the phys mode is something that
> I'd really prefer not to see in any driver as it tends to cause
> major problems sooner or later.

I fully understand this. However, given the above, the DMA API is quite
awkward for graphics operation. It would be easier to motivate a full
removal of the phys mode if we could query the DMA API whether the
dma_sync operations are no-ops or not.

/Thomas






Re: [PATCH 4/4] drm/vmwgfx: unwind spaghetti code in vmw_dma_select_mode

2019-01-08 Thread Thomas Hellstrom
On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> Just use a simple if/else chain to select the DMA mode.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Thomas Hellstrom 


> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 25 ++---
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index c2060f6cc9e8..86387735a90b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -566,34 +566,21 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   [vmw_dma_map_populate] = "Keeping DMA mappings.",
>   [vmw_dma_map_bind] = "Giving up DMA mappings early."};
>  
> - if (intel_iommu_enabled) {
> + if (vmw_force_coherent)
> + dev_priv->map_mode = vmw_dma_alloc_coherent;
> + else if (intel_iommu_enabled)
>   dev_priv->map_mode = vmw_dma_map_populate;
> - goto out_fixup;
> - }
> -
> - if (!(vmw_force_iommu || vmw_force_coherent)) {
> + else if (!vmw_force_iommu)
>   dev_priv->map_mode = vmw_dma_phys;
> - DRM_INFO("DMA map mode: %s\n", names[dev_priv-
> >map_mode]);
> - return 0;
> - }
> -
> -#ifdef CONFIG_SWIOTLB
> - if (swiotlb_nr_tbl())
> + else if (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl())
>   dev_priv->map_mode = vmw_dma_alloc_coherent;
>   else
> -#endif
>   dev_priv->map_mode = vmw_dma_map_populate;
>  
> -out_fixup:
> - if (dev_priv->map_mode == vmw_dma_map_populate &&
> - vmw_restrict_iommu)
> + if (dev_priv->map_mode == vmw_dma_map_populate &&
> vmw_restrict_iommu)
>   dev_priv->map_mode = vmw_dma_map_bind;
>  
> - if (vmw_force_coherent)
> - dev_priv->map_mode = vmw_dma_alloc_coherent;
> -
>   DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
> -
>   return 0;
>  }
>  


Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs

2019-01-08 Thread Thomas Hellstrom
Hi,


On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
...

> intel_iommu_enabled is defined as always false for
> !CONFIG_INTEL_IOMMU,
> so remove the ifdefs around it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
> 
> -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> - /*
> -  * No coherent page pool
> -  */
> - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> - return -EINVAL;
> -#endif
> 

Actually this hunk is incorrect, it tries to determine whether the TTM
subsystem maintains a coherent page pool or not. If not, we can't use
vmw_dma_alloc_coherent.

/Thomas



Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs

2019-01-08 Thread Thomas Hellstrom
Reviewed-by: Thomas Hellstrom 

On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> intel_iommu_enabled is defined as always false for
> !CONFIG_INTEL_IOMMU,
> so remove the ifdefs around it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 18 --
>  1 file changed, 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 69e325b2d954..236052ad233c 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -567,12 +567,10 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   [vmw_dma_map_bind] = "Giving up DMA mappings early."};
>   const struct dma_map_ops *dma_ops = get_dma_ops(dev_priv->dev-
> >dev);
>  
> -#ifdef CONFIG_INTEL_IOMMU
>   if (intel_iommu_enabled) {
>   dev_priv->map_mode = vmw_dma_map_populate;
>   goto out_fixup;
>   }
> -#endif
>  
>   if (!(vmw_force_iommu || vmw_force_coherent)) {
>   dev_priv->map_mode = vmw_dma_phys;
> @@ -589,9 +587,7 @@ static int vmw_dma_select_mode(struct vmw_private
> *dev_priv)
>   dev_priv->map_mode = vmw_dma_map_populate;
>  #endif
>  
> -#ifdef CONFIG_INTEL_IOMMU
>  out_fixup:
> -#endif
>   if (dev_priv->map_mode == vmw_dma_map_populate &&
>   vmw_restrict_iommu)
>   dev_priv->map_mode = vmw_dma_map_bind;
> @@ -599,13 +595,6 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   if (vmw_force_coherent)
>   dev_priv->map_mode = vmw_dma_alloc_coherent;
>  
> -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> - /*
> -  * No coherent page pool
> -  */
> - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> - return -EINVAL;
> -#endif
>   DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
>  
>   return 0;
> @@ -619,7 +608,6 @@ static int vmw_dma_select_mode(struct vmw_private
> *dev_priv)
>   * With 32-bit we can only handle 32 bit PFNs. Optionally set that
>   * restriction also for 64-bit systems.
>   */
> -#ifdef CONFIG_INTEL_IOMMU
>  static int vmw_dma_masks(struct vmw_private *dev_priv)
>  {
>   struct drm_device *dev = dev_priv->dev;
> @@ -631,12 +619,6 @@ static int vmw_dma_masks(struct vmw_private
> *dev_priv)
>   }
>   return 0;
>  }
> -#else
> -static int vmw_dma_masks(struct vmw_private *dev_priv)
> -{
> - return 0;
> -}
> -#endif
>  
>  static int vmw_driver_load(struct drm_device *dev, unsigned long
> chipset)
>  {


Re: fix DMA ops layering violations in vmwgfx

2019-01-08 Thread Thomas Hellstrom
Hi, Christoph,

On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> Hi Thomas,
> 
> vmwgfx has been doing some odd checks based on DMA ops which rely
> on deep DMA mapping layer internals, and I think the changes in
> Linux 4.21 finally broke most of these implicit assumptions.

Thanks. 
What we're really trying to do here is to try to detect the situation
where DMA remapping using hardware IOMMUs is going on but memory is
still coherent, since the driver can currently only work with coherent
memory[1]. Currently we use intel_iommu_enabled to detect this
situation, but it would be really helpful if there were a generic bool
that advertizes this situation since we need to deal with other IOMMUs
as well going forward. Any suggestion?

Comments on the patches separately.

Thanks,
Thomas




> 
> The real fix is in patch 3, but I think the others are important
> to make it clear what is actually going on.


Re: [PATCH][drm-next] drm/selftest: fix spelling mistake "dimention" -> "dimenstion"

2018-12-10 Thread Thomas Hellstrom
On Mon, 2018-12-10 at 09:26 +, Colin King wrote:

Reviewed-by: Thomas Hellstrom 

I'll take this in the next pull request unless I'm told otherwise.

/Thomas

> From: Colin Ian King 
> 
> There is a spelling mistake in a pr_err message, fix this.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/selftests/test-drm_damage_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> b/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> index a2f753205a3e..9d2bcdf8bc29 100644
> --- a/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> +++ b/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> @@ -53,7 +53,7 @@ static bool check_damage_clip(struct
> drm_plane_state *state, struct drm_rect *r,
>   int src_y2 = (state->src.y2 >> 16) + !!(state->src.y2 &
> 0x);
>  
>   if (x1 >= x2 || y1 >= y2) {
> - pr_err("Cannot have damage clip with no dimention.\n");
> + pr_err("Cannot have damage clip with no dimension.\n");
>   return false;
>   }
>  


[tip:locking/core] locking/mutex: Fix mutex debug call and ww_mutex documentation

2018-09-10 Thread tip-bot for Thomas Hellstrom
Commit-ID:  e13e2366d8415e029fe96a62502955083e272cef
Gitweb: https://git.kernel.org/tip/e13e2366d8415e029fe96a62502955083e272cef
Author: Thomas Hellstrom 
AuthorDate: Mon, 3 Sep 2018 16:07:08 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 10 Sep 2018 10:05:10 +0200

locking/mutex: Fix mutex debug call and ww_mutex documentation

The following commit:

  08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")

introduced a reference in the documentation to a function that was
removed in an earlier commit.

It also forgot to remove a call to debug_mutex_add_waiter() which is now
unconditionally called by __mutex_add_waiter().

Fix those bugs.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dri-de...@lists.freedesktop.org
Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")
Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellst...@vmware.com
Signed-off-by: Ingo Molnar 
---
 kernel/locking/mutex.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 1a81a1257b3f..3f8a35104285 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -389,7 +389,7 @@ static bool __ww_mutex_wound(struct mutex *lock,
/*
 * wake_up_process() paired with set_current_state()
 * inserts sufficient barriers to make sure @owner either sees
-* it's wounded in __ww_mutex_lock_check_stamp() or has a
+* it's wounded in __ww_mutex_check_kill() or has a
 * wakeup pending to re-read the wounded state.
 */
if (owner != current)
@@ -946,7 +946,6 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
}
 
debug_mutex_lock_common(lock, );
-   debug_mutex_add_waiter(lock, , current);
 
lock_contended(>dep_map, ip);
 


[tip:locking/core] locking/mutex: Fix mutex debug call and ww_mutex documentation

2018-09-10 Thread tip-bot for Thomas Hellstrom
Commit-ID:  e13e2366d8415e029fe96a62502955083e272cef
Gitweb: https://git.kernel.org/tip/e13e2366d8415e029fe96a62502955083e272cef
Author: Thomas Hellstrom 
AuthorDate: Mon, 3 Sep 2018 16:07:08 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 10 Sep 2018 10:05:10 +0200

locking/mutex: Fix mutex debug call and ww_mutex documentation

The following commit:

  08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")

introduced a reference in the documentation to a function that was
removed in an earlier commit.

It also forgot to remove a call to debug_mutex_add_waiter() which is now
unconditionally called by __mutex_add_waiter().

Fix those bugs.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dri-de...@lists.freedesktop.org
Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")
Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellst...@vmware.com
Signed-off-by: Ingo Molnar 
---
 kernel/locking/mutex.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 1a81a1257b3f..3f8a35104285 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -389,7 +389,7 @@ static bool __ww_mutex_wound(struct mutex *lock,
/*
 * wake_up_process() paired with set_current_state()
 * inserts sufficient barriers to make sure @owner either sees
-* it's wounded in __ww_mutex_lock_check_stamp() or has a
+* it's wounded in __ww_mutex_check_kill() or has a
 * wakeup pending to re-read the wounded state.
 */
if (owner != current)
@@ -946,7 +946,6 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
}
 
debug_mutex_lock_common(lock, );
-   debug_mutex_add_waiter(lock, , current);
 
lock_contended(>dep_map, ip);
 


Re: [PATCH v4 0/4] locking,drm: Fix ww mutex naming / algorithm inconsistency

2018-06-19 Thread Thomas Hellstrom

On 06/19/2018 11:45 AM, Peter Zijlstra wrote:

I suspect you want this through the DRM tree? Ingo are you OK with that?



Yes, I can ask Dave to pull this. Ingo?

Thanks,

Thomas




Re: [PATCH v4 0/4] locking,drm: Fix ww mutex naming / algorithm inconsistency

2018-06-19 Thread Thomas Hellstrom

On 06/19/2018 11:45 AM, Peter Zijlstra wrote:

I suspect you want this through the DRM tree? Ingo are you OK with that?



Yes, I can ask Dave to pull this. Ingo?

Thanks,

Thomas




Re: [PATCH 1/3] locking: WW mutex cleanup

2018-06-19 Thread Thomas Hellstrom

On 06/19/2018 11:44 AM, Peter Zijlstra wrote:

On Tue, Jun 19, 2018 at 10:24:43AM +0200, Thomas Hellstrom wrote:

From: Peter Ziljstra 

Make the WW mutex code more readable by adding comments, splitting up
functions and pointing out that we're actually using the Wait-Die
algorithm.

Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Co-authored-by: Thomas Hellstrom 
Signed-off-by: Thomas Hellstrom 
---
  Documentation/locking/ww-mutex-design.txt |  12 +-
  include/linux/ww_mutex.h  |  28 ++---
  kernel/locking/mutex.c| 202 ++
  3 files changed, 145 insertions(+), 97 deletions(-)

Acked-by: Peter Zijlstra (Intel) 


Hi Peter,

Do you want to add a SOB, since you're the main author?

Thomas





Re: [PATCH 1/3] locking: WW mutex cleanup

2018-06-19 Thread Thomas Hellstrom

On 06/19/2018 11:44 AM, Peter Zijlstra wrote:

On Tue, Jun 19, 2018 at 10:24:43AM +0200, Thomas Hellstrom wrote:

From: Peter Ziljstra 

Make the WW mutex code more readable by adding comments, splitting up
functions and pointing out that we're actually using the Wait-Die
algorithm.

Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Co-authored-by: Thomas Hellstrom 
Signed-off-by: Thomas Hellstrom 
---
  Documentation/locking/ww-mutex-design.txt |  12 +-
  include/linux/ww_mutex.h  |  28 ++---
  kernel/locking/mutex.c| 202 ++
  3 files changed, 145 insertions(+), 97 deletions(-)

Acked-by: Peter Zijlstra (Intel) 


Hi Peter,

Do you want to add a SOB, since you're the main author?

Thomas





Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-14 Thread Thomas Hellstrom

On 06/14/2018 03:29 PM, Matthew Wilcox wrote:

On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote:

On 06/14/2018 01:36 PM, Peter Zijlstra wrote:

Currently you don't allow mixing WD and WW contexts (which is not
immediately obvious from the above code), and the above hard relies on
that. Are there sensible use cases for mixing them? IOW will your
current restriction stand without hassle?

Contexts _must_ agree on the algorithm used to resolve deadlocks. With
Wait-Die, for example, older transactions will wait if a lock is held by a
younger transaction and with Wound-Wait, younger transactions will wait if a
lock is held by an older transaction so there is no way of mixing them.

Maybe the compiler should be enforcing that; ie make it a different type?


It's intended to be enforced by storing the algorithm choice in the 
WW_MUTEX_CLASS which must be common for an acquire context and the 
ww_mutexes it acquires. However, I don't think there is a check that 
that holds. I guess we could add it as a DEBUG_MUTEX test in 
ww_mutex_lock().


Thanks,

Thomas




Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-14 Thread Thomas Hellstrom

On 06/14/2018 03:29 PM, Matthew Wilcox wrote:

On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote:

On 06/14/2018 01:36 PM, Peter Zijlstra wrote:

Currently you don't allow mixing WD and WW contexts (which is not
immediately obvious from the above code), and the above hard relies on
that. Are there sensible use cases for mixing them? IOW will your
current restriction stand without hassle?

Contexts _must_ agree on the algorithm used to resolve deadlocks. With
Wait-Die, for example, older transactions will wait if a lock is held by a
younger transaction and with Wound-Wait, younger transactions will wait if a
lock is held by an older transaction so there is no way of mixing them.

Maybe the compiler should be enforcing that; ie make it a different type?


It's intended to be enforced by storing the algorithm choice in the 
WW_MUTEX_CLASS which must be common for an acquire context and the 
ww_mutexes it acquires. However, I don't think there is a check that 
that holds. I guess we could add it as a DEBUG_MUTEX test in 
ww_mutex_lock().


Thanks,

Thomas




[PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-13 Thread Thomas Hellstrom
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die in terms of elapsed time.
Possibly due to the larger number of held locks of sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-04

Each thread runs 10 batches of lock / unlock 800 ww mutexes randomly
chosen out of 10. Four core Intel x86_64:

Algorithm#threads   Rollbacks  time
Wound-Wait   4  ~100   ~17s.
Wait-Die 4  ~15~19s.
Wound-Wait   16 ~36~109s.
Wait-Die 16 ~45~82s.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Thomas Hellstrom 
---
 Documentation/locking/ww-mutex-design.txt | 57 ++
 drivers/dma-buf/reservation.c |  2 +-
 drivers/gpu/drm/drm_modeset_lock.c|  2 +-
 include/linux/ww_mutex.h  | 19 --
 kernel/locking/locktorture.c  |  2 +-
 kernel/locking/mutex.c| 98 ---
 kernel/locking/test-ww_mutex.c|  2 +-
 lib/locking-selftest.c|  2 +-
 8 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die

[PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-13 Thread Thomas Hellstrom
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die in terms of elapsed time.
Possibly due to the larger number of held locks of sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-04

Each thread runs 10 batches of lock / unlock 800 ww mutexes randomly
chosen out of 10. Four core Intel x86_64:

Algorithm#threads   Rollbacks  time
Wound-Wait   4  ~100   ~17s.
Wait-Die 4  ~15~19s.
Wound-Wait   16 ~36~109s.
Wait-Die 16 ~45~82s.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Thomas Hellstrom 
---
 Documentation/locking/ww-mutex-design.txt | 57 ++
 drivers/dma-buf/reservation.c |  2 +-
 drivers/gpu/drm/drm_modeset_lock.c|  2 +-
 include/linux/ww_mutex.h  | 19 --
 kernel/locking/locktorture.c  |  2 +-
 kernel/locking/mutex.c| 98 ---
 kernel/locking/test-ww_mutex.c|  2 +-
 lib/locking-selftest.c|  2 +-
 8 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die

[PATCH RFC 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-05-25 Thread Thomas Hellstrom
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs in some cases. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die, probably due to the larger number of held locks
of sleeping transactions.

Update documentation and callers.

Signed-off-by: Thomas Hellstrom <thellst...@vmware.com>
---
 Documentation/locking/ww-mutex-design.txt |  57 +
 drivers/dma-buf/reservation.c |   2 +-
 drivers/gpu/drm/drm_modeset_lock.c|   2 +-
 include/linux/ww_mutex.h  |  19 --
 kernel/locking/locktorture.c  |   2 +-
 kernel/locking/mutex.c| 101 +++---
 kernel/locking/test-ww_mutex.c|   2 +-
 lib/locking-selftest.c|   2 +-
 8 files changed, 154 insertions(+), 33 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
+argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you
+typically expect the number of simultaneous competing transactions to be small,
+and the rollback cost can be substantial.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 struct obj {
struct ww_mutex lock;
@@ -243,7 +263,7 @@ struct obj {
struct list_head locked_list;
 };
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 void __unlock_objs(struct list_head *list)
 {
@@ -312,12 +332,23 @@ Design:
   We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
   w

[PATCH RFC 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-05-25 Thread Thomas Hellstrom
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs in some cases. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die, probably due to the larger number of held locks
of sleeping transactions.

Update documentation and callers.

Signed-off-by: Thomas Hellstrom 
---
 Documentation/locking/ww-mutex-design.txt |  57 +
 drivers/dma-buf/reservation.c |   2 +-
 drivers/gpu/drm/drm_modeset_lock.c|   2 +-
 include/linux/ww_mutex.h  |  19 --
 kernel/locking/locktorture.c  |   2 +-
 kernel/locking/mutex.c| 101 +++---
 kernel/locking/test-ww_mutex.c|   2 +-
 lib/locking-selftest.c|   2 +-
 8 files changed, 154 insertions(+), 33 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
+argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you
+typically expect the number of simultaneous competing transactions to be small,
+and the rollback cost can be substantial.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 struct obj {
struct ww_mutex lock;
@@ -243,7 +263,7 @@ struct obj {
struct list_head locked_list;
 };
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 void __unlock_objs(struct list_head *list)
 {
@@ -312,12 +332,23 @@ Design:
   We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
   without an acquire context

[PATCH RFC 0/2] locking: Fix ww_mutex algorithm inconsistency.

2018-05-25 Thread Thomas Hellstrom
The algorithm used for linux Wound/Wait mutexes, is actually not Wound/Wait
but Wait/Die. See for example

http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html

Rather than renaming them across the tree to something like Wait/Die mutexes or
Deadlock Avoidance mutexes, this patch set implements also the Wound/Wait
algorithm It shouldn't touch the binary ordinary mutex paths when compiled
with otpimization.
We use Wound/Wait for the modeset locks which in theory may benefit slightly
from Wound/Wait rather than Wait/Die, but Wait/Die is actually superior for
larger number of simultaneous contending transactions so we keep the original
implementation as a choice, and don't touch the algorithm used by the
reservation objects.

Performance- and functional testing has been done using 

git://people.freedesktop.org/~thomash/ww_mutex_test

Using WW_BUILTIN and 4 and 16 competing threads.

Another option is of course to ignore the the naming / algorithm inconsistency.



[PATCH RFC 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.

2018-05-25 Thread Thomas Hellstrom
For modeset locks we don't expect a high number of contending
transactions so change algorithm from Wait-Die to Wound-Wait.

Signed-off-by: Thomas Hellstrom <thellst...@vmware.com>
---
 drivers/gpu/drm/drm_modeset_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_modeset_lock.c 
b/drivers/gpu/drm/drm_modeset_lock.c
index f22a7ef41de1..294997765a2c 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class, true);
+static DEFINE_WW_CLASS(crtc_ww_class, false);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
-- 
2.14.3



[PATCH RFC 0/2] locking: Fix ww_mutex algorithm inconsistency.

2018-05-25 Thread Thomas Hellstrom
The algorithm used for linux Wound/Wait mutexes, is actually not Wound/Wait
but Wait/Die. See for example

http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html

Rather than renaming them across the tree to something like Wait/Die mutexes or
Deadlock Avoidance mutexes, this patch set implements also the Wound/Wait
algorithm It shouldn't touch the binary ordinary mutex paths when compiled
with otpimization.
We use Wound/Wait for the modeset locks which in theory may benefit slightly
from Wound/Wait rather than Wait/Die, but Wait/Die is actually superior for
larger number of simultaneous contending transactions so we keep the original
implementation as a choice, and don't touch the algorithm used by the
reservation objects.

Performance- and functional testing has been done using 

git://people.freedesktop.org/~thomash/ww_mutex_test

Using WW_BUILTIN and 4 and 16 competing threads.

Another option is of course to ignore the the naming / algorithm inconsistency.



[PATCH RFC 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.

2018-05-25 Thread Thomas Hellstrom
For modeset locks we don't expect a high number of contending
transactions so change algorithm from Wait-Die to Wound-Wait.

Signed-off-by: Thomas Hellstrom 
---
 drivers/gpu/drm/drm_modeset_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_modeset_lock.c 
b/drivers/gpu/drm/drm_modeset_lock.c
index f22a7ef41de1..294997765a2c 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class, true);
+static DEFINE_WW_CLASS(crtc_ww_class, false);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
-- 
2.14.3



Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-27 Thread Thomas Hellstrom

On 04/27/2018 06:56 PM, Robin Murphy wrote:

Hi Thomas,

On 25/04/18 14:21, Thomas Hellstrom wrote:

Hi, Robin,

Thanks for the patch. It was some time since I put together that 
code, but I remember hitting something similar to


https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linuxquestions.org_questions_linux-2Dkernel-2D70_-2527nents-2527-2Dargument-2Dof-2Ddma-5Funmap-5Fsg-2D4175621964_=DwIDaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=UACKfhMfw9wac0BNUWXnAiivjaBgY_jAEupre0zXoOQ=8NQNd-XBCViYHJH4fHk-RluFYd9CDjbYzXl_BWhC0ig= 



Even if it's clear from the documentation that orig_nents should be 
used.


Hmmm, it's odd that you would see issues - it's always been something 
that CONFIG_DMA_API_DEBUG would have screamed about, and as far as I'm 
aware for x86, nents and orig_nents should always end up equal anyway. 
I would definitely be interested to see the specific fault details if 
it can be reproduced. I suppose one possibility is that there's some 
path where you inadvertently unmap something which was never mapped, 
but passing nents=0 means you manage to get away with it without the 
DMA API backend trying to interpret any bogus DMA addresses/lengths.


FWIW, the rationale is that sync_sg/unmap_sg operate on sg->page 
(which can always be translated back to a meaningful CPU address for 
cache/write buffer maintenance), not sg->dma_address (which sometimes 
cannot), therefore passing a truncated list will have the effect of 
just not syncing the tail end of the buffer, which is clearly bad.


Robin.

I agree. I browsed the current software- and hardware iommu dma backends 
and all of them seem to set nents == orig_nents.
Still, according to the docs sg_map() is free to erase any sg->page - 
related information and given that, it would be more natural if all 
operations after sg_map() would operate on sg->dma_address. I mean if 
sg_map would truncate the sg list length to 1, and erase all other 
information (which it clearly is free to do according to the docs), it 
would be pretty meaningless to supply orig_nents for the unmapping 
operation?



/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which 
may

be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but 
it's

always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy <robin.mur...@arm.com>
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c

index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct 
vmw_ttm_tt *vmw_tt)

  {
  struct device *dev = vmw_tt->dev_priv->dev->dev;
+    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
  DMA_BIDIRECTIONAL);
-    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  /**







Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-27 Thread Thomas Hellstrom

On 04/27/2018 06:56 PM, Robin Murphy wrote:

Hi Thomas,

On 25/04/18 14:21, Thomas Hellstrom wrote:

Hi, Robin,

Thanks for the patch. It was some time since I put together that 
code, but I remember hitting something similar to


https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linuxquestions.org_questions_linux-2Dkernel-2D70_-2527nents-2527-2Dargument-2Dof-2Ddma-5Funmap-5Fsg-2D4175621964_=DwIDaQ=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=UACKfhMfw9wac0BNUWXnAiivjaBgY_jAEupre0zXoOQ=8NQNd-XBCViYHJH4fHk-RluFYd9CDjbYzXl_BWhC0ig= 



Even if it's clear from the documentation that orig_nents should be 
used.


Hmmm, it's odd that you would see issues - it's always been something 
that CONFIG_DMA_API_DEBUG would have screamed about, and as far as I'm 
aware for x86, nents and orig_nents should always end up equal anyway. 
I would definitely be interested to see the specific fault details if 
it can be reproduced. I suppose one possibility is that there's some 
path where you inadvertently unmap something which was never mapped, 
but passing nents=0 means you manage to get away with it without the 
DMA API backend trying to interpret any bogus DMA addresses/lengths.


FWIW, the rationale is that sync_sg/unmap_sg operate on sg->page 
(which can always be translated back to a meaningful CPU address for 
cache/write buffer maintenance), not sg->dma_address (which sometimes 
cannot), therefore passing a truncated list will have the effect of 
just not syncing the tail end of the buffer, which is clearly bad.


Robin.

I agree. I browsed the current software- and hardware iommu dma backends 
and all of them seem to set nents == orig_nents.
Still, according to the docs sg_map() is free to erase any sg->page - 
related information and given that, it would be more natural if all 
operations after sg_map() would operate on sg->dma_address. I mean if 
sg_map would truncate the sg list length to 1, and erase all other 
information (which it clearly is free to do according to the docs), it 
would be pretty meaningless to supply orig_nents for the unmapping 
operation?



/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which 
may

be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but 
it's

always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c

index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct 
vmw_ttm_tt *vmw_tt)

  {
  struct device *dev = vmw_tt->dev_priv->dev->dev;
+    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
  DMA_BIDIRECTIONAL);
-    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  /**







Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-25 Thread Thomas Hellstrom

Hi, Robin,

Thanks for the patch. It was some time since I put together that code, 
but I remember hitting something similar to


https://www.linuxquestions.org/questions/linux-kernel-70/%27nents%27-argument-of-dma_unmap_sg-4175621964/

Even if it's clear from the documentation that orig_nents should be used.

/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which may
be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but it's
always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
  {
struct device *dev = vmw_tt->dev_priv->dev->dev;
  
+	vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;

dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
DMA_BIDIRECTIONAL);
-   vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  
  /**





Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-25 Thread Thomas Hellstrom

Hi, Robin,

Thanks for the patch. It was some time since I put together that code, 
but I remember hitting something similar to


https://www.linuxquestions.org/questions/linux-kernel-70/%27nents%27-argument-of-dma_unmap_sg-4175621964/

Even if it's clear from the documentation that orig_nents should be used.

/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which may
be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but it's
always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
  {
struct device *dev = vmw_tt->dev_priv->dev->dev;
  
+	vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;

dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
DMA_BIDIRECTIONAL);
-   vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  
  /**





Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 03:47 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 01:35:25PM +0200, Thomas Hellstrom wrote:

On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk <lukasz.spint...@displaylink.com>

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk <lukasz.spint...@displaylink.com>
Signed-off-by: Deepak Rawat <dra...@vmware.com>

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
drivers/gpu/drm/drm_atomic.c| 42 
+
drivers/gpu/drm/drm_atomic_helper.c |  4 
drivers/gpu/drm/drm_mode_config.c   |  5 +
drivers/gpu/drm/drm_plane.c | 12 +++
include/drm/drm_mode_config.h   | 15 +
include/drm/drm_plane.h | 16 ++
include/uapi/drm/drm_mode.h | 15 +
7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
}
/**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
 * drm_atomic_get_plane_state - get plane state
 * @state: global atomic state object
 * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 03:47 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 01:35:25PM +0200, Thomas Hellstrom wrote:

On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
drivers/gpu/drm/drm_atomic.c| 42 
+
drivers/gpu/drm/drm_atomic_helper.c |  4 
drivers/gpu/drm/drm_mode_config.c   |  5 +
drivers/gpu/drm/drm_plane.c | 12 +++
include/drm/drm_mode_config.h   | 15 +
include/drm/drm_plane.h | 16 ++
include/uapi/drm/drm_mode.h | 15 +
7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
}
/**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
 * drm_atomic_get_plane_state - get plane state
 * @state: global atomic state object
 * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
   

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-05 Thread Thomas Hellstrom

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
underlying b

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-05 Thread Thomas Hellstrom

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
underlying buffer object.

Are you saying that people are already

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:10 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 10:49:09AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat <dra...@vmware.com>

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
   drivers/gpu/drm/drm_atomic_helper.c | 122 

   include/drm/drm_atomic_helper.h |  39 
   2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
   }
   EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(>plane_src, >fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(>crtc_src, -iter->translate_crtc_x,
+  -i

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:10 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 10:49:09AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
   drivers/gpu/drm/drm_atomic_helper.c | 122 

   include/drm/drm_atomic_helper.h |  39 
   2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
   }
   EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(>plane_src, >fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(>crtc_src, -iter->translate_crtc_x,
+  -iter->translate_crtc_x);

We can als

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:03 PM, Daniel Vetter wrote:


On the topic of input validation: Should the kernel check in shared code
that all the clip rects are sensible, i.e. within the dimensions of the
fb? Relying on drivers for that seems like a bad idea.


I guess we could do that.

There seems to be different needs for different kinds of iterators, but 
otherwise
I was thinking that an iterator could just skip invalid rects if found. 
Then we
don't need a separate validation step, but OTOH user-space won't get 
notified if

that was intended.

I'm not 100% sure user-space would do anything sensible with any error 
information, though.


/Thomas




That could be done in core code in drm_atomic_plane_check().
-Daniel

/Thomas


___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_dri-2Ddevel=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=JV7fhQ4zTiyiKsY8M0Zlf81l7szYePUke8kwSvQv1T8=HEcbr-8aoWqRvWGY6RcL1QeAtEyxMRLHbsOtdJFk78I=





Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:03 PM, Daniel Vetter wrote:


On the topic of input validation: Should the kernel check in shared code
that all the clip rects are sensible, i.e. within the dimensions of the
fb? Relying on drivers for that seems like a bad idea.


I guess we could do that.

There seems to be different needs for different kinds of iterators, but 
otherwise
I was thinking that an iterator could just skip invalid rects if found. 
Then we
don't need a separate validation step, but OTOH user-space won't get 
notified if

that was intended.

I'm not 100% sure user-space would do anything sensible with any error 
information, though.


/Thomas




That could be done in core code in drm_atomic_plane_check().
-Daniel

/Thomas


___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_dri-2Ddevel=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=JV7fhQ4zTiyiKsY8M0Zlf81l7szYePUke8kwSvQv1T8=HEcbr-8aoWqRvWGY6RcL1QeAtEyxMRLHbsOtdJFk78I=





Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk <lukasz.spint...@displaylink.com>

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk <lukasz.spint...@displaylink.com>
Signed-off-by: Deepak Rawat <dra...@vmware.com>

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
   drivers/gpu/drm/drm_atomic.c| 42 
+
   drivers/gpu/drm/drm_atomic_helper.c |  4 
   drivers/gpu/drm/drm_mode_config.c   |  5 +
   drivers/gpu/drm/drm_plane.c | 12 +++
   include/drm/drm_mode_config.h   | 15 +
   include/drm/drm_plane.h | 16 ++
   include/uapi/drm/drm_mode.h | 15 +
   7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
   }
   /**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
* drm_atomic_get_plane_state - get plane state
* @state: global atomic state object
* @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
 

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
   drivers/gpu/drm/drm_atomic.c| 42 
+
   drivers/gpu/drm/drm_atomic_helper.c |  4 
   drivers/gpu/drm/drm_mode_config.c   |  5 +
   drivers/gpu/drm/drm_plane.c | 12 +++
   include/drm/drm_mode_config.h   | 15 +
   include/drm/drm_plane.h | 16 ++
   include/uapi/drm/drm_mode.h | 15 +
   7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
   }
   /**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
* drm_atomic_get_plane_state - get plane state
* @state: global atomic state object
* @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
  drivers/gpu/drm/drm_atomic.c| 42 +
  drivers/gpu/drm/drm_atomic_helper.c |  4 
  drivers/gpu/drm/drm_mode_config.c   |  5 +
  drivers/gpu/drm/drm_plane.c | 12 +++
  include/drm/drm_mode_config.h   | 15 +
  include/drm/drm_plane.h | 16 ++
  include/uapi/drm/drm_mode.h | 15 +
  7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
  }
  
  /**

+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
   * drm_atomic_get_plane_state - get plane state
   * @state: global atomic state object
   * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_encoding;
} else if (property == plane->color_range_property) {
 

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
  drivers/gpu/drm/drm_atomic.c| 42 +
  drivers/gpu/drm/drm_atomic_helper.c |  4 
  drivers/gpu/drm/drm_mode_config.c   |  5 +
  drivers/gpu/drm/drm_plane.c | 12 +++
  include/drm/drm_mode_config.h   | 15 +
  include/drm/drm_plane.h | 16 ++
  include/uapi/drm/drm_mode.h | 15 +
  7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
  }
  
  /**

+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
   * drm_atomic_get_plane_state - get plane state
   * @state: global atomic state object
   * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_encoding;
} else if (property == plane->color_range_property) {
*val = state->color_range;
+   } else if (property == 

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:52 AM, Daniel Vetter wrote:


TYPE_PLANE I have no idea who needs that. I suggest we just drop it.


I'm assuming CRTC plane coordinates here. They are used for uploading 
contents of hardware planes. Like, in the simplest case, cursor images.


/Thomas



Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:52 AM, Daniel Vetter wrote:


TYPE_PLANE I have no idea who needs that. I suggest we just drop it.


I'm assuming CRTC plane coordinates here. They are used for uploading 
contents of hardware planes. Like, in the simplest case, cursor images.


/Thomas



Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
  drivers/gpu/drm/drm_atomic_helper.c | 122 
  include/drm/drm_atomic_helper.h |  39 
  2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
  }
  EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(>plane_src, >fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(>crtc_src, -iter->translate_crtc_x,
+  -iter->translate_crtc_x);

We can also scale.


I suggest we leave out scaling for now until someone actually needs it.




+   drm_rect_intersect(>crtc_src, >plane_src);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init);
+
+/**
+ * drm_atomic_helper_damage_iter_next - advance the damage iterator
+ * @iter: The 

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom

On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
  drivers/gpu/drm/drm_atomic_helper.c | 122 
  include/drm/drm_atomic_helper.h |  39 
  2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
  }
  EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(>plane_src, >fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(>crtc_src, -iter->translate_crtc_x,
+  -iter->translate_crtc_x);

We can also scale.


I suggest we leave out scaling for now until someone actually needs it.




+   drm_rect_intersect(>crtc_src, >plane_src);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init);
+
+/**
+ * drm_atomic_helper_damage_iter_next - advance the damage iterator
+ * @iter: The iterator to advance.
+ 

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:
On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> 
wrote:
Add an atomic helper to implement dirtyfb support.  This is 
needed to
support DSI command-mode panels with x11 userspace (ie. when we 
can't

rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
Background: there are a number of different folks working on 
getting
upstream kernel working on various different phones/tablets with 
qcom

SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH 
bits

with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic 
dirty-

rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.
Adding Noralf, who iirc already posted the full dirty helpers 
already somewhere.

-Daniel
I've asked Deepak to RFC the core changes suggested for the full 
dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to

get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip 
context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to 
restrict the

damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty 
flag like
you do here, or provide a content_age state member. Since we clear 
the dirty
flag on state copies, I guess that would be sufficient. The blob 
rectangles

would then become a hint to restrict the damage region.
I'm not entirely following here - I thought for frontbuffer 
rendering the
dirty rects have always just been a hint, and that the driver was 
always

free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated 
around in

different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already 
need to

upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be 
tempting for
user-space apps to just set the dirty bit instead of providing the 
full

damage region.
Or I'm not following you here, because I don't quite see the 
difference

between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content 
and the
need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new 
content and
the need for manual upload is identified by the dirty flag, (since 
the fb

stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the 
same

way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to 
communicate

front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3)

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:
On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  
wrote:
Add an atomic helper to implement dirtyfb support.  This is 
needed to
support DSI command-mode panels with x11 userspace (ie. when we 
can't

rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on 
getting
upstream kernel working on various different phones/tablets with 
qcom

SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH 
bits

with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic 
dirty-

rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.
Adding Noralf, who iirc already posted the full dirty helpers 
already somewhere.

-Daniel
I've asked Deepak to RFC the core changes suggested for the full 
dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to

get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip 
context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to 
restrict the

damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty 
flag like
you do here, or provide a content_age state member. Since we clear 
the dirty
flag on state copies, I guess that would be sufficient. The blob 
rectangles

would then become a hint to restrict the damage region.
I'm not entirely following here - I thought for frontbuffer 
rendering the
dirty rects have always just been a hint, and that the driver was 
always

free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated 
around in

different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already 
need to

upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be 
tempting for
user-space apps to just set the dirty bit instead of providing the 
full

damage region.
Or I'm not following you here, because I don't quite see the 
difference

between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content 
and the
need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new 
content and
the need for manual upload is identified by the dirty flag, (since 
the fb

stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the 
same

way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to 
communicate

front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
underlying b

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the same
underlying buffer object.

Are you saying that people are already

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark <robdcl...@gmail.com> wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark <robdcl...@gmail.com>
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel


OK, let me rephrase:

From the driver's point-of-view, in the atomic world, new content and 
the need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new content 
and the need for manual upload is identified by the dirty flag, (since 
the fb stays the same and the driver thus never identifies a page-flip).


In both these situations, clip rects can provide a hint to restrict the 
dirty region.


Now I was thinking about the preferred way for user-space to communicate 
front buffer rendering through the atomic ioctl:


1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the 
same underlying buffer object.


Are you saying that people are already using 3) and we should keep using 
that?


/Thomas






Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel


OK, let me rephrase:

From the driver's point-of-view, in the atomic world, new content and 
the need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new content 
and the need for manual upload is identified by the dirty flag, (since 
the fb stays the same and the driver thus never identifies a page-flip).


In both these situations, clip rects can provide a hint to restrict the 
dirty region.


Now I was thinking about the preferred way for user-space to communicate 
front buffer rendering through the atomic ioctl:


1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the 
same underlying buffer object.


Are you saying that people are already using 3) and we should keep using 
that?


/Thomas






Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel


I've asked Deepak to RFC the core changes suggested for the full dirty 
blob on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to get to the desired coordinates.


One thing to perhaps discuss is how we would like to fit this with 
front-buffer rendering and the dirty ioctl. In the page-flip context, 
the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict 
the damage region that can be fully ignored by the driver, new content 
is indicated by a new framebuffer.


We could do the same for frontbuffer rendering: Either set a dirty flag 
like you do here, or provide a content_age state member. Since we clear 
the dirty flag on state copies, I guess that would be sufficient. The 
blob rectangles would then become a hint to restrict the damage region.


Another approach would be to have the presence of dirty rects without 
framebuffer change to indicate frontbuffer rendering.


I think I like the first approach best, although it may be tempting for 
user-space apps to just set the dirty bit instead of providing the full 
damage region.


/Thomas



Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel


I've asked Deepak to RFC the core changes suggested for the full dirty 
blob on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to get to the desired coordinates.


One thing to perhaps discuss is how we would like to fit this with 
front-buffer rendering and the dirty ioctl. In the page-flip context, 
the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict 
the damage region that can be fully ignored by the driver, new content 
is indicated by a new framebuffer.


We could do the same for frontbuffer rendering: Either set a dirty flag 
like you do here, or provide a content_age state member. Since we clear 
the dirty flag on state copies, I guess that would be sufficient. The 
blob rectangles would then become a hint to restrict the damage region.


Another approach would be to have the presence of dirty rects without 
framebuffer change to indicate frontbuffer rendering.


I think I like the first approach best, although it may be tempting for 
user-space apps to just set the dirty bit instead of providing the full 
damage region.


/Thomas



Re: Signal handling in a page fault handler

2018-04-03 Thread Thomas Hellstrom

On 04/03/2018 02:33 PM, Chris Wilson wrote:

Quoting Matthew Wilcox (2018-04-02 15:10:58)

Souptick and I have been auditing the various page fault handler routines
and we've noticed that graphics drivers assume that a signal should be
able to interrupt a page fault.  In contrast, the page cache takes great
care to allow only fatal signals to interrupt a page fault.

I believe (but have not verified) that a non-fatal signal being delivered
to a task which is in the middle of a page fault may well end up in an
infinite loop, attempting to handle the page fault and failing forever.

Here's one of the simpler ones:

 ret = mutex_lock_interruptible(_obj->lock);
 if (ret)
 return VM_FAULT_NOPAGE;

(many other drivers do essentially the same thing including i915)

On seeing NOPAGE, the fault handler believes the PTE is in the page
table, so does nothing before it returns to arch code at which point
I get lost in the magic assembler macros.  I believe it will end up
returning to userspace if the signal is non-fatal, at which point it'll
go right back into the page fault handler, and mutex_lock_interruptible()
will immediately fail.  So we've converted a sleeping lock into the most
expensive spinlock.

I'll ask the obvious question: why isn't the signal handled on return to
userspace?


+1




I don't think the graphics drivers really want to be interrupted by
any signal.

Assume the worst case and we may block for 10s. Even a 10ms delay may be
unacceptable to some signal handlers (one presumes). For the number one
^C usecase, yes that may be reduced to only bother if it's killable, but
I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19)
that want to be able to interrupt random blockages.
-Chris


I think the TTM page fault handler originally set the standard for this. 
First, IMO any critical section that waits for the GPU (like typically 
the page fault handler does), should be locked at least killable. The 
need for interruptible locks came from the X server's silken mouse 
relying on signals for smooth mouse operations: You didn't want the X 
server to be stuck in the kernel waiting for GPU completion when it 
should handle the cursor move request.. Now that doesn't seem to be the 
case anymore but to reiterate Chris' question, why would the signal 
persist once returned to user-space?


/Thomas








___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel





Re: Signal handling in a page fault handler

2018-04-03 Thread Thomas Hellstrom

On 04/03/2018 02:33 PM, Chris Wilson wrote:

Quoting Matthew Wilcox (2018-04-02 15:10:58)

Souptick and I have been auditing the various page fault handler routines
and we've noticed that graphics drivers assume that a signal should be
able to interrupt a page fault.  In contrast, the page cache takes great
care to allow only fatal signals to interrupt a page fault.

I believe (but have not verified) that a non-fatal signal being delivered
to a task which is in the middle of a page fault may well end up in an
infinite loop, attempting to handle the page fault and failing forever.

Here's one of the simpler ones:

 ret = mutex_lock_interruptible(_obj->lock);
 if (ret)
 return VM_FAULT_NOPAGE;

(many other drivers do essentially the same thing including i915)

On seeing NOPAGE, the fault handler believes the PTE is in the page
table, so does nothing before it returns to arch code at which point
I get lost in the magic assembler macros.  I believe it will end up
returning to userspace if the signal is non-fatal, at which point it'll
go right back into the page fault handler, and mutex_lock_interruptible()
will immediately fail.  So we've converted a sleeping lock into the most
expensive spinlock.

I'll ask the obvious question: why isn't the signal handled on return to
userspace?


+1




I don't think the graphics drivers really want to be interrupted by
any signal.

Assume the worst case and we may block for 10s. Even a 10ms delay may be
unacceptable to some signal handlers (one presumes). For the number one
^C usecase, yes that may be reduced to only bother if it's killable, but
I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19)
that want to be able to interrupt random blockages.
-Chris


I think the TTM page fault handler originally set the standard for this. 
First, IMO any critical section that waits for the GPU (like typically 
the page fault handler does), should be locked at least killable. The 
need for interruptible locks came from the X server's silken mouse 
relying on signals for smooth mouse operations: You didn't want the X 
server to be stuck in the kernel waiting for GPU completion when it 
should handle the cursor move request.. Now that doesn't seem to be the 
case anymore but to reiterate Chris' question, why would the signal 
persist once returned to user-space?


/Thomas








___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel





Re: [PATCH] drm/vmwgfx: Return boolean instead of integer in vmw_fence_obj_signaled

2018-01-19 Thread Thomas Hellstrom

On 01/19/2018 01:02 AM, Gustavo A. R. Silva wrote:

Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Thomas Hellström 

I'll queue this for 4.17.

Thanks,
Thomas





---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75c..35fd6f9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -496,7 +496,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
struct vmw_fence_manager *fman = fman_from_fence(fence);
  
  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, >base.flags))

-   return 1;
+   return true;
  
  	vmw_fences_update(fman);
  





Re: [PATCH] drm/vmwgfx: Return boolean instead of integer in vmw_fence_obj_signaled

2018-01-19 Thread Thomas Hellstrom

On 01/19/2018 01:02 AM, Gustavo A. R. Silva wrote:

Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Thomas Hellström 

I'll queue this for 4.17.

Thanks,
Thomas





---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75c..35fd6f9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -496,7 +496,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
struct vmw_fence_manager *fman = fman_from_fence(fence);
  
  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, >base.flags))

-   return 1;
+   return true;
  
  	vmw_fences_update(fman);
  





Re: linux-next: Signed-off-by missing for commit in the drm-misc-fixes tree

2018-01-18 Thread Thomas Hellstrom

Hi, Stephen,

That would be me.
I've historically only added my signed-off-by:s on commits I've (co-) 
authored myself, as the committer info is already registered,


But I take it that's not the correct approach?
/Thomas



On 01/18/2018 09:23 PM, Stephen Rothwell wrote:

Hi all,

Commit

   8a510a5c7526 ("drm/vmwgfx: fix memory corruption with legacy/sou connectors")

is missing a Signed-off-by from its committer.





Re: linux-next: Signed-off-by missing for commit in the drm-misc-fixes tree

2018-01-18 Thread Thomas Hellstrom

Hi, Stephen,

That would be me.
I've historically only added my signed-off-by:s on commits I've (co-) 
authored myself, as the committer info is already registered,


But I take it that's not the correct approach?
/Thomas



On 01/18/2018 09:23 PM, Stephen Rothwell wrote:

Hi all,

Commit

   8a510a5c7526 ("drm/vmwgfx: fix memory corruption with legacy/sou connectors")

is missing a Signed-off-by from its committer.





Re: [PATCH] drm/vmwgfx: fix memory corruption with legacy/sou connectors

2018-01-17 Thread Thomas Hellstrom

On 01/17/2018 04:16 PM, Rob Clark wrote:

From: Rob Clark <rcl...@redhat.com>

It looks like in all cases 'struct vmw_connector_state' is used.  But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed.  Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.

Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state"
Cc: <sta...@vger.kernel.org>
Signed-off-by: Rob Clark <rcl...@redhat.com>

Reviewed-by: Thomas Hellstrom <thellst...@vmware.com>

Thanks, Rob!

/Thomas



---
  drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
index b8a09807c5de..3824595fece1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -266,8 +266,8 @@ static const struct drm_connector_funcs 
vmw_legacy_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_ldu_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
index bc5f6026573d..63a4cd794b73 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
@@ -420,8 +420,8 @@ static const struct drm_connector_funcs 
vmw_sou_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_sou_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };





Re: [PATCH] drm/vmwgfx: fix memory corruption with legacy/sou connectors

2018-01-17 Thread Thomas Hellstrom

On 01/17/2018 04:16 PM, Rob Clark wrote:

From: Rob Clark 

It looks like in all cases 'struct vmw_connector_state' is used.  But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed.  Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.

Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state"
Cc: 
Signed-off-by: Rob Clark 

Reviewed-by: Thomas Hellstrom 

Thanks, Rob!

/Thomas



---
  drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
index b8a09807c5de..3824595fece1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -266,8 +266,8 @@ static const struct drm_connector_funcs 
vmw_legacy_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_ldu_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
index bc5f6026573d..63a4cd794b73 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
@@ -420,8 +420,8 @@ static const struct drm_connector_funcs 
vmw_sou_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_sou_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };





Re: drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-17 Thread Thomas Hellstrom

On 01/17/2018 07:33 AM, Thomas Hellstrom wrote:

Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel 
on this patch?


Thanks,
Thomas



Actually, checkpatch.pl complains about -ENOSYS, so I'm sending a pull 
request for v1 of this patch instead, to make everybody happy. I've 
rephrased the commit log somewhat. Hope that's OK.


/Thomas



Re: drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-17 Thread Thomas Hellstrom

On 01/17/2018 07:33 AM, Thomas Hellstrom wrote:

Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel 
on this patch?


Thanks,
Thomas



Actually, checkpatch.pl complains about -ENOSYS, so I'm sending a pull 
request for v1 of this patch instead, to make everybody happy. I've 
rephrased the commit log somewhat. Hope that's OK.


/Thomas



drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom

Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel on 
this patch?


Thanks,
Thomas



drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom

Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel on 
this patch?


Thanks,
Thomas



Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom

Hi, Arnd,

Sinclair's on paternal leave and I thought this patch was already in 
drm-next. My bad.

Dave, is it too late to pull this in for the next merge window?

/Thomas


On 01/16/2018 06:18 PM, Arnd Bergmann wrote:

DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and
DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit
seconds/microseconds format.

As of commit c61eef726a78 ("drm: add support for monotonic vblank
timestamps"), other DRM drivers use monotonic times for drm_event_vblank,
but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from
the y2038/y2106 overflow as well as time jumps.

For consistency, this changes vmwgfx to use ktime_get_ts64 as well,
which solves those problems and avoids the deprecated do_gettimeofday()
function.

This should be transparent to to user space, as long as it doesn't
compare the time against the result of gettimeofday().

Link: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10076599_=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=8M6vawBo0zDsjbqhzV0xpOwAzX7Zm-uyGlIDnQ7-Gms=sHGuz0aoE9aLjVp5GALo8mYrN1bwOHW6mGpJIZmhwAc=
Signed-off-by: Arnd Bergmann 
---
Originally sent on Nov 27. Sinclair Yeh said he'd pick it up
for the next pull request, but it's not in linux-next yet.

Resending the unchanged patch, please pick it up when you have time,
or feel free to ignore this email in case it's already in some tree
that just isn't part of linux-next but will be sent during the
next merge window.
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75cf5e6c..9ed544f8958f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -901,11 +901,12 @@ static void vmw_event_fence_action_seq_passed(struct 
vmw_fence_action *action)
spin_lock_irq(>event_lock);
  
  	if (likely(eaction->tv_sec != NULL)) {

-   struct timeval tv;
+   struct timespec64 ts;
  
-		do_gettimeofday();

-   *eaction->tv_sec = tv.tv_sec;
-   *eaction->tv_usec = tv.tv_usec;
+   ktime_get_ts64();
+   /* monotonic time, so no y2038 overflow */
+   *eaction->tv_sec = ts.tv_sec;
+   *eaction->tv_usec = ts.tv_nsec / NSEC_PER_USEC;
}
  
  	drm_send_event_locked(dev, eaction->event);





Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom

Hi, Arnd,

Sinclair's on paternal leave and I thought this patch was already in 
drm-next. My bad.

Dave, is it too late to pull this in for the next merge window?

/Thomas


On 01/16/2018 06:18 PM, Arnd Bergmann wrote:

DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and
DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit
seconds/microseconds format.

As of commit c61eef726a78 ("drm: add support for monotonic vblank
timestamps"), other DRM drivers use monotonic times for drm_event_vblank,
but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from
the y2038/y2106 overflow as well as time jumps.

For consistency, this changes vmwgfx to use ktime_get_ts64 as well,
which solves those problems and avoids the deprecated do_gettimeofday()
function.

This should be transparent to to user space, as long as it doesn't
compare the time against the result of gettimeofday().

Link: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10076599_=DwIBAg=uilaK90D4TOVoH58JNXRgQ=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ=8M6vawBo0zDsjbqhzV0xpOwAzX7Zm-uyGlIDnQ7-Gms=sHGuz0aoE9aLjVp5GALo8mYrN1bwOHW6mGpJIZmhwAc=
Signed-off-by: Arnd Bergmann 
---
Originally sent on Nov 27. Sinclair Yeh said he'd pick it up
for the next pull request, but it's not in linux-next yet.

Resending the unchanged patch, please pick it up when you have time,
or feel free to ignore this email in case it's already in some tree
that just isn't part of linux-next but will be sent during the
next merge window.
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75cf5e6c..9ed544f8958f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -901,11 +901,12 @@ static void vmw_event_fence_action_seq_passed(struct 
vmw_fence_action *action)
spin_lock_irq(>event_lock);
  
  	if (likely(eaction->tv_sec != NULL)) {

-   struct timeval tv;
+   struct timespec64 ts;
  
-		do_gettimeofday();

-   *eaction->tv_sec = tv.tv_sec;
-   *eaction->tv_usec = tv.tv_usec;
+   ktime_get_ts64();
+   /* monotonic time, so no y2038 overflow */
+   *eaction->tv_sec = ts.tv_sec;
+   *eaction->tv_usec = ts.tv_nsec / NSEC_PER_USEC;
}
  
  	drm_send_event_locked(dev, eaction->event);





Re: [PATCH] drm/vmwgfx: remove DRM_ERROR message, stops log spamming

2017-09-12 Thread Thomas Hellstrom

Hi, Colin,

On 09/12/2017 07:35 PM, Colin King wrote:

From: Colin Ian King 

mmap'ing the device multiple times will spam the kernel log with the
DRM_ERROR message about illegal mmap'ing the old fifo space.
How are you hitting this? Multiple mappings should be fine as long as 
mapping offsets are correct,
so hitting this message should indicate that the user-space app is doing 
something seriously wrong, and

having it present in the log should probably help more than it hurts.

/Thomas




Since
the mmap'ing will fail with an -EINVAL there is no need to emit this
message, so just remove it.

Signed-off-by: Colin Ian King 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index e771091d2cd3..1e633c602fb1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -33,10 +33,8 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_file *file_priv;
struct vmw_private *dev_priv;
  
-	if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) {

-   DRM_ERROR("Illegal attempt to mmap old fifo space.\n");
+   if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET))
return -EINVAL;
-   }
  
  	file_priv = filp->private_data;

dev_priv = vmw_priv(file_priv->minor->dev);





Re: [PATCH] drm/vmwgfx: remove DRM_ERROR message, stops log spamming

2017-09-12 Thread Thomas Hellstrom

Hi, Colin,

On 09/12/2017 07:35 PM, Colin King wrote:

From: Colin Ian King 

mmap'ing the device multiple times will spam the kernel log with the
DRM_ERROR message about illegal mmap'ing the old fifo space.
How are you hitting this? Multiple mappings should be fine as long as 
mapping offsets are correct,
so hitting this message should indicate that the user-space app is doing 
something seriously wrong, and

having it present in the log should probably help more than it hurts.

/Thomas




Since
the mmap'ing will fail with an -EINVAL there is no need to emit this
message, so just remove it.

Signed-off-by: Colin Ian King 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index e771091d2cd3..1e633c602fb1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -33,10 +33,8 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_file *file_priv;
struct vmw_private *dev_priv;
  
-	if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) {

-   DRM_ERROR("Illegal attempt to mmap old fifo space.\n");
+   if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET))
return -EINVAL;
-   }
  
  	file_priv = filp->private_data;

dev_priv = vmw_priv(file_priv->minor->dev);





Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-30 Thread Thomas Hellstrom

On 08/30/2017 10:30 AM, Daniel Vetter wrote:

On Wed, Aug 30, 2017 at 08:21:46AM +0200, Thomas Hellstrom wrote:

On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav...@gmail.com>
---
   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
   }
-static struct dma_fence_ops vmw_fence_ops = {
+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,

Reviewed-by: Thomas Hellstrom <thellst...@vmware.com>

Does this mean you'll merge it, or does this mean you'll expect someone
else to merge this?

I'm always confused when maintainers reply with an r-b/ack for a patch
only touching their driver, and no further information at all.
-Daniel


For patches only touching our driver, I'd say we're always responsible 
for sorting out how it's going to be merged.


Since Sinclair is maintaining the vmwgfx trees I thought I'd give him a 
chance to comment on how he wanted it merged.


/Thomas





Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-30 Thread Thomas Hellstrom

On 08/30/2017 10:30 AM, Daniel Vetter wrote:

On Wed, Aug 30, 2017 at 08:21:46AM +0200, Thomas Hellstrom wrote:

On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
   }
-static struct dma_fence_ops vmw_fence_ops = {
+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,

Reviewed-by: Thomas Hellstrom 

Does this mean you'll merge it, or does this mean you'll expect someone
else to merge this?

I'm always confused when maintainers reply with an r-b/ack for a patch
only touching their driver, and no further information at all.
-Daniel


For patches only touching our driver, I'd say we're always responsible 
for sorting out how it's going to be merged.


Since Sinclair is maintaining the vmwgfx trees I thought I'd give him a 
chance to comment on how he wanted it merged.


/Thomas





Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-30 Thread Thomas Hellstrom

On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav...@gmail.com>
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
  }
  
-static struct dma_fence_ops vmw_fence_ops = {

+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,


Reviewed-by: Thomas Hellstrom <thellst...@vmware.com>




Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-30 Thread Thomas Hellstrom

On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
  }
  
-static struct dma_fence_ops vmw_fence_ops = {

+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,


Reviewed-by: Thomas Hellstrom 




Re: [PATCH 28/29] drm/vmwgfx: switch to drm_*{get,put} helpers

2017-08-03 Thread Thomas Hellstrom

On 08/03/2017 01:58 PM, Cihangir Akturk wrote:

drm_*_reference() and drm_*_unreference() functions are just
compatibility alias for drm_*_get() and drm_*_put() adn should not be


s/adn/and/

used by new code. So convert all users of compatibility functions to use
the new APIs.

Signed-off-by: Cihangir Akturk <cakt...@gmail.com>
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
index 6f4cb46..d43dce9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
@@ -500,7 +500,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par,
}
  
  	if (cur_fb) {

-   drm_framebuffer_unreference(cur_fb);
+   drm_framebuffer_put(cur_fb);
par->set_fb = NULL;
}
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c

index 5ec24fd..fd4a988 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -316,7 +316,7 @@ int vmw_present_ioctl(struct drm_device *dev, void *data,
  out_no_surface:
ttm_read_unlock(_priv->reservation_sem);
  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:
@@ -393,7 +393,7 @@ int vmw_present_readback_ioctl(struct drm_device *dev, void 
*data,
  
  	ttm_read_unlock(_priv->reservation_sem);

  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:


Apart from the above,

Reviewed-by: Thomas Hellstrom <thellst...@vmware.com>

(Assuming this gets pulled together with the whole series, not vmwgfx-next)

Thanks,

Thomas




Re: [PATCH 28/29] drm/vmwgfx: switch to drm_*{get,put} helpers

2017-08-03 Thread Thomas Hellstrom

On 08/03/2017 01:58 PM, Cihangir Akturk wrote:

drm_*_reference() and drm_*_unreference() functions are just
compatibility alias for drm_*_get() and drm_*_put() adn should not be


s/adn/and/

used by new code. So convert all users of compatibility functions to use
the new APIs.

Signed-off-by: Cihangir Akturk 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
index 6f4cb46..d43dce9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
@@ -500,7 +500,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par,
}
  
  	if (cur_fb) {

-   drm_framebuffer_unreference(cur_fb);
+   drm_framebuffer_put(cur_fb);
par->set_fb = NULL;
}
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c

index 5ec24fd..fd4a988 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -316,7 +316,7 @@ int vmw_present_ioctl(struct drm_device *dev, void *data,
  out_no_surface:
ttm_read_unlock(_priv->reservation_sem);
  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:
@@ -393,7 +393,7 @@ int vmw_present_readback_ioctl(struct drm_device *dev, void 
*data,
  
  	ttm_read_unlock(_priv->reservation_sem);

  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:


Apart from the above,

Reviewed-by: Thomas Hellstrom 

(Assuming this gets pulled together with the whole series, not vmwgfx-next)

Thanks,

Thomas




  1   2   3   4   >