from:"\"Thomas Hellstrom\""

Re: linux-next: Fixes tags need some work in the tip tree

2019-10-22 Thread Thomas Hellstrom

On 10/22/19 10:17 PM, Stephen Rothwell wrote:
> Hi all,
>
> n commit
>
>   6fee2a0be0ec ("x86/cpu/vmware: Fix platform detection VMWARE_PORT macro")
>
> Fixes tag
>
>   Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")

The cited subject is missing a leading "x86/vmware:". Ingo, please let
me know if you want me to respin those.

Thanks,

Thomas

[tip: x86/urgent] x86/cpu/vmware: Fix platform detection VMWARE_PORT macro

2019-10-21 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: 6fee2a0be0ecae939d4b6cd8297d88b5cbb61654
Gitweb:
https://git.kernel.org/tip/6fee2a0be0ecae939d4b6cd8297d88b5cbb61654
Author:Thomas Hellstrom 
AuthorDate:Mon, 21 Oct 2019 19:24:03 +02:00
Committer: Ingo Molnar 
CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00

x86/cpu/vmware: Fix platform detection VMWARE_PORT macro

The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT
definition, but expects it to be an integer. However, when it was moved
to the new vmware.h include file, it was changed to be a string to better
fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the
platform detection VMWARE_PORT functionality.

Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB
definitions to be integers, and use __stringify() for their stringified
form when needed.

Signed-off-by: Thomas Hellstrom 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sean Christopherson 
Cc: Thomas Gleixner 
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas...@shipmail.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/vmware.h | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index 3caac90..ac9fc51 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * The hypercall definitions differ in the low word of the %edx argument
@@ -20,8 +21,8 @@
  */
 
 /* Old port-based version */
-#define VMWARE_HYPERVISOR_PORT"0x5658"
-#define VMWARE_HYPERVISOR_PORT_HB "0x5659"
+#define VMWARE_HYPERVISOR_PORT0x5658
+#define VMWARE_HYPERVISOR_PORT_HB 0x5659
 
 /* Current vmcall / vmmcall version */
 #define VMWARE_HYPERVISOR_HB   BIT(0)
@@ -29,7 +30,7 @@
 
 /* The low bandwidth call. The low word of edx is presumed clear. */
 #define VMWARE_HYPERCALL   \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \
  "inl (%%dx), %%eax",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
@@ -39,7 +40,8 @@
  * HB and OUT bits set.
  */
 #define VMWARE_HYPERCALL_HB_OUT
\
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep outsb", \
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; 
" \
+ "rep outsb",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
 
@@ -48,7 +50,8 @@
  * HB bit set.
  */
 #define VMWARE_HYPERCALL_HB_IN \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_HB ", %%dx; rep insb", \
+   ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; 
" \
+ "rep insb",   \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)
 #endif

[tip: x86/urgent] x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm

2019-10-21 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: db633a4e0e6eda69b6065e3e106f9ea13a0676c3
Gitweb:
https://git.kernel.org/tip/db633a4e0e6eda69b6065e3e106f9ea13a0676c3
Author:Thomas Hellstrom 
AuthorDate:Mon, 21 Oct 2019 19:24:02 +02:00
Committer: Ingo Molnar 
CommitterDate: Tue, 22 Oct 2019 00:51:44 +02:00

x86/cpu/vmware: Use the full form of INL in VMWARE_HYPERCALL, for clang/llvm

LLVM's assembler doesn't accept the short form INL instruction:

  inl (%%dx)

but instead insists on the output register to be explicitly specified.

This was previously fixed for the VMWARE_PORT macro. Fix it also for
the VMWARE_HYPERCALL macro.

Suggested-by: Sami Tolvanen 
Signed-off-by: Thomas Hellstrom 
Reviewed-by: Nick Desaulniers 
Cc: Borislav Petkov 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Sean Christopherson 
Cc: Thomas Gleixner 
Cc: clang-built-li...@googlegroups.com
Fixes: b4dd4f6e3648 ("Add a header file for hypercall definitions")
Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas...@shipmail.org
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/vmware.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index e00c9e8..3caac90 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -29,7 +29,8 @@
 
 /* The low bandwidth call. The low word of edx is presumed clear. */
 #define VMWARE_HYPERCALL   \
-   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; "\
+ "inl (%%dx), %%eax",  \
  "vmcall", X86_FEATURE_VMCALL, \
  "vmmcall", X86_FEATURE_VMW_VMMCALL)

Re: [PATCH v6 0/8] Emulated coherent graphics memory take 2

2019-10-21 Thread Thomas Hellstrom

On 10/14/19 3:22 PM, Thomas Hellström (VMware) wrote:
> From: Thomas Hellström 
>
> Graphics APIs like OpenGL 4.4 and Vulkan require the graphics driver
> to provide coherent graphics memory, meaning that the GPU sees any
> content written to the coherent memory on the next GPU operation that
> touches that memory, and the CPU sees any content written by the GPU
> to that memory immediately after any fence object trailing the GPU
> operation is signaled.
>
> Paravirtual drivers that otherwise require explicit synchronization
> needs to do this by hooking up dirty tracking to pagefault handlers
> and buffer object validation.
>
> Provide mm helpers needed for this and that also allow for huge pmd-
> and pud entries (patch 1-3), and the associated vmwgfx code (patch 4-7).
>
> The code has been tested and exercised by a tailored version of mesa
> where we disable all explicit synchronization and assume graphics memory
> is coherent. The performance loss varies of course; a typical number is
> around 5%.
>
> I would like to merge this code through the DRM tree, so an ack to include
> the new mm helpers in that merge would be greatly appreciated.
>
> Changes since RFC:
> - Merge conflict changes moved to the correct patch. Fixes intra-patchset
>   compile errors.
> - Be more aggressive when turning ttm vm code into helpers. This makes sure
>   we can use a const qualifier on the vmwgfx vm_ops.
> - Reinstate a lost comment an fix an error path that was broken when turning
>   the ttm vm code into helpers.
> - Remove explicit type-casts of struct vm_area_struct::vm_private_data
> - Clarify the locking inversion that makes us not being able to use the mm
>   pagewalk code.
>
> Changes since v1:
> - Removed the vmwgfx maintainer entry for as_dirty_helpers.c, updated
>   commit message accordingly
> - Removed the TTM patches from the series as they are merged separately
>   through DRM.
> Changes since v2:
> - Split out the pagewalk code from as_dirty_helpers.c and document locking.
> - Add pre_vma and post_vma callbacks to the pagewalk code.
> - Remove huge pmd and -pud asserts that would trip when we protect vmas with
>   struct address_space::i_mmap_rwsem rather than with
>   struct vm_area_struct::mmap_sem.
> - Do some naming cleanup in as_dirty_helpers.c
> Changes since v3:
> - Extensive renaming of the dirty helpers including the filename.
> - Update walk_page_mapping() doc.
> - Update the pagewalk code to not unconditionally split pmds if a pte_entry()
>   callback is present. Update the dirty helper pmd_entry accordingly.
> - Use separate walk ops for the dirty helpers.
> - Update the pagewalk code to take the pagetable lock in walk_pte_range.
> Changes since v4:
> - Fix pte pointer confusion in patch 2/8
> - Skip the pagewalk code conditional split patch for now, and update the
>   mapping_dirty_helper accordingly. That problem will be solved in a cleaner
>   way in a follow-up patchset.
> Changes since v5:
> - Fix tlb flushing when we have other pending tlb flushes.
>   
> Cc: Andrew Morton 
> Cc: Matthew Wilcox 
> Cc: Will Deacon 
> Cc: Peter Zijlstra 
> Cc: Rik van Riel 
> Cc: Minchan Kim 
> Cc: Michal Hocko 
> Cc: Huang Ying 
> Cc: Jérôme Glisse 
> Cc: Kirill A. Shutemov 
>
Kirill, Linus

I have a formal Ack for two of the four mm patches. Is there a chance I
can get an ack to merge the mm patches of this series through drm with
the vmwgfx patches?

Thanks,

Thomas

Re: [PATCH v4 3/9] mm: pagewalk: Don't split transhuge pmds when a pmd_entry is present

2019-10-09 Thread Thomas Hellstrom

Hi,

On 10/9/19 7:17 PM, Linus Torvalds wrote:
> On Wed, Oct 9, 2019 at 10:03 AM Thomas Hellström (VMware)
>  wrote:
>> Nope, it handles the hugepages by ignoring them, since they should be
>> read-only, but if pmd_entry() was called with something else than a
>> hugepage, then it requests the fallback, but never a split.
> But  PAGE_WALK_FALLBACK _is_ a split.
>
> Oh, except you did this
>
> -   split_huge_pmd(walk->vma, pmd, addr);
> +   if (!ops->pmd_entry)
> +   split_huge_pmd(walk->vma, pmd, addr);
>
>
> so it avoids the split.
>
> No, that's unacceptable. And makes no sense anyway. If it doesn't
> split the pmd, then it shouldn't walk the pte's - because they don't
> exist. And if it's not a hugepmd, then the split is a no-op, so the
> test makes no sense.
>
> I hadn't noticed that part of the patch. That simply can't be right. I
> don't think you've tested this, because you never actually have
> hugepages, do you?
>
> You didn't notice or realize that split_huge_pmd() just does that
>
> if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd)   \
> || pmd_devmap(*pmd))\
>
> thing and doesn't do anythign at all if it's not huge.
>
> So no. That code makes no sense at all, and I didn't realize how
> senseless it was, becasue I stupidly missed that "make the split
> conditional" - which is insane and wrong - and I thought that you
> wanted PAGE_WALK_FALLBACK to split a pmd and fall back to per-pte
> entries, which is what the name implies.
>
> But that's not what you wanted at all.
>
> Just get rid of PAGE_WALK_FALLBACK entirely then, and make the rule be
> that a zero return value just means "split and do ptes". Which is what
> you want (see above why "split" simply is wrong, and isn't an issue
> for you anyway.
>
> That won't change any existing cases, since even if they do have a
> zero return value, they don't have a pte_entry() callback, so they
> won't do that "split and do ptes" anyway.
>
>  Linus
>
Hmm, so we have the following cases we need to handle when returning
from the pmd_entry() handler.

1) Huge pmd was handled - Returns 0 and continues.
2) A pmd is otherwise unstable, typically someone just zapped a huge
pmd. Returns PAGE_WALK_FALLBACK, gets caught in the pmd_trans_unstable()
test and retries.
3) A pte directory - Returns PAGE_WALK_FALLBACK, falls through, avoids
the split and continues to the next level. Yeah that split avoidance
test is indeed made unnecessary by the preceding pmd_trans_unstable() test.

-   split_huge_pmd(walk->vma, pmd, addr);
+   if (!ops->pmd_entry)
+   split_huge_pmd(walk->vma, pmd, addr);

But as the commit message says, PAGE_WALK_FALLBACK is necessary to have
a virtual address range being handled once and only once. Therefore we
must distinguish between 1) and 2) since 2) must be retried until it's
handled correctly.

So we need the PAGE_WALK_FALLBACK. And if we instead were to combine 1)
and 3) in a single return value and use, for example PAGE_WALK_RETRY for
2)  the following could happen.

a) we handle the huge pmd and return 0 from pte_entry().
b) another process splits it.
c) we fall through to the pte level and handle the same address range
again...

So to summarize, yes the test in the code you cite is unnecessary. But
if we want to guarantee a virtual address range being handled once and
only once we need the PAGE_WALK_FALLBACK, (perhaps renamed). If not, we
can do without it similar to your original patch.

Thanks,

/Thomas

Re: [PATCH v4 5/9] mm: Add write-protect and clean utilities for address space ranges

2019-10-08 Thread Thomas Hellstrom

On 10/8/19 7:07 PM, Linus Torvalds wrote:
> On Tue, Oct 8, 2019 at 2:15 AM Thomas Hellström (VMware)
>  wrote:
>> Add two utilities to 1) write-protect and 2) clean all ptes pointing into
>> a range of an address space.
> The code looks much simpler and easier to understand now, I think, but
> that also makes some thing more obvious..
>
>> +static int clean_record_pte(pte_t *pte, unsigned long addr,
>> +   unsigned long end, struct mm_walk *walk)
>> +{
>> +   struct wp_walk *wpwalk = walk->private;
>> +   struct clean_walk *cwalk = to_clean_walk(wpwalk);
>> +   pte_t ptent = *pte;
>> +
>> +   if (pte_dirty(ptent)) {
>> +   pgoff_t pgoff = ((addr - walk->vma->vm_start) >> PAGE_SHIFT) 
>> +
>> +   walk->vma->vm_pgoff - cwalk->bitmap_pgoff;
>> +   pte_t old_pte = ptep_modify_prot_start(walk->vma, addr, pte);
>> +
>> +   ptent = pte_mkclean(old_pte);
>> +   ptep_modify_prot_commit(walk->vma, addr, pte, old_pte, 
>> ptent);
>> +
>> +   wpwalk->total++;
>> +   wpwalk->tlbflush_start = min(wpwalk->tlbflush_start, addr);
>> +   wpwalk->tlbflush_end = max(wpwalk->tlbflush_end,
>> +  addr + PAGE_SIZE);
>> +
>> +   __set_bit(pgoff, cwalk->bitmap);
> The above looks fundamentally racy.
>
> You clean the pte in memory, but it remains dirty and writable in the
> TLB, and you only flush it much later.
>
> So now writes can continue to happen to that page, without it
> necessarily being marked dirty again in the page tables, because all
> the CPU TLB caches say "it's already dirty".
>
> This may be ok - you've moved the diry bit into that bitmap, and you
> will flush the TLB before then taking action on the bitmap. So you
> haven't really _lost_ any dirty bits, but it still may be worth a
> comment.
>
> You do have comments about the _other_ issues (ie the whole "If a
> caller needs to make sure all dirty ptes are picked up and none
> additional are added..") but you don't actually have comments about
> the TLB race basically potentially now causing "missing" dirty stuff.
>
> And this may actually be a big problem on some architectures. Not x86,
> and maybe nothing else, but I have this dim memory of some
> architectures being unhappy due to virtual caches, and writeback when
> the page table entry says it's read-only and clean.
>
> Our normal "clean pages" is very anal about this, and does
>
> flush_cache_page(vma, address, pte_pfn(*pte));
> entry = ptep_clear_flush(vma, address, pte);
> entry = pte_wrprotect(entry);
> entry = pte_mkclean(entry);
> set_pte_at(vma->vm_mm, address, pte, entry);
>
> when it cleans a page, and I note both the cache flush and the
> "clear_flush()" (which invalidates the pte and does a synchronous TLB
> flush) and we have magic semantics for that at least on s390 because
> there are some low-level architecture details wrt flushing TLB entries
> and modifying them.
>
> End result: I think the code is probably ok in practice because you
> don't mind the slightly fuzzy dirty bit state, and it's _probably_ ok
> on all architectures that use drm for the TLB invalidation side. But I
> think it bears a bit of thinking about.

Yes, there's been some considerable thinking behind this. We do have the
cache flush in the pre_vma callback, and as you mention the TLB flush
happens before we use the bitmap: any pte that may be subject to a race
is recorded in the bitmap, and the guarantee we want to provide with the
function is

a) All ptes that are dirtied before the function starts executing will
be recorded in the bitmap.
b) All ptes dirtied after that will either be recorded in the bitmap,
remain dirty or both.

I probably should add that in the docs.

And the actual writeback happens asynchronously a lot later *should* be
OK, as long as caches are synced for the dma operation. Of course if
this somehow fails on a particular architecture I guess we need to
rethink and use ptep_clear_flush() at least on that architecture.

Thanks,
Thomas.





>
>   Linus
>

Re: [PATCH] x86/cpu/vmware: use the full form of inl in VMWARE_PORT

2019-10-07 Thread Thomas Hellstrom

On 10/7/19 9:21 PM, Sami Tolvanen wrote:
> LLVM's assembler doesn't accept the short form inl (%%dx) instruction,
> but instead insists on the output register to be explicitly specified:
>
>   :1:7: error: invalid operand for instruction
>   inl (%dx)
>  ^
>   LLVM ERROR: Error parsing inline asm
>
> Use the full form of the instruction to fix the build.
>
> Signed-off-by: Sami Tolvanen 

Acked-by: Thomas Hellstrom 

> ---
>  arch/x86/kernel/cpu/vmware.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
> index 9735139cfdf8..46d732696c1c 100644
> --- a/arch/x86/kernel/cpu/vmware.c
> +++ b/arch/x86/kernel/cpu/vmware.c
> @@ -49,7 +49,7 @@
>  #define VMWARE_CMD_VCPU_RESERVED 31
>  
>  #define VMWARE_PORT(cmd, eax, ebx, ecx, edx) \
> - __asm__("inl (%%dx)" :  \
> + __asm__("inl (%%dx), %%eax" :   \
>   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
>   "a"(VMWARE_HYPERVISOR_MAGIC),   \
>   "c"(VMWARE_CMD_##cmd),  \

Re: [PATCH v3 3/7] mm: Add write-protect and clean utilities for address space ranges

2019-10-03 Thread Thomas Hellstrom

On 10/2/19 10:28 PM, Linus Torvalds wrote:
> On Wed, Oct 2, 2019 at 12:09 PM Thomas Hellström (VMware)
>  wrote:
>> Yes I typically tend towards using a "namespace_object_operation" naming
>> scheme, with "as_dirty" being the namespace here,
> We discourage that kind of mindless namespacing for core stuff.
>
> It makes sense in a driver or a filesystem: when there are 20+
> different filesystems all implementing the same operation, you can't
> have descriptive and natural names that are unique. So having a
> namespace prefix for the filesystem or driver makes sense. But even
> then it tends ot be just a simple name, not the op it does.

Understood.

>
>> Looking at Matthew's suggestion but lining up with
>> "unmap_mapping_range()", perhaps we could use "clean_mapping_range" and
>> "wp_mapping_range"?
> Yes, I agree with Willy that "dirty" is kind of implicit when the
> operation is to clean something, so the above sounds sane to me.
>
> The wp part I'm not entirely sure about: you're not actually
> write-protecting the range. You're doing something different. You're
> only doing it for shared writable mappings.

Both the cleaning operation and the wp operation operate on shared
writable mappings, and since they are also both restricted to entries
that may take part in dirty-tracking (they're ignoring pmds and puds),
so perhaps a "dirty" may make sense anyway, and to point out the similarity:

clean_mapping_dirty_range() and wp_mapping_dirty_range()"?

>  And I'd rather see
> separate 'struct mm_walk_ops' than shared ones that then have a flag
> in a differfent structure to change behavior.
>
> Yes, yes, some of the levels share the same logic, but that just means
> that you can use the same function pointer for that level in the
> different "clean" vs "wp" mm_walk_op.

I think that this comment and this last one:

> Also, why do you loop inside the pmd_entry function, instead of just
> having a pte_entry function?"

are tied together. The pagewalk code is kind of awkward since if you
provide a pte_entry function,
then huge pmds are unconditionally split, even if handled in pmd_entry,
forcing pmd-aware users to handle also ptes in pmd_entry(). I'm not
entirely sure why, but it looks like it's to avoid a race where huge
pmds could otherwise be unintentionally split if appearing *after* the
pmd_entry() call. Other pagewalk users do the same here,  see for
example clear_refs_pte_range();

https://elixir.bootlin.com/linux/latest/source/fs/proc/task_mmu.c#L1040

Also the pagewalk walk_pte_range() for some reason doesn't take the page
table lock, which means that a pte may well be cleared under us while we
have it cached for modification.

For these reasons we can't use the pte_entry, even internally and this
means we have three choices:

a) Selecting the pte function using a bool. Saves code and avoids extra
indirect function call.
b) Duplicating the pmd_entry with a different pte function, also
duplicating the ops. Avoids extra indirect function call but some extra
code.
c) Instead of the bool, use another function pointer in yet another ops
pointed to by the private structure. Saves some code.

I opted for a) here, but I can of course change that if needed?

>
> Also, looking closer at this patch, this makes me go "Whaa?":
>
> +   pte = (mm == &init_mm) ?
> +   pte_offset_kernel(pmd, addr) :
> +   pte_offset_map_lock(mm, pmd, addr, &ptl);
>
> because I don't think that's sensible. When do you have a vma in kernel space?
>
> Also, why do you loop inside the pmd_entry function, instead of just
> having a pte_entry function?

Yes, that was a blind copy from v1 of the code that used
"apply_to_page_range()". I'll fix that up.

Thanks,

/Thomas

>
>Linus
>

Re: [PATCH v3 1/2] x86: Don't let pgprot_modify() change the page encryption bit

2019-09-30 Thread Thomas Hellstrom

Hi,

On 9/18/19 7:57 PM, Dave Hansen wrote:
> On 9/17/19 6:01 AM, Thomas Hellström (VMware) wrote:
>> diff --git a/arch/x86/include/asm/pgtable_types.h 
>> b/arch/x86/include/asm/pgtable_types.h
>> index b5e49e6bac63..8267dd426b15 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -123,7 +123,7 @@
>>   */
>>  #define _PAGE_CHG_MASK  (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | 
>> \
>>   _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \
>> - _PAGE_SOFT_DIRTY | _PAGE_DEVMAP)
>> + _PAGE_SOFT_DIRTY | _PAGE_DEVMAP | _PAGE_ENC)
>>  #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
> My only nit with what remains is that it expands the infestation of
> things that look like a simple macro but are not.
>
> I'm debating whether we want to go fix that now, though.
>
Any chance for an ack on this? It's really a small change that, as we've
found out, fixes an existing problem.

Thanks,

Thomas

Re: [PATCH 7/8] drm/vmwgfx: switch to own vma manager

2019-09-05 Thread Thomas Hellstrom

On Thu, 2019-09-05 at 09:05 +0200, Gerd Hoffmann wrote:
> Add struct drm_vma_offset_manager to vma_private, initialize it and
> pass it to ttm_bo_device_init().
> 
> With this in place the last user of ttm's embedded vma offset manager
> is gone and we can remove it (in a separate patch).
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1 +
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 6 +-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> 

Reviewed-by: Thomas Hellström 

I assume this will be merged through drm-misc?

/Thomas

[tip: x86/vmware] x86/vmware: Add a header file for hypercall definitions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: b4dd4f6e3648dfd66576515fd885a9a765c0
Gitweb:
https://git.kernel.org/tip/b4dd4f6e3648dfd66576515fd885a9a765c0
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:51 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:32:06 +02:00

x86/vmware: Add a header file for hypercall definitions

The new header is intended to be used by drivers using the backdoor.
Follow the KVM example using alternatives self-patching to choose
between vmcall, vmmcall and io instructions.

Also define two new CPU feature flags to indicate hypervisor support
for vmcall- and vmmcall instructions. The new XF86_FEATURE_VMW_VMMCALL
flag is needed because using XF86_FEATURE_VMMCALL might break QEMU/KVM
setups using the vmmouse driver. They rely on XF86_FEATURE_VMMCALL
on AMD to get the kvm_hypercall() right. But they do not yet implement
vmmcall for the VMware hypercall used by the vmmouse driver.

 [ bp: reflow hypercall %edx usage explanation comment. ]

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Cc: Aaron Lewis 
Cc: "David S. Miller" 
Cc: Fenghua Yu 
Cc: Greg Kroah-Hartman 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Josh Poimboeuf 
Cc: linux-graphics-maintai...@vmware.com
Cc: Mauro Carvalho Chehab 
Cc: Nicolas Ferre 
Cc: Robert Hoo 
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-3-thomas...@shipmail.org
---
 MAINTAINERS|  1 +-
 arch/x86/include/asm/cpufeatures.h |  2 +-
 arch/x86/include/asm/vmware.h  | 53 +-
 arch/x86/kernel/cpu/vmware.c   |  6 ++-
 4 files changed, 61 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/vmware.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9cbcf16..47efc1b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17206,6 +17206,7 @@ M:  "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c
+F: arch/x86/include/asm/vmware.h
 
 VMWARE PVRDMA DRIVER
 M: Adit Ranadive 
diff --git a/arch/x86/include/asm/cpufeatures.h 
b/arch/x86/include/asm/cpufeatures.h
index e880f24..aaeae2f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -232,6 +232,8 @@
 #define X86_FEATURE_VMMCALL( 8*32+15) /* Prefer VMMCALL to VMCALL 
*/
 #define X86_FEATURE_XENPV  ( 8*32+16) /* "" Xen paravirtual guest 
*/
 #define X86_FEATURE_EPT_AD ( 8*32+17) /* Intel Extended Page Table 
access-dirty bit */
+#define X86_FEATURE_VMCALL ( 8*32+18) /* "" Hypervisor supports 
the VMCALL instruction */
+#define X86_FEATURE_VMW_VMMCALL( 8*32+19) /* "" VMware prefers 
VMMCALL hypercall instruction */
 
 /* Intel-defined CPU features, CPUID level 0x0007:0 (EBX), word 9 */
 #define X86_FEATURE_FSGSBASE   ( 9*32+ 0) /* RDFSBASE, WRFSBASE, 
RDGSBASE, WRGSBASE instructions*/
diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
new file mode 100644
index 000..e00c9e8
--- /dev/null
+++ b/arch/x86/include/asm/vmware.h
@@ -0,0 +1,53 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+#ifndef _ASM_X86_VMWARE_H
+#define _ASM_X86_VMWARE_H
+
+#include 
+#include 
+
+/*
+ * The hypercall definitions differ in the low word of the %edx argument
+ * in the following way: the old port base interface uses the port
+ * number to distinguish between high- and low bandwidth versions.
+ *
+ * The new vmcall interface instead uses a set of flags to select
+ * bandwidth mode and transfer direction. The flags should be loaded
+ * into %dx by any user and are automatically replaced by the port
+ * number if the VMWARE_HYPERVISOR_PORT method is used.
+ *
+ * In short, new driver code should strictly use the new definition of
+ * %dx content.
+ */
+
+/* Old port-based version */
+#define VMWARE_HYPERVISOR_PORT"0x5658"
+#define VMWARE_HYPERVISOR_PORT_HB "0x5659"
+
+/* Current vmcall / vmmcall version */
+#define VMWARE_HYPERVISOR_HB   BIT(0)
+#define VMWARE_HYPERVISOR_OUT  BIT(1)
+
+/* The low bandwidth call. The low word of edx is presumed clear. */
+#define VMWARE_HYPERCALL   \
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT ", %%dx; inl (%%dx)", \
+ "vmcall", X86_FEATURE_VMCALL, \
+ "vmmcall", X86_FEATURE_VMW_VMMCALL)
+
+/*
+ * The high bandwidth out call. The low word of edx is presumed to have the
+ * HB and OUT bits set.
+ */
+#define VMWARE_HYPERCALL_HB_OUT
\
+   ALTERNATIVE_2("movw $" VMWARE_HYPERVISOR_PORT_

[tip: x86/vmware] x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls

2019-08-28 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: bac7b4e843232a3a49a042410cf743341eb0887e
Gitweb:
https://git.kernel.org/tip/bac7b4e843232a3a49a042410cf743341eb0887e
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:50 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 10:48:30 +02:00

x86/vmware: Update platform detection code for VMCALL/VMMCALL hypercalls

Vmware has historically used an INL instruction for this, but recent
hardware versions support using VMCALL/VMMCALL instead, so use this
method if supported at platform detection time. Explicitly code separate
macro versions since the alternatives self-patching has not been
performed at platform detection time.

Also put tighter constraints on the assembly input parameters.

Co-developed-by: Doug Covelli 
Signed-off-by: Doug Covelli 
Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: linux-graphics-maintai...@vmware.com
Cc: Thomas Gleixner 
Cc: virtualizat...@lists.linux-foundation.org
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-2-thomas...@shipmail.org
---
 arch/x86/kernel/cpu/vmware.c | 88 ---
 1 file changed, 71 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 3c64847..757dded 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -34,30 +34,65 @@
 #undef pr_fmt
 #define pr_fmt(fmt)"vmware: " fmt
 
-#define CPUID_VMWARE_INFO_LEAF 0x4000
+#define CPUID_VMWARE_INFO_LEAF   0x4000
+#define CPUID_VMWARE_FEATURES_LEAF   0x4010
+#define CPUID_VMWARE_FEATURES_ECX_VMMCALLBIT(0)
+#define CPUID_VMWARE_FEATURES_ECX_VMCALL BIT(1)
+
 #define VMWARE_HYPERVISOR_MAGIC0x564D5868
 #define VMWARE_HYPERVISOR_PORT 0x5658
 
-#define VMWARE_PORT_CMD_GETVERSION 10
-#define VMWARE_PORT_CMD_GETHZ  45
-#define VMWARE_PORT_CMD_GETVCPU_INFO   68
-#define VMWARE_PORT_CMD_LEGACY_X2APIC  3
-#define VMWARE_PORT_CMD_VCPU_RESERVED  31
+#define VMWARE_CMD_GETVERSION10
+#define VMWARE_CMD_GETHZ 45
+#define VMWARE_CMD_GETVCPU_INFO  68
+#define VMWARE_CMD_LEGACY_X2APIC  3
+#define VMWARE_CMD_VCPU_RESERVED 31
 
 #define VMWARE_PORT(cmd, eax, ebx, ecx, edx)   \
__asm__("inl (%%dx)" :  \
-   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
-   "0"(VMWARE_HYPERVISOR_MAGIC),   \
-   "1"(VMWARE_PORT_CMD_##cmd), \
-   "2"(VMWARE_HYPERVISOR_PORT), "3"(UINT_MAX) :\
-   "memory");
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :\
+   "memory")
+
+#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx) \
+   __asm__("vmcall" :  \
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(0), "b"(UINT_MAX) : \
+   "memory")
+
+#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx) \
+   __asm__("vmmcall" : \
+   "=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :\
+   "a"(VMWARE_HYPERVISOR_MAGIC),   \
+   "c"(VMWARE_CMD_##cmd),  \
+   "d"(0), "b"(UINT_MAX) : \
+   "memory")
+
+#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do {   \
+   switch (vmware_hypercall_mode) {\
+   case CPUID_VMWARE_FEATURES_ECX_VMCALL:  \
+   VMWARE_VMCALL(cmd, eax, ebx, ecx, edx); \
+   break;  \
+   case CPUID_VMWARE_FEATURES_ECX_VMMCALL: \
+   VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);\
+

[tip: x86/vmware] drm/vmwgfx: Update the backdoor call with support for new instructions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: 6abe3778cf5abd59b23b9037796f3eab8b7f1d98
Gitweb:
https://git.kernel.org/tip/6abe3778cf5abd59b23b9037796f3eab8b7f1d98
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:52 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:36:46 +02:00

drm/vmwgfx: Update the backdoor call with support for new instructions

Use the definition provided by include/asm/vmware.h

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Acked-by: Dave Airlie 
Cc: Daniel Vetter 
Cc: dri-de...@lists.freedesktop.org
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: pv-driv...@vmware.com
Cc: Thomas Gleixner 
Cc: VMware Graphics 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-4-thomas...@shipmail.org
---
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 21 -
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.h | 35 ++--
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
index 59e9d05..b1df3e3 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg.c
@@ -46,8 +46,6 @@
 #define RETRIES 3
 
 #define VMW_HYPERVISOR_MAGIC0x564D5868
-#define VMW_HYPERVISOR_PORT 0x5658
-#define VMW_HYPERVISOR_HB_PORT  0x5659
 
 #define VMW_PORT_CMD_MSG30
 #define VMW_PORT_CMD_HB_MSG 0
@@ -93,7 +91,7 @@ static int vmw_open_channel(struct rpc_channel *channel, 
unsigned int protocol)
 
VMW_PORT(VMW_PORT_CMD_OPEN_CHANNEL,
(protocol | GUESTMSG_FLAG_COOKIE), si, di,
-   VMW_HYPERVISOR_PORT,
+   0,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -126,7 +124,7 @@ static int vmw_close_channel(struct rpc_channel *channel)
 
VMW_PORT(VMW_PORT_CMD_CLOSE_CHANNEL,
0, si, di,
-   (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)),
+   channel->channel_id << 16,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -160,7 +158,8 @@ static unsigned long vmw_port_hb_out(struct rpc_channel 
*channel,
VMW_PORT_HB_OUT(
(MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG,
msg_len, si, di,
-   VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16),
+   VMWARE_HYPERVISOR_HB | (channel->channel_id << 16) |
+   VMWARE_HYPERVISOR_OUT,
VMW_HYPERVISOR_MAGIC, bp,
eax, ebx, ecx, edx, si, di);
 
@@ -181,7 +180,7 @@ static unsigned long vmw_port_hb_out(struct rpc_channel 
*channel,
 
VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_SENDPAYLOAD << 16),
 word, si, di,
-VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+channel->channel_id << 16,
 VMW_HYPERVISOR_MAGIC,
 eax, ebx, ecx, edx, si, di);
}
@@ -213,7 +212,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel 
*channel, char *reply,
VMW_PORT_HB_IN(
(MESSAGE_STATUS_SUCCESS << 16) | VMW_PORT_CMD_HB_MSG,
reply_len, si, di,
-   VMW_HYPERVISOR_HB_PORT | (channel->channel_id << 16),
+   VMWARE_HYPERVISOR_HB | (channel->channel_id << 16),
VMW_HYPERVISOR_MAGIC, bp,
eax, ebx, ecx, edx, si, di);
 
@@ -230,7 +229,7 @@ static unsigned long vmw_port_hb_in(struct rpc_channel 
*channel, char *reply,
 
VMW_PORT(VMW_PORT_CMD_MSG | (MSG_TYPE_RECVPAYLOAD << 16),
 MESSAGE_STATUS_SUCCESS, si, di,
-VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+channel->channel_id << 16,
 VMW_HYPERVISOR_MAGIC,
 eax, ebx, ecx, edx, si, di);
 
@@ -269,7 +268,7 @@ static int vmw_send_msg(struct rpc_channel *channel, const 
char *msg)
 
VMW_PORT(VMW_PORT_CMD_SENDSIZE,
msg_len, si, di,
-   VMW_HYPERVISOR_PORT | (channel->channel_id << 16),
+   channel->channel_id << 16,
VMW_HYPERVISOR_MAGIC,
eax, ebx, ecx, edx, si, di);
 
@@ -327,7 +326,7 @@ static int vmw_recv_msg(struct rpc_channel *channel, void 
**msg,
 
VMW_PORT(VMW_PORT_CMD_RECVSIZE,
0, si, di,
-   (VMW_HYPERVISOR_PORT | (channel->channel_id << 16)),
+

[tip: x86/vmware] input/vmmouse: Update the backdoor call with support for new instructions

2019-08-28 Thread tip-bot2 for Thomas Hellstrom

The following commit has been merged into the x86/vmware branch of tip:

Commit-ID: f7b15c74cffd760ec9959078982d8268a38456c4
Gitweb:
https://git.kernel.org/tip/f7b15c74cffd760ec9959078982d8268a38456c4
Author:Thomas Hellstrom 
AuthorDate:Wed, 28 Aug 2019 10:03:53 +02:00
Committer: Borislav Petkov 
CommitterDate: Wed, 28 Aug 2019 13:43:01 +02:00

input/vmmouse: Update the backdoor call with support for new instructions

Use the definition provided by include/asm/vmware.h.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Borislav Petkov 
Reviewed-by: Doug Covelli 
Acked-by: Dmitry Torokhov 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: linux-in...@vger.kernel.org
Cc: Thomas Gleixner 
Cc: VMware Graphics 
Cc: 
Cc: x86-ml 
Link: https://lkml.kernel.org/r/20190828080353.12658-5-thomas...@shipmail.org
---
 drivers/input/mouse/vmmouse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c
index 871e5b5..148245c 100644
--- a/drivers/input/mouse/vmmouse.c
+++ b/drivers/input/mouse/vmmouse.c
@@ -16,12 +16,12 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "psmouse.h"
 #include "vmmouse.h"
 
 #define VMMOUSE_PROTO_MAGIC0x564D5868U
-#define VMMOUSE_PROTO_PORT 0x5658
 
 /*
  * Main commands supported by the vmmouse hypervisor port.
@@ -84,7 +84,7 @@ struct vmmouse_data {
 #define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4)  \
 ({ \
unsigned long __dummy1, __dummy2;   \
-   __asm__ __volatile__ ("inl %%dx" :  \
+   __asm__ __volatile__ (VMWARE_HYPERCALL :\
"=a"(out1), \
"=b"(out2), \
"=c"(out3), \
@@ -94,7 +94,7 @@ struct vmmouse_data {
"a"(VMMOUSE_PROTO_MAGIC),   \
"b"(in1),   \
"c"(VMMOUSE_PROTO_CMD_##cmd),   \
-   "d"(VMMOUSE_PROTO_PORT) :   \
+   "d"(0) :\
"memory");  \
 })

Re: cleanup the walk_page_range interface

2019-08-08 Thread Thomas Hellstrom


On 8/8/19 11:56 PM, Christoph Hellwig wrote:

On Thu, Aug 08, 2019 at 10:50:37AM -0700, Linus Torvalds wrote:

Note that both Thomas and Steven have series touching this area pending,
and there are a couple consumer in flux too - the hmm tree already
conflicts with this series, and I have potential dma changes on top of
the consumers in Thomas and Steven's series, so we'll probably need a
git tree similar to the hmm one to synchronize these updates.

I'd be willing to just merge this now, if that helps. The conversion
is mechanical, and my only slight worry would be that at least for my
original patch I didn't build-test the (few) non-x86
architecture-specific cases. But I did end up looking at them fairly
closely  (basically using some grep/sed scripts to see that the
conversions I did matched the same patterns). And your changes look
like obvious improvements too where any mistake would have been caught
by the compiler.

I did cross compile the s390 and powerpc bits, but I do not have an
openrisc compiler.


So I'm not all that worried from a functionality standpoint, and if
this will help the next merge window, I'll happily pull now.

That would help with this series vs the others, but not with the other
series vs each other.


Although my series doesn't touch the pagewalk code, it rather borrowed 
some concepts from it and used for the apply_to_page_range() interface.


The reason being that the pagewalk code requires the mmap_sem to be held 
(mainly for trans-huge pages and reading the vma->vm_flags if I 
understand the code correctly). That is fine when you scan the vmas of a 
process, but the helpers I wrote need to instead scan all vmas pointing 
into a struct address_space, and taking the mmap_sem for each vma will 
create lock inversion problems.


/Thomas

Re: [PATCH 2/3] pagewalk: seperate function pointers from iterator data

2019-08-08 Thread Thomas Hellstrom


On 8/8/19 5:42 PM, Christoph Hellwig wrote:

The mm_walk structure currently mixed data and code.  Split out the
operations vectors into a new mm_walk_ops structure, and while we
are changing the API also declare the mm_walk structure inside the
walk_page_range and walk_page_vma functions.

Based on patch from Linus Torvalds.

Signed-off-by: Christoph Hellwig 
---


Typo: For the patch title s/seperate/separate/

Otherwise for the series

Reviewed-by: Thomas Hellstrom 

/Thomas

[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

2019-07-16 Thread tip-bot for Thomas Hellstrom

Commit-ID:  406de552c2be6ded524c75d14a73cf7f027f587e
Gitweb: https://git.kernel.org/tip/406de552c2be6ded524c75d14a73cf7f027f587e
Author: Thomas Hellstrom 
AuthorDate: Thu, 28 Mar 2019 12:06:37 +
Committer:  Thomas Gleixner 
CommitDate: Wed, 17 Jul 2019 00:42:27 +0200

MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

Alok Kataria will be handing over VMware's maintainership of these
interfaces to Thomas Hellström, with pv-drivers as backup contact.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Thomas Gleixner 
Acked-by: Alok Kataria 
Acked-by: Juergen Gross 
Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com

---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..80fa7a4a0b56 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12074,7 +12074,8 @@ F:  Documentation/parport*.txt
 
 PARAVIRT_OPS INTERFACE
 M: Juergen Gross 
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
@@ -17087,7 +17088,8 @@ S:  Maintained
 F: drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c

[tip:x86/urgent] MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

2019-07-16 Thread tip-bot for Thomas Hellstrom

Commit-ID:  907cb11da7a725445dccc6c2ca2d428739f6cd71
Gitweb: https://git.kernel.org/tip/907cb11da7a725445dccc6c2ca2d428739f6cd71
Author: Thomas Hellstrom 
AuthorDate: Thu, 28 Mar 2019 12:06:37 +
Committer:  Thomas Gleixner 
CommitDate: Tue, 16 Jul 2019 23:13:50 +0200

MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE

Alok Kataria will be handing over VMware's maintainership of these
interfaces to Thomas Hellström, with pv-drivers as backup contact.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Thomas Gleixner 
Acked-by: Alok Kataria 
Acked-by: Juergen Gross 
Link: https://lkml.kernel.org/r/20190328120558.29897-1-thellst...@vmware.com
---
 MAINTAINERS | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f5533d1bda2e..80fa7a4a0b56 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12074,7 +12074,8 @@ F:  Documentation/parport*.txt
 
 PARAVIRT_OPS INTERFACE
 M: Juergen Gross 
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
@@ -17087,7 +17088,8 @@ S:  Maintained
 F: drivers/misc/vmw_balloon.c
 
 VMWARE HYPERVISOR INTERFACE
-M: Alok Kataria 
+M: Thomas Hellstrom 
+M: "VMware, Inc." 
 L: virtualizat...@lists.linux-foundation.org
 S: Supported
 F: arch/x86/kernel/cpu/vmware.c

Re: [PATCH] drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read

2019-05-20 Thread Thomas Hellstrom

Thanks, Murray,

I'll include in the next vmwgfx-fixes pull request.

On Mon, 2019-05-20 at 21:57 +1200, Murray McAllister wrote:
> If SVGA_3D_CMD_DX_SET_SHADER is called with a shader ID
> of SVGA3D_INVALID_ID, and a shader type of
> SVGA3D_SHADERTYPE_INVALID, the calculated binding.shader_slot
> will be 4294967295, leading to an out-of-bounds read in
> vmw_binding_loc()
> when the offset is calculated.
> 
> Signed-off-by: Murray McAllister 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> index 2ff7ba04d8c8..9aeb5448cfc1 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
> @@ -2193,7 +2193,8 @@ static int vmw_cmd_dx_set_shader(struct
> vmw_private *dev_priv,
>  
>   cmd = container_of(header, typeof(*cmd), header);
>  
> - if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX) {
> + if (cmd->body.type >= SVGA3D_SHADERTYPE_DX10_MAX ||
> + cmd->body.type < SVGA3D_SHADERTYPE_MIN) {
>   VMW_DEBUG_USER("Illegal shader type %u.\n",
>  (unsigned int) cmd->body.type);
>   return -EINVAL;

[PATCH 9/9] drm/vmwgfx: Add surface dirty-tracking callbacks

2019-04-12 Thread Thomas Hellstrom

Add the callbacks necessary to implement emulated coherent memory for
surfaces. Add a flag to the gb_surface_create ioctl to indicate that
surface memory should be coherent.
Also bump the drm minor version to signal the availability of coherent
surfaces.

Signed-off-by: Thomas Hellstrom 
---
 .../device_include/svga3d_surfacedefs.h   | 209 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |   4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_surface.c   | 390 +-
 include/uapi/drm/vmwgfx_drm.h |   4 +-
 4 files changed, 600 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h 
b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
index f2bfd3d80598..d901206c04e3 100644
--- a/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
+++ b/drivers/gpu/drm/vmwgfx/device_include/svga3d_surfacedefs.h
@@ -1280,7 +1280,6 @@ svga3dsurface_get_pixel_offset(SVGA3dSurfaceFormat format,
return offset;
 }
 
-
 static inline u32
 svga3dsurface_get_image_offset(SVGA3dSurfaceFormat format,
   surf_size_struct baseLevelSize,
@@ -1375,4 +1374,212 @@ 
svga3dsurface_is_screen_target_format(SVGA3dSurfaceFormat format)
return svga3dsurface_is_dx_screen_target_format(format);
 }
 
+/**
+ * struct svga3dsurface_mip - Mimpmap level information
+ * @bytes: Bytes required in the backing store of this mipmap level.
+ * @img_stride: Byte stride per image.
+ * @row_stride: Byte stride per block row.
+ * @size: The size of the mipmap.
+ */
+struct svga3dsurface_mip {
+   size_t bytes;
+   size_t img_stride;
+   size_t row_stride;
+   struct drm_vmw_size size;
+
+};
+
+/**
+ * struct svga3dsurface_cache - Cached surface information
+ * @desc: Pointer to the surface descriptor
+ * @mip: Array of mipmap level information. Valid size is @num_mip_levels.
+ * @mip_chain_bytes: Bytes required in the backing store for the whole chain
+ * of mip levels.
+ * @num_mip_levels: Valid size of the @mip array. Number of mipmap levels in
+ * a chain.
+ * @num_layers: Number of slices in an array texture or number of faces in
+ * a cubemap texture.
+ */
+struct svga3dsurface_cache {
+   const struct svga3d_surface_desc *desc;
+   struct svga3dsurface_mip mip[DRM_VMW_MAX_MIP_LEVELS];
+   size_t mip_chain_bytes;
+   u32 num_mip_levels;
+   u32 num_layers;
+};
+
+/**
+ * struct svga3dsurface_loc - Surface location
+ * @sub_resource: Surface subresource. Defined as layer * num_mip_levels +
+ * mip_level.
+ * @x: X coordinate.
+ * @y: Y coordinate.
+ * @z: Z coordinate.
+ */
+struct svga3dsurface_loc {
+   u32 sub_resource;
+   u32 x, y, z;
+};
+
+/**
+ * svga3dsurface_subres - Compute the subresource from layer and mipmap.
+ * @cache: Surface layout data.
+ * @mip_level: The mipmap level.
+ * @layer: The surface layer (face or array slice).
+ *
+ * Return: The subresource.
+ */
+static inline u32 svga3dsurface_subres(const struct svga3dsurface_cache *cache,
+  u32 mip_level, u32 layer)
+{
+   return cache->num_mip_levels * layer + mip_level;
+}
+
+/**
+ * svga3dsurface_setup_cache - Build a surface cache entry
+ * @size: The surface base level dimensions.
+ * @format: The surface format.
+ * @num_mip_levels: Number of mipmap levels.
+ * @num_layers: Number of layers.
+ * @cache: Pointer to a struct svga3dsurface_cach object to be filled in.
+ */
+static inline void svga3dsurface_setup_cache(const struct drm_vmw_size *size,
+SVGA3dSurfaceFormat format,
+u32 num_mip_levels,
+u32 num_layers,
+u32 num_samples,
+struct svga3dsurface_cache *cache)
+{
+   const struct svga3d_surface_desc *desc;
+   u32 i;
+
+   memset(cache, 0, sizeof(*cache));
+   cache->desc = desc = svga3dsurface_get_desc(format);
+   cache->num_mip_levels = num_mip_levels;
+   cache->num_layers = num_layers;
+   for (i = 0; i < cache->num_mip_levels; i++) {
+   struct svga3dsurface_mip *mip = &cache->mip[i];
+
+   mip->size = svga3dsurface_get_mip_size(*size, i);
+   mip->bytes = svga3dsurface_get_image_buffer_size
+   (desc, &mip->size, 0) * num_samples;
+   mip->row_stride =
+   __KERNEL_DIV_ROUND_UP(mip->size.width,
+ desc->block_size.width) *
+   desc->bytes_per_block * num_samples;
+   mip->img_stride =
+   __KERNEL_DIV_ROUND_UP(mip->size.height,
+ desc->block_size.height) *
+   mip->ro

Re: [PATCH v2] lib/scatterlist: Provide a DMA page iterator

2019-02-07 Thread Thomas Hellstrom

On Thu, 2019-02-07 at 22:26 +, Jason Gunthorpe wrote:
> Commit 2db76d7c3c6d ("lib/scatterlist: sg_page_iter: support sg lists
> w/o
> backing pages") introduced the sg_page_iter_dma_address() function
> without
> providing a way to use it in the general case. If the sg_dma_len() is
> not
> equal to the sg length callers cannot safely use the
> for_each_sg_page/sg_page_iter_dma_address combination.
> 
> Resolve this API mistake by providing a DMA specific iterator,
> for_each_sg_dma_page(), that uses the right length so
> sg_page_iter_dma_address() works as expected with all sglists.
> 
> A new iterator type is introduced to provide compile-time safety
> against
> wrongly mixing accessors and iterators.
> 
> Acked-by: Christoph Hellwig  (for scatterlist)
> Signed-off-by: Jason Gunthorpe 
> 

For the vmwgfx part, 
Acked-by: Thomas Hellstrom 

I'll take a deeper look to provide a vmwgfx fix as a follow up.

Thanks,
Thomas

Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs v2

2019-02-04 Thread Thomas Hellstrom

On Mon, 2019-02-04 at 13:11 +0100, Thomas Hellström wrote:
> On Mon, 2019-02-04 at 09:19 +0100, Christoph Hellwig wrote:
> > On Fri, Jan 25, 2019 at 09:12:13AM +0100, Thomas Hellstrom wrote:
> > > -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> > > - /*
> > > -  * No coherent page pool
> > > -  */
> > > - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> > > + /* No TTM coherent page pool? FIXME: Ask TTM instead! */
> > > + if (!(IS_ENABLED(CONFIG_SWIOTLB) ||
> > > IS_ENABLED(CONFIG_INTEL_IOMMU)) &&
> > > + (dev_priv->map_mode == vmw_dma_alloc_coherent))
> > >   return -EINVAL;
> > > -#endif
> > > +
> > 
> > I don't think this edited in change makes any sense.  The swiotlb
> > vs
> > dma-direct versions of dma_alloc_coherent are the same, so this
> > check
> > seems very obsfucating.
> 
> So this part of code is identical in functionality to the previous
> version. It checks whether the TTM module has the coherent page pool
> enabled. (an identical test is present in TTM). What we *really* need
> to do here instead is to ask TTM whether it has enabled its coherent
> page pool instead of trying to mimic TTM's test, and I have a
> changeset
> under review for that. But as mentioned previously, I don't want to
> change the TTM interface outside of a merge window, so we either have
> to live with the above for 5.0 or keep the old defines. I'd prefer
> the
> former so I don't have to respin the patch series once more.
> 
> Thanks,
> Thoams
> 

Hi, Christoph,

I need to get this merged this week. Could you please ack or ack
removing this hunk + updating the following patches for merge errors?

If no response, I'll add a Cc: tag on the patch and a #v1 to your s-o-
b.

Thanks,
Thomas

Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-17 Thread Thomas Hellstrom

On Thu, 2019-01-17 at 10:30 +0100, h...@lst.de wrote:
> On Wed, Jan 16, 2019 at 10:24:36AM -0700, Jason Gunthorpe wrote:
> > The fact is there is 0 industry interest in using RDMA on platforms
> > that can't do HW DMA cache coherency - the kernel syscalls required
> > to
> > do the cache flushing on the IO path would just destroy performance
> > to
> > the point of making RDMA pointless. Better to use netdev on those
> > platforms.
> 
> In general there is no syscall required for doing cache flushing, you
> just issue the proper instructions directly from userspace.

But what if there are other coherence issues? Like bounce-buffers?
I'd like to +1 on what Jason says about industry interest: FWIW, vmwgfx
is probably one of the graphics drivers that would lend itself best to
do a fully-dma-interface compliant graphics stack experiment. But being
a paravirtual driver, all platforms we can ever run on are fully
coherent unless someone introduces a fake incoherency by forcing
swiotlb. Putting many man-months of effort into supporting systems on
which we would never run on and can never test on can never make more
than academic sense.

>  
> 
> > The reality is that *all* the subsytems doing DMA kernel bypass are
> > ignoring the DMA mapping rules, I think we should support this
> > better,
> > and just accept that user space DMA will not be using syncing.
> > Block
> > access in cases when this is required, otherwise let it work as is
> > today.
> 
> In that case we just need to block userspace DMA access entirely.
> Which given the amount of problems it creates sounds like a pretty
> good idea anyway.

I'm not sure I'm following you here. Are you suggesting scratching
support for all (GP)GPU- and RDMA drivers?

Thanks,
Thomas

Re: [PATCH v4 2/3] locking: Implement an algorithm choice for Wound-Wait mutexes

2019-01-16 Thread Thomas Hellstrom

Hi,

On Wed, 2019-01-16 at 09:24 -0500, Rob Clark wrote:
> So, I guess this is to do w/ the magic of merge commits, but it looks
> like the hunk changing the crtc_ww_class got lost:

So what happened here is that this commit changed it to 
DEFINE_WD_CLASS
and the following commit changed it back again to
DEFINE_WW_CLASS
so that the algorithm change and only that came with the last commit. 
Apparently the history of that line got lost when merging since the
merge didn't change it; git blame doesn't show it changed, but when
looking at the commit history in gitk it's all clear. I guess that's a
feature of git.

The modeset locks now use true wound-wait rather than wait-die.

/Thomas

> 
>  ~/src/linux   master  git show --pretty=short
> 08295b3b5beec9aac0f7a9db86f0fc3792039da3
> drivers/gpu/drm/drm_modeset_lock.c
> commit 08295b3b5beec9aac0f7a9db86f0fc3792039da3
> Author: Thomas Hellstrom 
> 
> locking: Implement an algorithm choice for Wound-Wait mutexes
> 
> diff --git a/drivers/gpu/drm/drm_modeset_lock.c
> b/drivers/gpu/drm/drm_modeset_lock.c
> index 8a5100685875..638be2eb67b4 100644
> --- a/drivers/gpu/drm/drm_modeset_lock.c
> +++ b/drivers/gpu/drm/drm_modeset_lock.c
> @@ -70,7 +70,7 @@
>   * lists and lookup data structures.
>   */
> 
> -static DEFINE_WW_CLASS(crtc_ww_class);
> +static DEFINE_WD_CLASS(crtc_ww_class);
> 
>  /**
>   * drm_modeset_lock_all - take all modeset locks
>  ~/src/linux   master  git log --pretty=format:%H
> drivers/gpu/drm/drm_modeset_lock.c  | grep
> 08295b3b5beec9aac0f7a9db86f0fc3792039da3
>  ~/src/linux   master  1 
> 
> 
> BR,
> -R
> 
> 
> On Tue, Jun 19, 2018 at 4:29 AM Thomas Hellstrom <
> thellst...@vmware.com> wrote:
> > The current Wound-Wait mutex algorithm is actually not Wound-Wait
> > but
> > Wait-Die. Implement also Wound-Wait as a per-ww-class choice.
> > Wound-Wait
> > is, contrary to Wait-Die a preemptive algorithm and is known to
> > generate
> > fewer backoffs. Testing reveals that this is true if the
> > number of simultaneous contending transactions is small.
> > As the number of simultaneous contending threads increases, Wait-
> > Wound
> > becomes inferior to Wait-Die in terms of elapsed time.
> > Possibly due to the larger number of held locks of sleeping
> > transactions.
> > 
> > Update documentation and callers.
> > 
> > Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
> > tag patch-18-06-15
> > 
> > Each thread runs 10 batches of lock / unlock 800 ww mutexes
> > randomly
> > chosen out of 10. Four core Intel x86_64:
> > 
> > Algorithm#threads   Rollbacks  time
> > Wound-Wait   4  ~100   ~17s.
> > Wait-Die 4  ~15~19s.
> > Wound-Wait   16 ~36~109s.
> > Wait-Die 16 ~45~82s.
> > 
> > Cc: Ingo Molnar 
> > Cc: Jonathan Corbet 
> > Cc: Gustavo Padovan 
> > Cc: Maarten Lankhorst 
> > Cc: Sean Paul 
> > Cc: David Airlie 
> > Cc: Davidlohr Bueso 
> > Cc: "Paul E. McKenney" 
> > Cc: Josh Triplett 
> > Cc: Thomas Gleixner 
> > Cc: Kate Stewart 
> > Cc: Philippe Ombredanne 
> > Cc: Greg Kroah-Hartman 
> > Cc: linux-...@vger.kernel.org
> > Cc: linux-me...@vger.kernel.org
> > Cc: linaro-mm-...@lists.linaro.org
> > Co-authored-by: Peter Zijlstra 
> > Signed-off-by: Thomas Hellstrom 
> > 
> > ---
> > v2:
> > * Update API according to review comment by Greg Kroah-Hartman.
> > * Address review comments by Peter Zijlstra:
> >   - Avoid _Bool in composites
> >   - Fix typo
> >   - Use __mutex_owner() where applicable
> >   - Rely on built-in barriers for the main loop exit condition,
> > struct ww_acquire_ctx::wounded. Update code comments.
> >   - Explain unlocked use of list_empty().
> > v3:
> > * Adapt to and incorporate cleanup by Peter Zijlstra
> > * Remove unlocked use of list_empty().
> > v4:
> > * Move code related to adding a waiter to the lock waiter list to a
> >   separate function.
> > ---
> >  Documentation/locking/ww-mutex-design.txt |  57 +--
> >  drivers/dma-buf/reservation.c |   2 +-
> >  drivers/gpu/drm/drm_modeset_lock.c|   2 +-
> >  include/linux/ww_mutex.h  |  17 ++-
> >  kernel/locking/locktorture.c  |   2 +-
> >  kernel/locking/mutex.c| 165
> > +++---
> >  kernel/locking/test-ww_mutex.c|   2 +-
> >  lib/locking-selfte

Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom

On Tue, 2019-01-15 at 21:58 +0100, h...@lst.de wrote:
> On Tue, Jan 15, 2019 at 07:13:11PM +, Koenig, Christian wrote:
> > Thomas is correct that the interface you propose here doesn't work
> > at 
> > all for GPUs.
> > 
> > The kernel driver is not informed of flush/sync, but rather just
> > setups 
> > coherent mappings between system memory and devices.
> > 
> > In other words you have an array of struct pages and need to map
> > that to 
> > a specific device and so create dma_addresses for the mappings.
> 
> If you want a coherent mapping you need to use dma_alloc_coherent
> and dma_mmap_coherent and you are done, that is not the problem.
> That actually is one of the vmgfx modes, so I don't understand what
> problem we are trying to solve if you don't actually want a non-
> coherent mapping. 

For vmwgfx, not making dma_alloc_coherent default has a couple of
reasons:
1) Memory is associated with a struct device. It has not been clear
that it is exportable to other devices.
2) There seems to be restrictions in the system pages allowable. GPUs
generally prefer highmem pages but dma_alloc_coherent returns a virtual
address implying GFP_KERNEL? While not used by vmwgfx, TTM typically
prefers HIGHMEM pages to facilitate caching mode switching without
having to touch the kernel map. 
3) Historically we had APIs to allow coherent access to user-space
defined pages. That has gone away not but the infrastructure was built
around it.

dma_mmap_coherent isn't use because as the data moves between system
memory, swap and VRAM, PTEs of user-space mappings are adjusted
accordingly, meaning user-space doesn't have to unmap when an operation
is initiated that might mean the data is moved.

> Although last time I had that discussion with Daniel Vetter
> I was under the impressions that GPUs really wanted non-coherent
> mappings.

Intel historically has done things a bit differently. And it's also
possible that embedded platforms and ARM prefer this mode of operation,
but I haven't caught up on that discussion.

> 
> But if you want a coherent mapping you can't go to a struct page,
> because on many systems you can't just map arbitrary memory as
> uncachable.  It might either come from very special limited pools,
> or might need other magic applied to it so that it is not visible
> in the normal direct mapping, or at least not access through it.

The TTM subsystem has been relied on to provide coherent memory with
the option to switch caching mode of pages. But only on selected and
well tested platforms. On other platforms we simply do not load, and
that's fine for now.

But as mentioned multiple times, to make GPU drivers more compliant,
we'd really want that

bool dma_streaming_is_coherent(const struct device *)

API to help us decide when to load or not.

Thanks,
Thomas

Re: [PATCH -next] drm/vmwgfx: Remove set but not used variable 'srf'

2019-01-15 Thread Thomas Hellstrom

Yue,

Thanks,
Reviewed-by: Thomas Hellstrom 

Will include in the next -next pull.
/Thomas


On Wed, 2019-01-16 at 01:55 +, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/gpu/drm/vmwgfx/vmwgfx_surface.c: In function
> 'vmw_hw_surface_destroy':
> drivers/gpu/drm/vmwgfx/vmwgfx_surface.c:335:22: warning:
>  variable 'srf' set but not used [-Wunused-but-set-variable]
> 
> It never used since commit 543831cfc976 ï¼^"drm/vmwgfx: Break out
> surface and
> context management to separate files")
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> index 80a01cd..3b6976e 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
> @@ -332,7 +332,6 @@ static void vmw_hw_surface_destroy(struct
> vmw_resource *res)
>  {
>  
>   struct vmw_private *dev_priv = res->dev_priv;
> - struct vmw_surface *srf;
>   void *cmd;
>  
>   if (res->func->destroy == vmw_gb_surface_destroy) {
> @@ -359,7 +358,6 @@ static void vmw_hw_surface_destroy(struct
> vmw_resource *res)
>*/
>  
>   mutex_lock(&dev_priv->cmdbuf_mutex);
> - srf = vmw_res_to_srf(res);
>   dev_priv->used_memory_size -= res->backup_size;
>   mutex_unlock(&dev_priv->cmdbuf_mutex);
>   }
> 
> 
>

Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom

On Tue, 2019-01-15 at 16:20 +0100, h...@lst.de wrote:
> On Tue, Jan 15, 2019 at 03:24:55PM +0100, Christian König wrote:
> > Yeah, indeed. Bounce buffers are an absolute no-go for GPUs.
> > 
> > If the DMA API finds that a piece of memory is not directly
> > accessible by 
> > the GPU we need to return an error and not try to use bounce
> > buffers behind 
> > the surface.
> > 
> > That is something which always annoyed me with the DMA API, which
> > is 
> > otherwise rather cleanly defined.
> 
> That is exactly what I want to fix with my series to make
> DMA_ATTR_NON_CONSISTENT more useful and always available:
> 
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.linuxfoundation.org%2Fpipermail%2Fiommu%2F2018-December%2F031985.html&data=02%7C01%7Cthellstrom%40vmware.com%7Cb1799c4073024a824f9408d67afcfcea%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636831624340834140&sdata=JiRBfzZMvN3joJ4vKiErbzXAHaNzuBcLapRJDL%2Bt6Hc%3D&reserved=0
> 
> With that you allocate the memory using dma_alloc_attrs with
> DMA_ATTR_NON_CONSISTENT, and use dma_sync_single_* to transfer
> ownership to the cpu and back to the device, with a gurantee that
> there won't be any bouncing.  So far the interest by the parties that
> requested the feature has been rather lacklustre, though.

In the graphics case, it's probably because it doesn't fit the graphics
use-cases:

1) Memory typically needs to be mappable by another device. (the "dma-
buf" interface)
2) DMA buffers are exported to user-space and is sub-allocated by it.
Mostly there are no GPU user-space kernel interfaces to sync / flush
subregions and these syncs may happen on a smaller-than-cache-line
granularity.

So to help the graphics driver, that coherent flag would be much more
useful.

/Thomas

Re: [PATCH] lib/scatterlist: Provide a DMA page iterator

2019-01-15 Thread Thomas Hellstrom

Hi, Christoph,

On Mon, 2019-01-14 at 10:48 +0100, Christoph Hellwig wrote:
> On Thu, Jan 10, 2019 at 04:42:18PM -0700, Jason Gunthorpe wrote:
> > > Changes since the RFC:
> > > - Rework vmwgfx too [CH]
> > > - Use a distinct type for the DMA page iterator [CH]
> > > - Do not have a #ifdef [CH]
> > 
> > ChristophH: Will you ack?
> 
> This looks generally fine.
> 
> > Are you still OK with the vmwgfx reworking, or should we go back to
> > the original version that didn't have the type safety so this
> > driver
> > can be left broken?
> 
> I think the map method in vmgfx that just does virt_to_phys is
> pretty broken.  Thomas, can you check if you see any performance
> difference with just doing the proper dma mapping, as that gets the
> driver out of interface abuse land?

The performance difference is not really the main problem here. The
problem is that even though we utilize the streaming DMA interface, we
use it only since we have to for DMA-Remapping and assume that the
memory is coherent. To be able to be as compliant as possible and ditch
the virt-to-phys mode, we *need* a DMA interface flag that tells us
when the dma_sync_for_xxx are no-ops. If they aren't we'll refuse to
load for now. I'm not sure, but I think also nouveau and radeon suffer
from the same issue.

> 
> While we're at it I think we need to merge my series in this area
> for 5.0, because without that the driver is already broken.  Where
> should we merge it?

I can merge it through vmwgfx/drm-fixes. There is an outstanding issue
with patch 3. Do you want me to fix that up?

Thanks,
Thomas

Re: fix DMA ops layering violations in vmwgfx

2019-01-08 Thread Thomas Hellstrom

On Tue, 2019-01-08 at 14:12 +0100, h...@lst.de wrote:
> On Tue, Jan 08, 2019 at 09:51:45AM +0000, Thomas Hellstrom wrote:
> > Hi, Christoph,
> > 
> > On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> > > Hi Thomas,
> > > 
> > > vmwgfx has been doing some odd checks based on DMA ops which rely
> > > on deep DMA mapping layer internals, and I think the changes in
> > > Linux 4.21 finally broke most of these implicit assumptions.
> > 
> > Thanks. 
> > What we're really trying to do here is to try to detect the
> > situation
> > where DMA remapping using hardware IOMMUs is going on but memory is
> > still coherent, since the driver can currently only work with
> > coherent
> > memory[1]. Currently we use intel_iommu_enabled to detect this
> > situation, but it would be really helpful if there were a generic
> > bool
> > that advertizes this situation since we need to deal with other
> > IOMMUs
> > as well going forward. Any suggestion?
> 
> I'm missing the link of the [1] reference above. 

Sorry. I meant to remove the reference since the explanation would be
rather lengthy.

> But if you need
> coherent memory you should simply always use dma_alloc_coherent, that
> is the only gurantee you get.  If you use any other dma mapping
> methods
> you will otherwise need to explicitly transfer ownership by mapping/
> unmapping or using dma_sync* before and after every device access.

The problem is that graphics applications typically allocate huge
amounts of DMA memory. The GPU may want to map half of available system
memory or more. Moreover this memory might need to be mappable by
multiple devices, which would rule out dma_alloc_coherent() in many
situations.

Using SWIOTLB with bounce-buffers is also typically out of the question
for performance- and memory usage reasons. Moreover the sync operations
would often affect sub-regions of 2D and 3D textures, which makes the
dma_sync API inefficient even if it maps to no-ops. vmwgfx would rather
not load in these situations.

Finally graphics applications, do not always provide conservative sync
regions.

> 
> And the whole DMA API bypass using the phys mode is something that
> I'd really prefer not to see in any driver as it tends to cause
> major problems sooner or later.

I fully understand this. However, given the above, the DMA API is quite
awkward for graphics operation. It would be easier to motivate a full
removal of the phys mode if we could query the DMA API whether the
dma_sync operations are no-ops or not.

/Thomas

Re: [PATCH 4/4] drm/vmwgfx: unwind spaghetti code in vmw_dma_select_mode

2019-01-08 Thread Thomas Hellstrom

On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> Just use a simple if/else chain to select the DMA mode.
> 
> Signed-off-by: Christoph Hellwig 

Reviewed-by: Thomas Hellstrom 


> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 25 ++---
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index c2060f6cc9e8..86387735a90b 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -566,34 +566,21 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   [vmw_dma_map_populate] = "Keeping DMA mappings.",
>   [vmw_dma_map_bind] = "Giving up DMA mappings early."};
>  
> - if (intel_iommu_enabled) {
> + if (vmw_force_coherent)
> + dev_priv->map_mode = vmw_dma_alloc_coherent;
> + else if (intel_iommu_enabled)
>   dev_priv->map_mode = vmw_dma_map_populate;
> - goto out_fixup;
> - }
> -
> - if (!(vmw_force_iommu || vmw_force_coherent)) {
> + else if (!vmw_force_iommu)
>   dev_priv->map_mode = vmw_dma_phys;
> - DRM_INFO("DMA map mode: %s\n", names[dev_priv-
> >map_mode]);
> - return 0;
> - }
> -
> -#ifdef CONFIG_SWIOTLB
> - if (swiotlb_nr_tbl())
> + else if (IS_ENABLED(CONFIG_SWIOTLB) && swiotlb_nr_tbl())
>   dev_priv->map_mode = vmw_dma_alloc_coherent;
>   else
> -#endif
>   dev_priv->map_mode = vmw_dma_map_populate;
>  
> -out_fixup:
> - if (dev_priv->map_mode == vmw_dma_map_populate &&
> - vmw_restrict_iommu)
> + if (dev_priv->map_mode == vmw_dma_map_populate &&
> vmw_restrict_iommu)
>   dev_priv->map_mode = vmw_dma_map_bind;
>  
> - if (vmw_force_coherent)
> - dev_priv->map_mode = vmw_dma_alloc_coherent;
> -
>   DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
> -
>   return 0;
>  }
>

Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs

2019-01-08 Thread Thomas Hellstrom

Hi,


On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
...

> intel_iommu_enabled is defined as always false for
> !CONFIG_INTEL_IOMMU,
> so remove the ifdefs around it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
> 
> -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> - /*
> -  * No coherent page pool
> -  */
> - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> - return -EINVAL;
> -#endif
> 

Actually this hunk is incorrect, it tries to determine whether the TTM
subsystem maintains a coherent page pool or not. If not, we can't use
vmw_dma_alloc_coherent.

/Thomas

Re: [PATCH 2/4] drm/vmwgfx: remove CONFIG_INTEL_IOMMU ifdefs

2019-01-08 Thread Thomas Hellstrom

Reviewed-by: Thomas Hellstrom 

On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> intel_iommu_enabled is defined as always false for
> !CONFIG_INTEL_IOMMU,
> so remove the ifdefs around it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 18 --
>  1 file changed, 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 69e325b2d954..236052ad233c 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -567,12 +567,10 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   [vmw_dma_map_bind] = "Giving up DMA mappings early."};
>   const struct dma_map_ops *dma_ops = get_dma_ops(dev_priv->dev-
> >dev);
>  
> -#ifdef CONFIG_INTEL_IOMMU
>   if (intel_iommu_enabled) {
>   dev_priv->map_mode = vmw_dma_map_populate;
>   goto out_fixup;
>   }
> -#endif
>  
>   if (!(vmw_force_iommu || vmw_force_coherent)) {
>   dev_priv->map_mode = vmw_dma_phys;
> @@ -589,9 +587,7 @@ static int vmw_dma_select_mode(struct vmw_private
> *dev_priv)
>   dev_priv->map_mode = vmw_dma_map_populate;
>  #endif
>  
> -#ifdef CONFIG_INTEL_IOMMU
>  out_fixup:
> -#endif
>   if (dev_priv->map_mode == vmw_dma_map_populate &&
>   vmw_restrict_iommu)
>   dev_priv->map_mode = vmw_dma_map_bind;
> @@ -599,13 +595,6 @@ static int vmw_dma_select_mode(struct
> vmw_private *dev_priv)
>   if (vmw_force_coherent)
>   dev_priv->map_mode = vmw_dma_alloc_coherent;
>  
> -#if !defined(CONFIG_SWIOTLB) && !defined(CONFIG_INTEL_IOMMU)
> - /*
> -  * No coherent page pool
> -  */
> - if (dev_priv->map_mode == vmw_dma_alloc_coherent)
> - return -EINVAL;
> -#endif
>   DRM_INFO("DMA map mode: %s\n", names[dev_priv->map_mode]);
>  
>   return 0;
> @@ -619,7 +608,6 @@ static int vmw_dma_select_mode(struct vmw_private
> *dev_priv)
>   * With 32-bit we can only handle 32 bit PFNs. Optionally set that
>   * restriction also for 64-bit systems.
>   */
> -#ifdef CONFIG_INTEL_IOMMU
>  static int vmw_dma_masks(struct vmw_private *dev_priv)
>  {
>   struct drm_device *dev = dev_priv->dev;
> @@ -631,12 +619,6 @@ static int vmw_dma_masks(struct vmw_private
> *dev_priv)
>   }
>   return 0;
>  }
> -#else
> -static int vmw_dma_masks(struct vmw_private *dev_priv)
> -{
> - return 0;
> -}
> -#endif
>  
>  static int vmw_driver_load(struct drm_device *dev, unsigned long
> chipset)
>  {

Re: fix DMA ops layering violations in vmwgfx

2019-01-08 Thread Thomas Hellstrom

Hi, Christoph,

On Sat, 2019-01-05 at 09:01 +0100, Christoph Hellwig wrote:
> Hi Thomas,
> 
> vmwgfx has been doing some odd checks based on DMA ops which rely
> on deep DMA mapping layer internals, and I think the changes in
> Linux 4.21 finally broke most of these implicit assumptions.

Thanks. 
What we're really trying to do here is to try to detect the situation
where DMA remapping using hardware IOMMUs is going on but memory is
still coherent, since the driver can currently only work with coherent
memory[1]. Currently we use intel_iommu_enabled to detect this
situation, but it would be really helpful if there were a generic bool
that advertizes this situation since we need to deal with other IOMMUs
as well going forward. Any suggestion?

Comments on the patches separately.

Thanks,
Thomas

> 
> The real fix is in patch 3, but I think the others are important
> to make it clear what is actually going on.

Re: [PATCH][drm-next] drm/selftest: fix spelling mistake "dimention" -> "dimenstion"

2018-12-10 Thread Thomas Hellstrom

On Mon, 2018-12-10 at 09:26 +, Colin King wrote:

Reviewed-by: Thomas Hellstrom 

I'll take this in the next pull request unless I'm told otherwise.

/Thomas

> From: Colin Ian King 
> 
> There is a spelling mistake in a pr_err message, fix this.
> 
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/selftests/test-drm_damage_helper.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> b/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> index a2f753205a3e..9d2bcdf8bc29 100644
> --- a/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> +++ b/drivers/gpu/drm/selftests/test-drm_damage_helper.c
> @@ -53,7 +53,7 @@ static bool check_damage_clip(struct
> drm_plane_state *state, struct drm_rect *r,
>   int src_y2 = (state->src.y2 >> 16) + !!(state->src.y2 &
> 0x);
>  
>   if (x1 >= x2 || y1 >= y2) {
> - pr_err("Cannot have damage clip with no dimention.\n");
> + pr_err("Cannot have damage clip with no dimension.\n");
>   return false;
>   }
>

[tip:locking/core] locking/mutex: Fix mutex debug call and ww_mutex documentation

2018-09-10 Thread tip-bot for Thomas Hellstrom

Commit-ID:  e13e2366d8415e029fe96a62502955083e272cef
Gitweb: https://git.kernel.org/tip/e13e2366d8415e029fe96a62502955083e272cef
Author: Thomas Hellstrom 
AuthorDate: Mon, 3 Sep 2018 16:07:08 +0200
Committer:  Ingo Molnar 
CommitDate: Mon, 10 Sep 2018 10:05:10 +0200

locking/mutex: Fix mutex debug call and ww_mutex documentation

The following commit:

  08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")

introduced a reference in the documentation to a function that was
removed in an earlier commit.

It also forgot to remove a call to debug_mutex_add_waiter() which is now
unconditionally called by __mutex_add_waiter().

Fix those bugs.

Signed-off-by: Thomas Hellstrom 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dri-de...@lists.freedesktop.org
Fixes: 08295b3b5bee ("Implement an algorithm choice for Wound-Wait mutexes")
Link: http://lkml.kernel.org/r/20180903140708.2401-1-thellst...@vmware.com
Signed-off-by: Ingo Molnar 
---
 kernel/locking/mutex.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 1a81a1257b3f..3f8a35104285 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -389,7 +389,7 @@ static bool __ww_mutex_wound(struct mutex *lock,
/*
 * wake_up_process() paired with set_current_state()
 * inserts sufficient barriers to make sure @owner either sees
-* it's wounded in __ww_mutex_lock_check_stamp() or has a
+* it's wounded in __ww_mutex_check_kill() or has a
 * wakeup pending to re-read the wounded state.
 */
if (owner != current)
@@ -946,7 +946,6 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
}
 
debug_mutex_lock_common(lock, &waiter);
-   debug_mutex_add_waiter(lock, &waiter, current);
 
lock_contended(&lock->dep_map, ip);

Re: [PATCH v4 0/4] locking,drm: Fix ww mutex naming / algorithm inconsistency

2018-06-19 Thread Thomas Hellstrom


On 06/19/2018 11:45 AM, Peter Zijlstra wrote:

I suspect you want this through the DRM tree? Ingo are you OK with that?



Yes, I can ask Dave to pull this. Ingo?

Thanks,

Thomas

Re: [PATCH 1/3] locking: WW mutex cleanup

2018-06-19 Thread Thomas Hellstrom


On 06/19/2018 11:44 AM, Peter Zijlstra wrote:

On Tue, Jun 19, 2018 at 10:24:43AM +0200, Thomas Hellstrom wrote:

From: Peter Ziljstra 

Make the WW mutex code more readable by adding comments, splitting up
functions and pointing out that we're actually using the Wait-Die
algorithm.

Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Co-authored-by: Thomas Hellstrom 
Signed-off-by: Thomas Hellstrom 
---
  Documentation/locking/ww-mutex-design.txt |  12 +-
  include/linux/ww_mutex.h  |  28 ++---
  kernel/locking/mutex.c| 202 ++
  3 files changed, 145 insertions(+), 97 deletions(-)

Acked-by: Peter Zijlstra (Intel) 


Hi Peter,

Do you want to add a SOB, since you're the main author?

Thomas

Re: [PATCH v2 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-14 Thread Thomas Hellstrom


On 06/14/2018 03:29 PM, Matthew Wilcox wrote:

On Thu, Jun 14, 2018 at 01:54:15PM +0200, Thomas Hellstrom wrote:

On 06/14/2018 01:36 PM, Peter Zijlstra wrote:

Currently you don't allow mixing WD and WW contexts (which is not
immediately obvious from the above code), and the above hard relies on
that. Are there sensible use cases for mixing them? IOW will your
current restriction stand without hassle?

Contexts _must_ agree on the algorithm used to resolve deadlocks. With
Wait-Die, for example, older transactions will wait if a lock is held by a
younger transaction and with Wound-Wait, younger transactions will wait if a
lock is held by an older transaction so there is no way of mixing them.

Maybe the compiler should be enforcing that; ie make it a different type?


It's intended to be enforced by storing the algorithm choice in the 
WW_MUTEX_CLASS which must be common for an acquire context and the 
ww_mutexes it acquires. However, I don't think there is a check that 
that holds. I guess we could add it as a DEBUG_MUTEX test in 
ww_mutex_lock().


Thanks,

Thomas

[PATCH 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-06-13 Thread Thomas Hellstrom

The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die in terms of elapsed time.
Possibly due to the larger number of held locks of sleeping transactions.

Update documentation and callers.

Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-04

Each thread runs 10 batches of lock / unlock 800 ww mutexes randomly
chosen out of 10. Four core Intel x86_64:

Algorithm#threads   Rollbacks  time
Wound-Wait   4  ~100   ~17s.
Wait-Die 4  ~15~19s.
Wound-Wait   16 ~36~109s.
Wait-Die 16 ~45~82s.

Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Jonathan Corbet 
Cc: Gustavo Padovan 
Cc: Maarten Lankhorst 
Cc: Sean Paul 
Cc: David Airlie 
Cc: Davidlohr Bueso 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Thomas Gleixner 
Cc: Kate Stewart 
Cc: Philippe Ombredanne 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linaro-mm-...@lists.linaro.org
Signed-off-by: Thomas Hellstrom 
---
 Documentation/locking/ww-mutex-design.txt | 57 ++
 drivers/dma-buf/reservation.c |  2 +-
 drivers/gpu/drm/drm_modeset_lock.c|  2 +-
 include/linux/ww_mutex.h  | 19 --
 kernel/locking/locktorture.c  |  2 +-
 kernel/locking/mutex.c| 98 ---
 kernel/locking/test-ww_mutex.c|  2 +-
 lib/locking-selftest.c|  2 +-
 8 files changed, 152 insertions(+), 32 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_d

[PATCH RFC 1/2] locking: Implement an algorithm choice for Wound-Wait mutexes

2018-05-25 Thread Thomas Hellstrom

The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs in some cases. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die, probably due to the larger number of held locks
of sleeping transactions.

Update documentation and callers.

Signed-off-by: Thomas Hellstrom 
---
 Documentation/locking/ww-mutex-design.txt |  57 +
 drivers/dma-buf/reservation.c |   2 +-
 drivers/gpu/drm/drm_modeset_lock.c|   2 +-
 include/linux/ww_mutex.h  |  19 --
 kernel/locking/locktorture.c  |   2 +-
 kernel/locking/mutex.c| 101 +++---
 kernel/locking/test-ww_mutex.c|   2 +-
 lib/locking-selftest.c|   2 +-
 8 files changed, 154 insertions(+), 33 deletions(-)

diff --git a/Documentation/locking/ww-mutex-design.txt 
b/Documentation/locking/ww-mutex-design.txt
index 34c3a1b50b9a..29c85623b551 100644
--- a/Documentation/locking/ww-mutex-design.txt
+++ b/Documentation/locking/ww-mutex-design.txt
@@ -1,4 +1,4 @@
-Wait/Wound Deadlock-Proof Mutex Design
+Wound/Wait Deadlock-Proof Mutex Design
 ==
 
 Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
@@ -32,10 +32,23 @@ the oldest task) wins, and the one with the higher 
reservation id (i.e. the
 younger task) unlocks all of the buffers that it has already locked, and then
 tries again.
 
-In the RDBMS literature this deadlock handling approach is called wait/wound:
-The older tasks waits until it can acquire the contended lock. The younger 
tasks
-needs to back off and drop all the locks it is currently holding, i.e. the
-younger task is wounded.
+In the RDBMS literature, a reservation ticket is associated with a transaction.
+and the deadlock handling approach is called Wait-Die. The name is based on
+the actions of a locking thread when it encounters an already locked mutex.
+If the transaction holding the lock is younger, the locking transaction waits.
+If the transaction holding the lock is older, the locking transaction backs off
+and dies. Hence Wait-Die.
+There is also another algorithm called Wound-Wait:
+If the transaction holding the lock is younger, the locking transaction
+preempts the transaction holding the lock, requiring it to back off. It
+Wounds the other transaction.
+If the transaction holding the lock is older, it waits for the other
+transaction. Hence Wound-Wait.
+The two algorithms are both fair in that a transaction will eventually succeed.
+However, the Wound-Wait algorithm is typically stated to generate fewer 
backoffs
+compared to Wait-Die, but is, on the other hand, associated with more work than
+Wait-Die when recovering from a backoff. Wound-Wait is also a preemptive
+algorithm which requires a reliable way to preempt another transaction.
 
 Concepts
 
@@ -47,10 +60,12 @@ Acquire context: To ensure eventual forward progress it is 
important the a task
 trying to acquire locks doesn't grab a new reservation id, but keeps the one it
 acquired when starting the lock acquisition. This ticket is stored in the
 acquire context. Furthermore the acquire context keeps track of debugging state
-to catch w/w mutex interface abuse.
+to catch w/w mutex interface abuse. An acquire context is representing a
+transaction.
 
 W/w class: In contrast to normal mutexes the lock class needs to be explicit 
for
-w/w mutexes, since it is required to initialize the acquire context.
+w/w mutexes, since it is required to initialize the acquire context. The lock
+class also specifies what algorithm to use, Wound-Wait or Wait-Die.
 
 Furthermore there are three different class of w/w lock acquire functions:
 
@@ -90,10 +105,15 @@ provided.
 Usage
 -
 
+The algorithm (Wait-Die vs Wound-Wait) is chosen using the _is_wait_die
+argument to DEFINE_WW_CLASS(). As a rough rule of thumb, use Wound-Wait iff you
+typically expect the number of simultaneous competing transactions to be small,
+and the rollback cost can be substantial.
+
 Three different ways to acquire locks within the same w/w class. Common
 definitions for methods #1 and #2:
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 struct obj {
struct ww_mutex lock;
@@ -243,7 +263,7 @@ struct obj {
struct list_head locked_list;
 };
 
-static DEFINE_WW_CLASS(ww_class);
+static DEFINE_WW_CLASS(ww_class, false);
 
 void __unlock_objs(struct list_head *list)
 {
@@ -312,12 +332,23 @@ Design:
   We maintain the following invariants for the wait list:
   (1) Waiters with an acquire context are sorted by stamp order; waiters
   without an acquire contex

[PATCH RFC 0/2] locking: Fix ww_mutex algorithm inconsistency.

2018-05-25 Thread Thomas Hellstrom

The algorithm used for linux Wound/Wait mutexes, is actually not Wound/Wait
but Wait/Die. See for example

http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html

Rather than renaming them across the tree to something like Wait/Die mutexes or
Deadlock Avoidance mutexes, this patch set implements also the Wound/Wait
algorithm It shouldn't touch the binary ordinary mutex paths when compiled
with otpimization.
We use Wound/Wait for the modeset locks which in theory may benefit slightly
from Wound/Wait rather than Wait/Die, but Wait/Die is actually superior for
larger number of simultaneous contending transactions so we keep the original
implementation as a choice, and don't touch the algorithm used by the
reservation objects.

Performance- and functional testing has been done using 

git://people.freedesktop.org/~thomash/ww_mutex_test

Using WW_BUILTIN and 4 and 16 competing threads.

Another option is of course to ignore the the naming / algorithm inconsistency.

[PATCH RFC 2/2] drm: Change deadlock-avoidance algorithm for the modeset locks.

2018-05-25 Thread Thomas Hellstrom

For modeset locks we don't expect a high number of contending
transactions so change algorithm from Wait-Die to Wound-Wait.

Signed-off-by: Thomas Hellstrom 
---
 drivers/gpu/drm/drm_modeset_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_modeset_lock.c 
b/drivers/gpu/drm/drm_modeset_lock.c
index f22a7ef41de1..294997765a2c 100644
--- a/drivers/gpu/drm/drm_modeset_lock.c
+++ b/drivers/gpu/drm/drm_modeset_lock.c
@@ -70,7 +70,7 @@
  * lists and lookup data structures.
  */
 
-static DEFINE_WW_CLASS(crtc_ww_class, true);
+static DEFINE_WW_CLASS(crtc_ww_class, false);
 
 /**
  * drm_modeset_lock_all - take all modeset locks
-- 
2.14.3

Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-27 Thread Thomas Hellstrom


On 04/27/2018 06:56 PM, Robin Murphy wrote:

Hi Thomas,

On 25/04/18 14:21, Thomas Hellstrom wrote:

Hi, Robin,

Thanks for the patch. It was some time since I put together that 
code, but I remember hitting something similar to


https://urldefense.proofpoint.com/v2/url?u=https-3A__www.linuxquestions.org_questions_linux-2Dkernel-2D70_-2527nents-2527-2Dargument-2Dof-2Ddma-5Funmap-5Fsg-2D4175621964_&d=DwIDaQ&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=UACKfhMfw9wac0BNUWXnAiivjaBgY_jAEupre0zXoOQ&s=8NQNd-XBCViYHJH4fHk-RluFYd9CDjbYzXl_BWhC0ig&e= 



Even if it's clear from the documentation that orig_nents should be 
used.


Hmmm, it's odd that you would see issues - it's always been something 
that CONFIG_DMA_API_DEBUG would have screamed about, and as far as I'm 
aware for x86, nents and orig_nents should always end up equal anyway. 
I would definitely be interested to see the specific fault details if 
it can be reproduced. I suppose one possibility is that there's some 
path where you inadvertently unmap something which was never mapped, 
but passing nents=0 means you manage to get away with it without the 
DMA API backend trying to interpret any bogus DMA addresses/lengths.


FWIW, the rationale is that sync_sg/unmap_sg operate on sg->page 
(which can always be translated back to a meaningful CPU address for 
cache/write buffer maintenance), not sg->dma_address (which sometimes 
cannot), therefore passing a truncated list will have the effect of 
just not syncing the tail end of the buffer, which is clearly bad.


Robin.

I agree. I browsed the current software- and hardware iommu dma backends 
and all of them seem to set nents == orig_nents.
Still, according to the docs sg_map() is free to erase any sg->page - 
related information and given that, it would be more natural if all 
operations after sg_map() would operate on sg->dma_address. I mean if 
sg_map would truncate the sg list length to 1, and erase all other 
information (which it clearly is free to do according to the docs), it 
would be pretty meaningless to supply orig_nents for the unmapping 
operation?



/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which 
may

be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but 
it's

always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c

index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct 
vmw_ttm_tt *vmw_tt)

  {
  struct device *dev = vmw_tt->dev_priv->dev->dev;
+    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
  DMA_BIDIRECTIONAL);
-    vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  /**

Re: [PATCH] drm/vmwgfx: Fix scatterlist unmapping

2018-04-25 Thread Thomas Hellstrom


Hi, Robin,

Thanks for the patch. It was some time since I put together that code, 
but I remember hitting something similar to


https://www.linuxquestions.org/questions/linux-kernel-70/%27nents%27-argument-of-dma_unmap_sg-4175621964/

Even if it's clear from the documentation that orig_nents should be used.

/Thomas

On 04/13/2018 05:14 PM, Robin Murphy wrote:

dma_unmap_sg() should be called with the same number of entries
originally passed to dma_map_sg(), not the number it returned, which may
be fewer. Admittedly this driver probably never runs on non-coherent
architectures where getting that wrong could lead to data loss, but it's
always good to be correct, and it's trivially easy to fix by just
restoring the SG table state before the call instead of afterwards.

Signed-off-by: Robin Murphy 
---

Found by inspection while poking around TTM users.

  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index 2fd091f9..971223d39469 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -369,9 +369,9 @@ static void vmw_ttm_unmap_from_dma(struct vmw_ttm_tt 
*vmw_tt)
  {
struct device *dev = vmw_tt->dev_priv->dev->dev;
  
+	vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;

dma_unmap_sg(dev, vmw_tt->sgt.sgl, vmw_tt->sgt.nents,
DMA_BIDIRECTIONAL);
-   vmw_tt->sgt.nents = vmw_tt->sgt.orig_nents;
  }
  
  /**

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 03:47 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 01:35:25PM +0200, Thomas Hellstrom wrote:

On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s&s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ&e=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
drivers/gpu/drm/drm_atomic.c| 42 
+
drivers/gpu/drm/drm_atomic_helper.c |  4 
drivers/gpu/drm/drm_mode_config.c   |  5 +
drivers/gpu/drm/drm_plane.c | 12 +++
include/drm/drm_mode_config.h   | 15 +
include/drm/drm_plane.h | 16 ++
include/uapi/drm/drm_mode.h | 15 +
7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
}
/**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
 * drm_atomic_get_plane_state - get plane state
 * @state: global atomic state object
 * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-05 Thread Thomas Hellstrom


On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointi

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 12:10 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 10:49:09AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
   drivers/gpu/drm/drm_atomic_helper.c | 122 

   include/drm/drm_atomic_helper.h |  39 
   2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
   }
   EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(&iter->plane_src, &iter->fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(&iter->crtc_src, -iter->translate_crtc_x,
+

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 12:03 PM, Daniel Vetter wrote:


On the topic of input validation: Should the kernel check in shared code
that all the clip rects are sensible, i.e. within the dimensions of the
fb? Relying on drivers for that seems like a bad idea.


I guess we could do that.

There seems to be different needs for different kinds of iterators, but 
otherwise
I was thinking that an iterator could just skip invalid rects if found. 
Then we
don't need a separate validation step, but OTOH user-space won't get 
notified if

that was intended.

I'm not 100% sure user-space would do anything sensible with any error 
information, though.


/Thomas




That could be done in core code in drm_atomic_plane_check().
-Daniel

/Thomas


___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_dri-2Ddevel&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=JV7fhQ4zTiyiKsY8M0Zlf81l7szYePUke8kwSvQv1T8&s=HEcbr-8aoWqRvWGY6RcL1QeAtEyxMRLHbsOtdJFk78I&e=

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 12:03 PM, Daniel Vetter wrote:

On Thu, Apr 05, 2018 at 11:00:15AM +0200, Thomas Hellstrom wrote:

On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s&s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ&e=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
   drivers/gpu/drm/drm_atomic.c| 42 
+
   drivers/gpu/drm/drm_atomic_helper.c |  4 
   drivers/gpu/drm/drm_mode_config.c   |  5 +
   drivers/gpu/drm/drm_plane.c | 12 +++
   include/drm/drm_mode_config.h   | 15 +
   include/drm/drm_plane.h | 16 ++
   include/uapi/drm/drm_mode.h | 15 +
   7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
   }
   /**
+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
* drm_atomic_get_plane_state - get plane state
* @state: global atomic state object
* @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_

Re: [RFC 1/3] drm: Add DAMAGE_CLIPS property to plane

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 09:35 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:06PM -0700, Deepak Rawat wrote:

From: Lukasz Spintzyk 

Optional plane property to mark damaged regions on the plane in
framebuffer coordinates of the framebuffer attached to the plane.

The layout of blob data is simply an array of drm_mode_rect with maximum
array size limited by DRM_MODE_FB_DIRTY_MAX_CLIPS. Unlike plane src
coordinates, damage clips are not in 16.16 fixed point.

Damage clips are a hint to kernel as which area of framebuffer has
changed since last page-flip. This should be helpful for some drivers
especially for virtual devices where each framebuffer change needs to
be transmitted over network, usb, etc.

Driver which are interested in enabling DAMAGE_CLIPS property for a
plane should enable this property using drm_plane_enable_damage_clips.

Signed-off-by: Lukasz Spintzyk 
Signed-off-by: Deepak Rawat 

The property uapi section is missing, see:

https://urldefense.proofpoint.com/v2/url?u=https-3A__dri.freedesktop.org_docs_drm_gpu_drm-2Dkms.html-23plane-2Dcomposition-2Dproperties&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=J31YNV1uz3IVRaorM-eo48msi9__sQurqRZrig2UE1s&s=vzsuquF1agbOw54HSC_18Kk2k9j83m8RusmDCtPAFWQ&e=

Plane composition feels like the best place to put this. Please use that
section to tie all the various bits together, including the helpers you're
adding in the following patches for drivers to use.

Bunch of nitpicks below, but overall I'm agreeing now with just going with
fb coordinate damage rects.

Like you say, the thing needed here now is userspace + driver actually
implementing this. I'd also say the compat helper to map the legacy
fb->dirty to this new atomic way of doing things should be included here
(gives us a lot more testing for these new paths).

Icing on the cake would be an igt to make sure kernel rejects invalid clip
rects correctly.


---
  drivers/gpu/drm/drm_atomic.c| 42 +
  drivers/gpu/drm/drm_atomic_helper.c |  4 
  drivers/gpu/drm/drm_mode_config.c   |  5 +
  drivers/gpu/drm/drm_plane.c | 12 +++
  include/drm/drm_mode_config.h   | 15 +
  include/drm/drm_plane.h | 16 ++
  include/uapi/drm/drm_mode.h | 15 +
  7 files changed, 109 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 7d25c42..9226d24 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -669,6 +669,40 @@ static void drm_atomic_crtc_print_state(struct drm_printer 
*p,
  }
  
  /**

+ * drm_atomic_set_damage_for_plane - sets the damage clips property to plane
+ * @state: plane state
+ * @blob: damage clips in framebuffer coordinates
+ *
+ * Returns:
+ *
+ * Zero on success, error code on failure.
+ */
+static int drm_atomic_set_damage_for_plane(struct drm_plane_state *state,
+  struct drm_property_blob *blob)
+{
+   if (blob == state->damage_clips)
+   return 0;
+
+   drm_property_blob_put(state->damage_clips);
+   state->damage_clips = NULL;
+
+   if (blob) {
+   uint32_t count = blob->length/sizeof(struct drm_rect);
+
+   if (count > DRM_MODE_FB_DIRTY_MAX_CLIPS)
+   return -EINVAL;
+
+   state->damage_clips = drm_property_blob_get(blob);
+   state->num_clips = count;
+   } else {
+   state->damage_clips = NULL;
+   state->num_clips = 0;
+   }
+
+   return 0;
+}
+
+/**
   * drm_atomic_get_plane_state - get plane state
   * @state: global atomic state object
   * @plane: plane to get state object for
@@ -793,6 +827,12 @@ static int drm_atomic_plane_set_property(struct drm_plane 
*plane,
state->color_encoding = val;
} else if (property == plane->color_range_property) {
state->color_range = val;
+   } else if (property == config->prop_damage_clips) {
+   struct drm_property_blob *blob =
+   drm_property_lookup_blob(dev, val);
+   int ret = drm_atomic_set_damage_for_plane(state, blob);

There's already a helper with size-checking built-in, see
drm_atomic_replace_property_blob_from_id(). Wrt computing num_clips I'd
just provide a little inline helper that does the
blob->length/sizeof(drm_rect) conversion (it's just a shift, so fast).


+   drm_property_blob_put(blob);
+   return ret;
} else if (plane->funcs->atomic_set_property) {
return plane->funcs->atomic_set_property(plane, state,
property, val);
@@ -856,6 +896,8 @@ drm_atomic_plane_get_property(struct drm_plane *plane,
*val = state->color_encoding;
} else if (property == plane->color_range_property) {
*val = state->color_range;
+   } else if (property == conf

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 09:52 AM, Daniel Vetter wrote:


TYPE_PLANE I have no idea who needs that. I suggest we just drop it.


I'm assuming CRTC plane coordinates here. They are used for uploading 
contents of hardware planes. Like, in the simplest case, cursor images.


/Thomas

Re: [RFC 2/3] drm: Add helper iterator functions to iterate over plane damage.

2018-04-05 Thread Thomas Hellstrom


On 04/05/2018 09:52 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 04:49:07PM -0700, Deepak Rawat wrote:

With damage property in drm_plane_state, this patch adds helper iterator
to traverse the damage clips. Iterator will return the damage rectangles
in framebuffer, plane or crtc coordinates as need by driver
implementation.

Signed-off-by: Deepak Rawat 

I'd really like selftest/unittests for this stuff. There's an awful lot of
cornercases in this here (for any of the transformed iterators at least),
and unit tests is the best way to make sure we handle them all correctly.

Bonus points if you integrate the new selftests into igt so intel CI can
run them, seel igt/tests/drm_mm.c for an example. drm_mm selftest is also
the framework I'd copy for this stuff.


---
  drivers/gpu/drm/drm_atomic_helper.c | 122 
  include/drm/drm_atomic_helper.h |  39 
  2 files changed, 161 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 55b44e3..355b514 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -3865,3 +3865,125 @@ void 
__drm_atomic_helper_private_obj_duplicate_state(struct drm_private_obj *obj
memcpy(state, obj->state, sizeof(*state));
  }
  EXPORT_SYMBOL(__drm_atomic_helper_private_obj_duplicate_state);
+
+/**
+ * drm_atomic_helper_damage_iter_init - initialize the damage iterator
+ * @iter: The iterator to initialize.
+ * @type: Coordinate type caller is interested in.
+ * @state: plane_state from which to iterate the damage clips.
+ * @hdisplay: Width of crtc on which plane is scanned out.
+ * @vdisplay: Height of crtc on which plane is scanned out.
+ *
+ * Initialize an iterator that is used to translate and clip a set of damage
+ * rectangles in framebuffer coordinates to plane and crtc coordinates. The 
type
+ * argument specify which type of coordinate to iterate in.
+ *
+ * Returns: 0 on success and negative error code on error. If an error code is
+ * returned then it means the plane state should not update.
+ */
+int
+drm_atomic_helper_damage_iter_init(struct drm_atomic_helper_damage_iter *iter,
+  enum drm_atomic_helper_damage_clip_type type,
+  const struct drm_plane_state *state,
+  uint32_t hdisplay, uint32_t vdisplay)
+{
+   if (!state || !state->crtc || !state->fb)
+   return -EINVAL;
+
+   memset(iter, 0, sizeof(*iter));
+   iter->clips = (struct drm_rect *)state->damage_clips->data;
+   iter->num_clips = state->num_clips;
+   iter->type = type;
+
+   /*
+* Full update in case of scaling or rotation. In future support for
+* scaling/rotating damage clips can be added
+*/
+   if (state->crtc_w != (state->src_w >> 16) ||
+   state->crtc_h != state->src_h >> 16 || state->rotation != 0) {
+   iter->curr_clip = iter->num_clips;
+   return 0;

Given there's no user of this I have no idea how this manages to provoke a
full clip rect. selftest code would be perfect for stuff like this.

Also, I think we should provide a full clip for the case of num_clips ==
0, so that driver code can simply iterate over all clips and doesn't ever
have to handle the "no clip rects provided" case as a special case itself.


+   }
+
+   iter->fb_src.x1 = 0;
+   iter->fb_src.y1 = 0;
+   iter->fb_src.x2 = state->fb->width;
+   iter->fb_src.y2 = state->fb->height;
+
+   iter->plane_src.x1 = state->src_x >> 16;
+   iter->plane_src.y1 = state->src_y >> 16;
+   iter->plane_src.x2 = iter->plane_src.x1 + (state->src_w >> 16);
+   iter->plane_src.y2 = iter->plane_src.y1 + (state->src_h >> 16);
+   iter->translate_plane_x = -iter->plane_src.x1;
+   iter->translate_plane_y = -iter->plane_src.y1;
+
+   /* Clip plane src rect to fb dimensions */
+   drm_rect_intersect(&iter->plane_src, &iter->fb_src);

This smells like driver bug. Also, see Ville's recent efforts to improve
the atomic plane clipping, I think drm_plane_state already has all the
clip rects you want.


+
+   iter->crtc_src.x1 = 0;
+   iter->crtc_src.y1 = 0;
+   iter->crtc_src.x2 = hdisplay;
+   iter->crtc_src.y2 = vdisplay;
+   iter->translate_crtc_x = -(iter->plane_src.x1 - state->crtc_x);
+   iter->translate_crtc_x = -(iter->plane_src.y1 - state->crtc_y);
+
+   /* Clip crtc src rect to plane dimensions */
+   drm_rect_translate(&iter->crtc_src, -iter->translate_crtc_x,
+  -iter->translate_crtc_x);

We can also scale.


I suggest we leave out scaling for now until someone actually needs it.




+   drm_rect_intersect(&iter->crtc_src, &iter->plane_src);
+
+   return 0;
+}
+EXPORT_SYMBOL(drm_atomic_helper_damage_iter_init);
+
+/**
+ * drm_atomic_helper_damage_iter_next - advance the damage iterator
+ * @ite

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom


On 04/04/2018 12:28 PM, Thomas Hellstrom wrote:

On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:
On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  
wrote:
Add an atomic helper to implement dirtyfb support.  This is 
needed to
support DSI command-mode panels with x11 userspace (ie. when we 
can't

rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on 
getting
upstream kernel working on various different phones/tablets with 
qcom

SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH 
bits

with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic 
dirty-

rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.
Adding Noralf, who iirc already posted the full dirty helpers 
already somewhere.

-Daniel
I've asked Deepak to RFC the core changes suggested for the full 
dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to

get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip 
context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to 
restrict the

damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty 
flag like
you do here, or provide a content_age state member. Since we clear 
the dirty
flag on state copies, I guess that would be sufficient. The blob 
rectangles

would then become a hint to restrict the damage region.
I'm not entirely following here - I thought for frontbuffer 
rendering the
dirty rects have always just been a hint, and that the driver was 
always

free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated 
around in

different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already 
need to

upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be 
tempting for
user-space apps to just set the dirty bit instead of providing the 
full

damage region.
Or I'm not following you here, because I don't quite see the 
difference

between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content 
and the
need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new 
content and
the need for manual upload is identified by the dirty flag, (since 
the fb

stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the 
same

way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to 
communicate

front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom


On 04/04/2018 11:56 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 11:10:08AM +0200, Thomas Hellstrom wrote:

On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel

OK, let me rephrase:

 From the driver's point-of-view, in the atomic world, new content and the
need for manual upload is indicated by a change in fb attached to the plane.

With Rob's patch here, (correct me if I'm wrong) in addition new content and
the need for manual upload is identified by the dirty flag, (since the fb
stays the same and the driver thus never identifies a page-flip).

Hm, I'm not entirely sure Rob's approach is correct. Imo a pageflip
(atomic or not) should result in the entire buffer getting uploaded. The
dirty flag is kinda redundant, a flip with the same buffer works the same
way as a dirtyfb with the entire buffer as the dirty rectangle.


In both these situations, clip rects can provide a hint to restrict the
dirty region.

Now I was thinking about the preferred way for user-space to communicate
front buffer rendering through the atomic ioctl:

1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointi

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom


On 04/04/2018 10:43 AM, Daniel Vetter wrote:

On Wed, Apr 04, 2018 at 10:22:21AM +0200, Thomas Hellstrom wrote:

Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel

I've asked Deepak to RFC the core changes suggested for the full dirty blob
on dri-devel. It builds on DisplayLink's suggestion, with a simple helper to
get to the desired coordinates.

One thing to perhaps discuss is how we would like to fit this with
front-buffer rendering and the dirty ioctl. In the page-flip context, the
dirty rects, like egl's swapbuffer_with_damage is a hint to restrict the
damage region that can be fully ignored by the driver, new content is
indicated by a new framebuffer.

We could do the same for frontbuffer rendering: Either set a dirty flag like
you do here, or provide a content_age state member. Since we clear the dirty
flag on state copies, I guess that would be sufficient. The blob rectangles
would then become a hint to restrict the damage region.

I'm not entirely following here - I thought for frontbuffer rendering the
dirty rects have always just been a hint, and that the driver was always
free to re-upload the entire buffer to the screen.

And through a helper like Rob's proposing here (and have floated around in
different versions already) we'd essentially map a frontbuffer dirtyfb
call to a fake flip with dirty rect. Manual upload drivers already need to
upload the entire screen if they get a flip, since some userspace uses
that to flush out frontbuffer rendering (instead of calling dirtyfb).

So from that pov the new dirty flag is kinda not necessary imo.


Another approach would be to have the presence of dirty rects without
framebuffer change to indicate frontbuffer rendering.

I think I like the first approach best, although it may be tempting for
user-space apps to just set the dirty bit instead of providing the full
damage region.

Or I'm not following you here, because I don't quite see the difference
between dirtyfb and a flip.
-Daniel


OK, let me rephrase:

From the driver's point-of-view, in the atomic world, new content and 
the need for manual upload is indicated by a change in fb attached to 
the plane.


With Rob's patch here, (correct me if I'm wrong) in addition new content 
and the need for manual upload is identified by the dirty flag, (since 
the fb stays the same and the driver thus never identifies a page-flip).


In both these situations, clip rects can provide a hint to restrict the 
dirty region.


Now I was thinking about the preferred way for user-space to communicate 
front buffer rendering through the atomic ioctl:


1) Expose a dirty (or content_age property)
2) Attach a clip-rect blob property.
3) Fake a page-flip by ping-ponging two frame-buffers pointing to the 
same underlying buffer object.


Are you saying that people are already using 3) and we should keep using 
that?


/Thomas

Re: [RFC] drm/atomic+msm: add helper to implement legacy dirtyfb

2018-04-04 Thread Thomas Hellstrom


Hi,

On 04/04/2018 08:58 AM, Daniel Vetter wrote:

On Wed, Apr 4, 2018 at 12:42 AM, Rob Clark  wrote:

Add an atomic helper to implement dirtyfb support.  This is needed to
support DSI command-mode panels with x11 userspace (ie. when we can't
rely on pageflips to trigger a flush to the panel).

To signal to the driver that the async atomic update needs to
synchronize with fences, even though the fb didn't change, the
drm_atomic_state::dirty flag is added.

Signed-off-by: Rob Clark 
---
Background: there are a number of different folks working on getting
upstream kernel working on various different phones/tablets with qcom
SoC's.. many of them have command mode panels, so we kind of need a
way to support the legacy dirtyfb ioctl for x11 support.

I know there is work on a proprer non-legacy atomic property for
userspace to communicate dirty-rect(s) to the kernel, so this can
be improved from triggering a full-frame flush once that is in
place.  But we kinda needa a stop-gap solution.

I had considered an in-driver solution for this, but things get a
bit tricky if userspace ands up combining dirtyfb ioctls with page-
flips, because we need to synchronize setting various CTL.FLUSH bits
with setting the CTL.START bit.  (ie. really all we need to do for
cmd mode panels is bang CTL.START, but is this ends up racing with
pageflips setting FLUSH bits, then bad things.)  The easiest soln
is to wrap this up as an atomic commit and rely on the worker to
serialize things.  Hence adding an atomic dirtyfb helper.

I guess at least the helper, with some small addition to translate
and pass-thru the dirty rect(s) is useful to the final atomic dirty-
rect property solution.  Depending on how far off that is, a stop-
gap solution could be useful.

Adding Noralf, who iirc already posted the full dirty helpers already somewhere.
-Daniel


I've asked Deepak to RFC the core changes suggested for the full dirty 
blob on dri-devel. It builds on DisplayLink's suggestion, with a simple 
helper to get to the desired coordinates.


One thing to perhaps discuss is how we would like to fit this with 
front-buffer rendering and the dirty ioctl. In the page-flip context, 
the dirty rects, like egl's swapbuffer_with_damage is a hint to restrict 
the damage region that can be fully ignored by the driver, new content 
is indicated by a new framebuffer.


We could do the same for frontbuffer rendering: Either set a dirty flag 
like you do here, or provide a content_age state member. Since we clear 
the dirty flag on state copies, I guess that would be sufficient. The 
blob rectangles would then become a hint to restrict the damage region.


Another approach would be to have the presence of dirty rects without 
framebuffer change to indicate frontbuffer rendering.


I think I like the first approach best, although it may be tempting for 
user-space apps to just set the dirty bit instead of providing the full 
damage region.


/Thomas

Re: Signal handling in a page fault handler

2018-04-03 Thread Thomas Hellstrom


On 04/03/2018 02:33 PM, Chris Wilson wrote:

Quoting Matthew Wilcox (2018-04-02 15:10:58)

Souptick and I have been auditing the various page fault handler routines
and we've noticed that graphics drivers assume that a signal should be
able to interrupt a page fault.  In contrast, the page cache takes great
care to allow only fatal signals to interrupt a page fault.

I believe (but have not verified) that a non-fatal signal being delivered
to a task which is in the middle of a page fault may well end up in an
infinite loop, attempting to handle the page fault and failing forever.

Here's one of the simpler ones:

 ret = mutex_lock_interruptible(&etnaviv_obj->lock);
 if (ret)
 return VM_FAULT_NOPAGE;

(many other drivers do essentially the same thing including i915)

On seeing NOPAGE, the fault handler believes the PTE is in the page
table, so does nothing before it returns to arch code at which point
I get lost in the magic assembler macros.  I believe it will end up
returning to userspace if the signal is non-fatal, at which point it'll
go right back into the page fault handler, and mutex_lock_interruptible()
will immediately fail.  So we've converted a sleeping lock into the most
expensive spinlock.

I'll ask the obvious question: why isn't the signal handled on return to
userspace?


+1




I don't think the graphics drivers really want to be interrupted by
any signal.

Assume the worst case and we may block for 10s. Even a 10ms delay may be
unacceptable to some signal handlers (one presumes). For the number one
^C usecase, yes that may be reduced to only bother if it's killable, but
I wonder if there are not timing loops (e.g. sigitimer in Xorg < 1.19)
that want to be able to interrupt random blockages.
-Chris


I think the TTM page fault handler originally set the standard for this. 
First, IMO any critical section that waits for the GPU (like typically 
the page fault handler does), should be locked at least killable. The 
need for interruptible locks came from the X server's silken mouse 
relying on signals for smooth mouse operations: You didn't want the X 
server to be stuck in the kernel waiting for GPU completion when it 
should handle the cursor move request.. Now that doesn't seem to be the 
case anymore but to reiterate Chris' question, why would the signal 
persist once returned to user-space?


/Thomas








___
dri-devel mailing list
dri-de...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/vmwgfx: Return boolean instead of integer in vmw_fence_obj_signaled

2018-01-19 Thread Thomas Hellstrom


On 01/19/2018 01:02 AM, Gustavo A. R. Silva wrote:

Return statements in functions returning bool should use
true/false instead of 1/0.

This issue was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva 

Reviewed-by: Thomas Hellström 

I'll queue this for 4.17.

Thanks,
Thomas





---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75c..35fd6f9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -496,7 +496,7 @@ bool vmw_fence_obj_signaled(struct vmw_fence_obj *fence)
struct vmw_fence_manager *fman = fman_from_fence(fence);
  
  	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))

-   return 1;
+   return true;
  
  	vmw_fences_update(fman);

Re: linux-next: Signed-off-by missing for commit in the drm-misc-fixes tree

2018-01-18 Thread Thomas Hellstrom


Hi, Stephen,

That would be me.
I've historically only added my signed-off-by:s on commits I've (co-) 
authored myself, as the committer info is already registered,


But I take it that's not the correct approach?
/Thomas



On 01/18/2018 09:23 PM, Stephen Rothwell wrote:

Hi all,

Commit

   8a510a5c7526 ("drm/vmwgfx: fix memory corruption with legacy/sou connectors")

is missing a Signed-off-by from its committer.

Re: [PATCH] drm/vmwgfx: fix memory corruption with legacy/sou connectors

2018-01-17 Thread Thomas Hellstrom


On 01/17/2018 04:16 PM, Rob Clark wrote:

From: Rob Clark 

It looks like in all cases 'struct vmw_connector_state' is used.  But
only in stdu connectors, was atomic_{duplicate,destroy}_state() properly
subclassed.  Leading to writes beyond the end of the allocated connector
state block and all sorts of fun memory corruption related crashes.

Fixes: d7721ca71126 "drm/vmwgfx: Connector atomic state"
Cc: 
Signed-off-by: Rob Clark 

Reviewed-by: Thomas Hellstrom 

Thanks, Rob!

/Thomas



---
  drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c  | 4 ++--
  drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c | 4 ++--
  2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
index b8a09807c5de..3824595fece1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
@@ -266,8 +266,8 @@ static const struct drm_connector_funcs 
vmw_legacy_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_ldu_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
index bc5f6026573d..63a4cd794b73 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
@@ -420,8 +420,8 @@ static const struct drm_connector_funcs 
vmw_sou_connector_funcs = {
.set_property = vmw_du_connector_set_property,
.destroy = vmw_sou_connector_destroy,
.reset = vmw_du_connector_reset,
-   .atomic_duplicate_state = drm_atomic_helper_connector_duplicate_state,
-   .atomic_destroy_state = drm_atomic_helper_connector_destroy_state,
+   .atomic_duplicate_state = vmw_du_connector_duplicate_state,
+   .atomic_destroy_state = vmw_du_connector_destroy_state,
.atomic_set_property = vmw_du_connector_atomic_set_property,
.atomic_get_property = vmw_du_connector_atomic_get_property,
  };

Re: drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-17 Thread Thomas Hellstrom


On 01/17/2018 07:33 AM, Thomas Hellstrom wrote:

Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel 
on this patch?


Thanks,
Thomas



Actually, checkpatch.pl complains about -ENOSYS, so I'm sending a pull 
request for v1 of this patch instead, to make everybody happy. I've 
rephrased the commit log somewhat. Hope that's OK.


/Thomas

drm: fix vmwgfx boot warning WAS Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom


Hi, Woody,

On 01/16/2018 10:39 PM, Woody Suwalski wrote:

Thomas, the same way my DRM patch has disappeared:
Date

Tue, 19 Dec 2017 11:50:57 -0800
From Sinclair Yeh <>
Subject Re: [PATCH v.2] 4.15 vmgfx boot warning




This looks okay to me.


Since this is a core DRM patch I think Sinclair was expecting it to be 
pulled in using the drm-misc tree. Did you get a comment from Daniel on 
this patch?


Thanks,
Thomas

Re: [PATCH] [RESEND] vmwgfx: use monotonic event timestamps

2018-01-16 Thread Thomas Hellstrom


Hi, Arnd,

Sinclair's on paternal leave and I thought this patch was already in 
drm-next. My bad.

Dave, is it too late to pull this in for the next merge window?

/Thomas


On 01/16/2018 06:18 PM, Arnd Bergmann wrote:

DRM_VMW_EVENT_FENCE_SIGNALED (struct drm_vmw_event_fence) and
DRM_EVENT_VBLANK (struct drm_event_vblank) pass timestamps in 32-bit
seconds/microseconds format.

As of commit c61eef726a78 ("drm: add support for monotonic vblank
timestamps"), other DRM drivers use monotonic times for drm_event_vblank,
but vmwgfx still uses CLOCK_REALTIME for both events, which suffers from
the y2038/y2106 overflow as well as time jumps.

For consistency, this changes vmwgfx to use ktime_get_ts64 as well,
which solves those problems and avoids the deprecated do_gettimeofday()
function.

This should be transparent to to user space, as long as it doesn't
compare the time against the result of gettimeofday().

Link: 
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org_patch_10076599_&d=DwIBAg&c=uilaK90D4TOVoH58JNXRgQ&r=wnSlgOCqfpNS4d02vP68_E9q2BNMCwfD2OZ_6dCFVQQ&m=8M6vawBo0zDsjbqhzV0xpOwAzX7Zm-uyGlIDnQ7-Gms&s=sHGuz0aoE9aLjVp5GALo8mYrN1bwOHW6mGpJIZmhwAc&e=
Signed-off-by: Arnd Bergmann 
---
Originally sent on Nov 27. Sinclair Yeh said he'd pick it up
for the next pull request, but it's not in linux-next yet.

Resending the unchanged patch, please pick it up when you have time,
or feel free to ignore this email in case it's already in some tree
that just isn't part of linux-next but will be sent during the
next merge window.
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index 6c5c75cf5e6c..9ed544f8958f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -901,11 +901,12 @@ static void vmw_event_fence_action_seq_passed(struct 
vmw_fence_action *action)
spin_lock_irq(&dev->event_lock);
  
  	if (likely(eaction->tv_sec != NULL)) {

-   struct timeval tv;
+   struct timespec64 ts;
  
-		do_gettimeofday(&tv);

-   *eaction->tv_sec = tv.tv_sec;
-   *eaction->tv_usec = tv.tv_usec;
+   ktime_get_ts64(&ts);
+   /* monotonic time, so no y2038 overflow */
+   *eaction->tv_sec = ts.tv_sec;
+   *eaction->tv_usec = ts.tv_nsec / NSEC_PER_USEC;
}
  
  	drm_send_event_locked(dev, eaction->event);

Re: [PATCH] drm/vmwgfx: remove DRM_ERROR message, stops log spamming

2017-09-12 Thread Thomas Hellstrom


Hi, Colin,

On 09/12/2017 07:35 PM, Colin King wrote:

From: Colin Ian King 

mmap'ing the device multiple times will spam the kernel log with the
DRM_ERROR message about illegal mmap'ing the old fifo space.
How are you hitting this? Multiple mappings should be fine as long as 
mapping offsets are correct,
so hitting this message should indicate that the user-space app is doing 
something seriously wrong, and

having it present in the log should probably help more than it hurts.

/Thomas




Since
the mmap'ing will fail with an -EINVAL there is no need to emit this
message, so just remove it.

Signed-off-by: Colin Ian King 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 4 +---
  1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index e771091d2cd3..1e633c602fb1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -33,10 +33,8 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)
struct drm_file *file_priv;
struct vmw_private *dev_priv;
  
-	if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET)) {

-   DRM_ERROR("Illegal attempt to mmap old fifo space.\n");
+   if (unlikely(vma->vm_pgoff < VMWGFX_FILE_PAGE_OFFSET))
return -EINVAL;
-   }
  
  	file_priv = filp->private_data;

dev_priv = vmw_priv(file_priv->minor->dev);

Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-30 Thread Thomas Hellstrom


On 08/30/2017 10:30 AM, Daniel Vetter wrote:

On Wed, Aug 30, 2017 at 08:21:46AM +0200, Thomas Hellstrom wrote:

On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
   drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
   }
-static struct dma_fence_ops vmw_fence_ops = {
+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,

Reviewed-by: Thomas Hellstrom 

Does this mean you'll merge it, or does this mean you'll expect someone
else to merge this?

I'm always confused when maintainers reply with an r-b/ack for a patch
only touching their driver, and no further information at all.
-Daniel


For patches only touching our driver, I'd say we're always responsible 
for sorting out how it's going to be merged.


Since Sinclair is maintaining the vmwgfx trees I thought I'd give him a 
chance to comment on how he wanted it merged.


/Thomas

Re: [PATCH] drm: vmwgfx: constify vmw_fence_ops

2017-08-29 Thread Thomas Hellstrom


On 08/30/2017 07:47 AM, Arvind Yadav wrote:

vmw_fence_ops are not supposed to change at runtime. Functions
"dma_fence_init" working with const vmw_fence_ops provided
by . So mark the non-const structs as const.

Signed-off-by: Arvind Yadav 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
index b8bc5bc..abc5f03 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
@@ -225,7 +225,7 @@ static long vmw_fence_wait(struct dma_fence *f, bool intr, 
signed long timeout)
return ret;
  }
  
-static struct dma_fence_ops vmw_fence_ops = {

+static const struct dma_fence_ops vmw_fence_ops = {
.get_driver_name = vmw_fence_get_driver_name,
.get_timeline_name = vmw_fence_get_timeline_name,
.enable_signaling = vmw_fence_enable_signaling,


Reviewed-by: Thomas Hellstrom

Re: [PATCH 28/29] drm/vmwgfx: switch to drm_*{get,put} helpers

2017-08-03 Thread Thomas Hellstrom


On 08/03/2017 01:58 PM, Cihangir Akturk wrote:

drm_*_reference() and drm_*_unreference() functions are just
compatibility alias for drm_*_get() and drm_*_put() adn should not be


s/adn/and/

used by new code. So convert all users of compatibility functions to use
the new APIs.

Signed-off-by: Cihangir Akturk 
---
  drivers/gpu/drm/vmwgfx/vmwgfx_fb.c| 2 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c | 4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
index 6f4cb46..d43dce9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fb.c
@@ -500,7 +500,7 @@ static int vmw_fb_kms_detach(struct vmw_fb_par *par,
}
  
  	if (cur_fb) {

-   drm_framebuffer_unreference(cur_fb);
+   drm_framebuffer_put(cur_fb);
par->set_fb = NULL;
}
  
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c

index 5ec24fd..fd4a988 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c
@@ -316,7 +316,7 @@ int vmw_present_ioctl(struct drm_device *dev, void *data,
  out_no_surface:
ttm_read_unlock(&dev_priv->reservation_sem);
  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:
@@ -393,7 +393,7 @@ int vmw_present_readback_ioctl(struct drm_device *dev, void 
*data,
  
  	ttm_read_unlock(&dev_priv->reservation_sem);

  out_no_ttm_lock:
-   drm_framebuffer_unreference(fb);
+   drm_framebuffer_put(fb);
  out_no_fb:
drm_modeset_unlock_all(dev);
  out_no_copy:


Apart from the above,

Reviewed-by: Thomas Hellstrom 

(Assuming this gets pulled together with the whole series, not vmwgfx-next)

Thanks,

Thomas

Re: [PATCH 1/4] mm/vmalloc: allow to call vfree() in atomic context

2017-04-04 Thread Thomas Hellstrom

On 04/04/2017 11:41 AM, Michal Hocko wrote:
> On Thu 30-03-17 17:48:39, Andrey Ryabinin wrote:
>> From: Andrey Ryabinin 
>> Subject: mm/vmalloc: allow to call vfree() in atomic context fix
>>
>> Don't spawn worker if we already purging.
>>
>> Signed-off-by: Andrey Ryabinin 
> I would rather put this into a separate patch. Ideally with some numners
> as this is an optimization...

Actually, this just mimics what the code was doing before, as it only
invoked
__purge_vmap_area_lazy() if the trylock succeeded.

/Thomas




>
>> ---
>>  mm/vmalloc.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index ea1b4ab..88168b8 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -737,7 +737,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>>  /* After this point, we may free va at any time */
>>  llist_add(&va->purge_list, &vmap_purge_list);
>>  
>> -if (unlikely(nr_lazy > lazy_max_pages()))
>> +if (unlikely(nr_lazy > lazy_max_pages()) &&
>> +!mutex_is_locked(&vmap_purge_lock))
>>  schedule_work(&purge_vmap_work);
>>  }
>>  
>> -- 
>> 2.10.2
>>

Re: [PATCH 1/4] mm/vmalloc: allow to call vfree() in atomic context

2017-03-30 Thread Thomas Hellstrom

On 03/30/2017 04:48 PM, Andrey Ryabinin wrote:
> On 03/30/2017 03:00 PM, Thomas Hellstrom wrote:
>
>>>  
>>> if (unlikely(nr_lazy > lazy_max_pages()))
>>> -   try_purge_vmap_area_lazy();
>> Perhaps a slight optimization would be to schedule work iff
>> !mutex_locked(&vmap_purge_lock) below?
>>
> Makes sense, we don't need to spawn workers if we already purging.
>
>
>
> From: Andrey Ryabinin 
> Subject: mm/vmalloc: allow to call vfree() in atomic context fix
>
> Don't spawn worker if we already purging.
>
> Signed-off-by: Andrey Ryabinin 
> ---
>  mm/vmalloc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ea1b4ab..88168b8 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -737,7 +737,8 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>   /* After this point, we may free va at any time */
>   llist_add(&va->purge_list, &vmap_purge_list);
>  
> - if (unlikely(nr_lazy > lazy_max_pages()))
> + if (unlikely(nr_lazy > lazy_max_pages()) &&
> + !mutex_is_locked(&vmap_purge_lock))
>   schedule_work(&purge_vmap_work);
>  }
>  

For both patches,

Reviewed-by: Thomas Hellstrom 

Thanks,

Thomas

Re: [PATCH 1/4] mm/vmalloc: allow to call vfree() in atomic context

2017-03-30 Thread Thomas Hellstrom

On 03/30/2017 12:27 PM, Andrey Ryabinin wrote:
> Commit 5803ed292e63 ("mm: mark all calls into the vmalloc subsystem
> as potentially sleeping") added might_sleep() to remove_vm_area() from
> vfree(), and commit 763b218ddfaf ("mm: add preempt points into
> __purge_vmap_area_lazy()") actually made vfree() potentially sleeping.
>
> This broke vmwgfx driver which calls vfree() under spin_lock().
>
> BUG: sleeping function called from invalid context at mm/vmalloc.c:1480
> in_atomic(): 1, irqs_disabled(): 0, pid: 341, name: plymouthd
> 2 locks held by plymouthd/341:
>  #0:  (drm_global_mutex){+.+.+.}, at: [] 
> drm_release+0x3b/0x3b0 [drm]
>  #1:  (&(&tfile->lock)->rlock){+.+...}, at: [] 
> ttm_object_file_release+0x28/0x90 [ttm]
>
> Call Trace:
>  dump_stack+0x86/0xc3
>  ___might_sleep+0x17d/0x250
>  __might_sleep+0x4a/0x80
>  remove_vm_area+0x22/0x90
>  __vunmap+0x2e/0x110
>  vfree+0x42/0x90
>  kvfree+0x2c/0x40
>  drm_ht_remove+0x1a/0x30 [drm]
>  ttm_object_file_release+0x50/0x90 [ttm]
>  vmw_postclose+0x47/0x60 [vmwgfx]
>  drm_release+0x290/0x3b0 [drm]
>  __fput+0xf8/0x210
>  fput+0xe/0x10
>  task_work_run+0x85/0xc0
>  exit_to_usermode_loop+0xb4/0xc0
>  do_syscall_64+0x185/0x1f0
>  entry_SYSCALL64_slow_path+0x25/0x25
>
> This can be fixed in vmgfx, but it would be better to make vfree()
> non-sleeping again because we may have other bugs like this one.
>
> __purge_vmap_area_lazy() is the only function in the vfree() path that
> wants to be able to sleep. So it make sense to schedule
> __purge_vmap_area_lazy() via schedule_work() so it runs only in sleepable
> context. This will have a minimal effect on the regular vfree() path.
> since __purge_vmap_area_lazy() is rarely called.
>
> Fixes: 5803ed292e63 ("mm: mark all calls into the vmalloc subsystem as
>   potentially sleeping")
> Reported-by: Tetsuo Handa 
> Signed-off-by: Andrey Ryabinin 
> Cc: 
>
> Signed-off-by: Andrey Ryabinin 
> ---
>  mm/vmalloc.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 68eb002..ea1b4ab 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -701,7 +701,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, 
> unsigned long end)
>   * Kick off a purge of the outstanding lazy areas. Don't bother if somebody
>   * is already purging.
>   */
> -static void try_purge_vmap_area_lazy(void)
> +static void try_purge_vmap_area_lazy(struct work_struct *work)
>  {
>   if (mutex_trylock(&vmap_purge_lock)) {
>   __purge_vmap_area_lazy(ULONG_MAX, 0);
> @@ -720,6 +720,8 @@ static void purge_vmap_area_lazy(void)
>   mutex_unlock(&vmap_purge_lock);
>  }
>  
> +static DECLARE_WORK(purge_vmap_work, try_purge_vmap_area_lazy);
> +
>  /*
>   * Free a vmap area, caller ensuring that the area has been unmapped
>   * and flush_cache_vunmap had been called for the correct range
> @@ -736,7 +738,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
>   llist_add(&va->purge_list, &vmap_purge_list);
>  
>   if (unlikely(nr_lazy > lazy_max_pages()))
> - try_purge_vmap_area_lazy();

Perhaps a slight optimization would be to schedule work iff
!mutex_locked(&vmap_purge_lock) below?

/Thomas


> + schedule_work(&purge_vmap_work);
>  }
>  
>  /*
> @@ -1125,7 +1127,6 @@ void vm_unmap_ram(const void *mem, unsigned int count)
>   unsigned long addr = (unsigned long)mem;
>   struct vmap_area *va;
>  
> - might_sleep();
>   BUG_ON(!addr);
>   BUG_ON(addr < VMALLOC_START);
>   BUG_ON(addr > VMALLOC_END);
> @@ -1477,8 +1478,6 @@ struct vm_struct *remove_vm_area(const void *addr)
>  {
>   struct vmap_area *va;
>  
> - might_sleep();
> -
>   va = find_vmap_area((unsigned long)addr);
>   if (va && va->flags & VM_VM_AREA) {
>   struct vm_struct *vm = va->vm;

[PATCH] Revert "kref: double kref_put() in my_data_handler()"

2017-03-05 Thread Thomas Hellstrom

This reverts commit 8f1ecc9fbc5b223e4f5d5bb8bcd6f5672c4bc4b6.

The correction is incorrect, see discussion at

http://stackoverflow.com/questions/20093127/why-kref-doc-of-linux-kernel-omits-kref-put-when-kthread-run-fail

Reported-by: KrishnamRaju raju 
Cc: Roel Kluin 
Cc: Randy Dunlap 
Cc: KrishnamRaju raju 
Signed-off-by: Thomas Hellstrom 
---
 Documentation/kref.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/kref.txt b/Documentation/kref.txt
index ddf85a5..d26a27c 100644
--- a/Documentation/kref.txt
+++ b/Documentation/kref.txt
@@ -84,6 +84,7 @@ int my_data_handler(void)
task = kthread_run(more_data_handling, data, "more_data_handling");
if (task == ERR_PTR(-ENOMEM)) {
rv = -ENOMEM;
+   kref_put(&data->refcount, data_release);
goto out;
}
 
-- 
2.4.11

Re: [PATCH] [RFC] drm: Nerf DRM_CONTROL nodes

2017-02-21 Thread Thomas Hellstrom

On 02/20/2017 11:22 PM, Daniel Vetter wrote:
> On Sun, Feb 19, 2017 at 4:21 PM, Thomas Hellstrom  wrote:
>> So I think we need a quick revert of this commit or a quick stable
>> follow-up to unbreak things on our side.
> I'd much prefer we just register control nodes for vmwgfx only, with a
> commit message explaining in detail what exactly your control tool is
> using (i.e. which ioctl), plus links to the source code for future
> references. Also not sold on the immediate revert, this stuff has been
> merged since months.
> -Daniel
>

https://cgit.freedesktop.org/~thomash/open-vm-tools/commit/?h=feature/thellstrom/resolutionKMS&id=9bf65a22d5a06d3a706bc14578619a56e06f8a24

/Thomas

Re: DRM_CONTROL node breakage (Re: [PATCH] [RFC] drm: Nerf DRM_CONTROL nodes)

2017-02-21 Thread Thomas Hellstrom

On 02/21/2017 06:34 AM, David Airlie wrote:
>> No.
>>
>> IMO Not fixing this immediately through stable is out of the question.
>> The deal is that we don't break userspace.
>> Having said that, I'm not against a long term vmwgfx-only solution. But
>> let's fix this now.
>>
>> Admittedly we missed testing this but you got to understand that not all
>> developer teams have a multitude of
>> developers (we have on average one for the whole linux graphics driver
>> stack except GL), and the bug
>> doesn't show up for QE on regression testing unless they run
>> gnome-sheel/Wayland which they currently don't, and I guess they've been
>> focused on the fb2 regression.
>>
>> It's no secret that we've been using the control nodes for some time.
>> The CONTROL_ALLOW is present in the
>> driver private ioctls and the commit has been there since 2016.
>>
>> The user-space code has been present in vmware-tools also since that
>> commit and due to the long release cycles of
>> open-vm-tools the open-vm-tools version was just about to be released.
>> It's necessary for non-xorg
> can you send a revert against drm-next? I'm not sure how clean it will be.
>
> there might be an intermediate step.
>
> Then can we port vmtools of this behaviour, not even sure what it is doing.
>
> Dave.

So after a quick investigation of the impact it looks like the daemon
patch was pulled out of the Fedora open-vm-tools update in time. This
limits the impact to within VMware where we can update the daemon code
and rerun the test cycle. I've posted a patch that makes it possible for
us to use render-nodes instead of control nodes.

/Thomas

DRM_CONTROL node breakage (Re: [PATCH] [RFC] drm: Nerf DRM_CONTROL nodes)

2017-02-20 Thread Thomas Hellstrom

No.

IMO Not fixing this immediately through stable is out of the question.
The deal is that we don't break userspace.
Having said that, I'm not against a long term vmwgfx-only solution. But
let's fix this now.

Admittedly we missed testing this but you got to understand that not all
developer teams have a multitude of
developers (we have on average one for the whole linux graphics driver
stack except GL), and the bug
doesn't show up for QE on regression testing unless they run
gnome-sheel/Wayland which they currently don't, and I guess they've been
focused on the fb2 regression.

It's no secret that we've been using the control nodes for some time.
The CONTROL_ALLOW is present in the
driver private ioctls and the commit has been there since 2016.

The user-space code has been present in vmware-tools also since that
commit and due to the long release cycles of
open-vm-tools the open-vm-tools version was just about to be released.
It's necessary for non-xorg

/Thomas


On 02/20/2017 11:22 PM, Daniel Vetter wrote:
> On Sun, Feb 19, 2017 at 4:21 PM, Thomas Hellstrom  wrote:
>> So I think we need a quick revert of this commit or a quick stable
>> follow-up to unbreak things on our side.
> I'd much prefer we just register control nodes for vmwgfx only, with a
> commit message explaining in detail what exactly your control tool is
> using (i.e. which ioctl), plus links to the source code for future
> references. Also not sold on the immediate revert, this stuff has been
> merged since months.
> -Daniel
>
>> /Thomas
>>
>>
>> On 02/19/2017 03:54 PM, Thomas Hellstrom wrote:
>>> Hi!
>>>
>>> This patch breaks the vmwgfx resolutionKMS daemon which opens a control
>>> node to tell DRM about the monitor layout...
>>>
>>> /Thomas
>>>
>>>
>>> On 10/28/2016 10:10 AM, Daniel Vetter wrote:
>>>> Looking at the ioctl permission checks I noticed that it's impossible
>>>> to import gem buffers into a control nodes, and fd2handle/handle2fd
>>>> also don't work, so no joy with dma-bufs.
>>>>
>>>> The only way to do anything with a control node is by drawing stuff
>>>> into a dumb buffer and displaying that. I suspect control nodes are an
>>>> entirely unused thing, and a cursory check shows that there does not
>>>> seem to be any callers of drmOpenControl nor of the other drmOpen
>>>> functions using DRM_MODE_CONTROL.
>>>>
>>>> Since I don't like dead uabi, let's remove it. But since this would be
>>>> a really big change I think it's better to start out small by simply
>>>> not registering anything. We can garbage-collect the dead code later
>>>> on, once we're sure it's really not used anywhere.
>>>>
>>>> Signed-off-by: Daniel Vetter 
>>>> ---
>>>>  drivers/gpu/drm/drm_drv.c | 6 --
>>>>  1 file changed, 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>>>> index 6efdba4993fc..f085e28ffc6f 100644
>>>> --- a/drivers/gpu/drm/drm_drv.c
>>>> +++ b/drivers/gpu/drm/drm_drv.c
>>>> @@ -517,12 +517,6 @@ int drm_dev_init(struct drm_device *dev,
>>>>  goto err_free;
>>>>  }
>>>>
>>>> -if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>>>> -ret = drm_minor_alloc(dev, DRM_MINOR_CONTROL);
>>>> -if (ret)
>>>> -goto err_minors;
>>>> -}
>>>> -
>>>>  if (drm_core_check_feature(dev, DRIVER_RENDER)) {
>>>>  ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
>>>>  if (ret)
>>> ___
>>> dri-devel mailing list
>>> dri-de...@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
>

Re: [PATCH] [RFC] drm: Nerf DRM_CONTROL nodes

2017-02-19 Thread Thomas Hellstrom

So I think we need a quick revert of this commit or a quick stable
follow-up to unbreak things on our side.

/Thomas


On 02/19/2017 03:54 PM, Thomas Hellstrom wrote:
> Hi!
>
> This patch breaks the vmwgfx resolutionKMS daemon which opens a control
> node to tell DRM about the monitor layout...
>
> /Thomas
>
>
> On 10/28/2016 10:10 AM, Daniel Vetter wrote:
>> Looking at the ioctl permission checks I noticed that it's impossible
>> to import gem buffers into a control nodes, and fd2handle/handle2fd
>> also don't work, so no joy with dma-bufs.
>>
>> The only way to do anything with a control node is by drawing stuff
>> into a dumb buffer and displaying that. I suspect control nodes are an
>> entirely unused thing, and a cursory check shows that there does not
>> seem to be any callers of drmOpenControl nor of the other drmOpen
>> functions using DRM_MODE_CONTROL.
>>
>> Since I don't like dead uabi, let's remove it. But since this would be
>> a really big change I think it's better to start out small by simply
>> not registering anything. We can garbage-collect the dead code later
>> on, once we're sure it's really not used anywhere.
>>
>> Signed-off-by: Daniel Vetter 
>> ---
>>  drivers/gpu/drm/drm_drv.c | 6 --
>>  1 file changed, 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
>> index 6efdba4993fc..f085e28ffc6f 100644
>> --- a/drivers/gpu/drm/drm_drv.c
>> +++ b/drivers/gpu/drm/drm_drv.c
>> @@ -517,12 +517,6 @@ int drm_dev_init(struct drm_device *dev,
>>  goto err_free;
>>  }
>>  
>> -if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>> -ret = drm_minor_alloc(dev, DRM_MINOR_CONTROL);
>> -if (ret)
>> -goto err_minors;
>> -}
>> -
>>  if (drm_core_check_feature(dev, DRIVER_RENDER)) {
>>  ret = drm_minor_alloc(dev, DRM_MINOR_RENDER);
>>  if (ret)
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH] drm/ttm: Make sure BOs being swapped out are cacheable

2017-01-26 Thread Thomas Hellstrom

On 01/27/2017 03:29 AM, Michel Dänzer wrote:
> On 26/01/17 09:46 AM, Sinclair Yeh wrote:
>> On Wed, Jan 25, 2017 at 10:49:33AM +0100, Christian König wrote:
>>> Am 25.01.2017 um 10:25 schrieb Thomas Hellstrom:
>>>> On 01/25/2017 09:21 AM, Michel Dänzer wrote:
>>>>> From: Michel Dänzer 
>>>>>
>>>>> The current caching state may not be tt_cached, even though the
>>>>> placement contains TTM_PL_FLAG_CACHED, because placement can contain
>>>>> multiple caching flags. Trying to swap out such a BO would trip up the
>>>>>
>>>>>   BUG_ON(ttm->caching_state != tt_cached);
>>>>>
>>>>> in ttm_tt_swapout.
>>>>>
>>>>> Cc: sta...@vger.kernel.org
>>>>> Signed-off-by: Michel Dänzer 
>>>> Reviewed-by: Thomas Hellstrom 
>>> Reviewed-by: Christian König .
>> Reviewed-by: Sinclair Yeh 
> Thanks for the reviews! Via which tree should we merge this?
>
>
I don't maintain a TTM tree any longer. Let's check with Daniel if he
can merge it through drm-misc.

/Thomas

Re: [PATCH] drm/ttm: Make sure BOs being swapped out are cacheable

2017-01-25 Thread Thomas Hellstrom

On 01/25/2017 09:21 AM, Michel Dänzer wrote:
> From: Michel Dänzer 
>
> The current caching state may not be tt_cached, even though the
> placement contains TTM_PL_FLAG_CACHED, because placement can contain
> multiple caching flags. Trying to swap out such a BO would trip up the
>
>   BUG_ON(ttm->caching_state != tt_cached);
>
> in ttm_tt_swapout.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Michel Dänzer 
Reviewed-by: Thomas Hellstrom 

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d5063618efa7..86e3b233b722 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1670,7 +1670,6 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
>   struct ttm_buffer_object *bo;
>   int ret = -EBUSY;
>   int put_count;
> - uint32_t swap_placement = (TTM_PL_FLAG_CACHED | TTM_PL_FLAG_SYSTEM);
>  
>   spin_lock(&glob->lru_lock);
>   list_for_each_entry(bo, &glob->swap_lru, swap) {
> @@ -1701,7 +1700,8 @@ static int ttm_bo_swapout(struct ttm_mem_shrink *shrink)
>* Move to system cached
>*/
>  
> - if ((bo->mem.placement & swap_placement) != swap_placement) {
> + if (bo->mem.mem_type != TTM_PL_SYSTEM ||
> + bo->ttm->caching_state != tt_cached) {
>   struct ttm_mem_reg evict_mem;
>  
>   evict_mem = bo->mem;

Re: [PATCH] drm: move allocation out of drm_get_format_name()

2016-11-05 Thread Thomas Hellstrom

For the vmwgfx part:

Acked-by: Thomas Hellstrom 


On 11/05/2016 08:33 AM, Eric Engestrom wrote:
> Fixes: 90844f00049e9f42573fd31d7c32e8fd31d3fd07
>
> drm: make drm_get_format_name thread-safe
>
> Signed-off-by: Eric Engestrom 
> [danvet: Clarify that the returned pointer must be freed with
> kfree().]
> Signed-off-by: Daniel Vetter 
>
> Suggested-by: Ville Syrjälä 
> Signed-off-by: Eric Engestrom 
> ---
>  drivers/gpu/drm/amd/amdgpu/dce_v10_0.c  |  7 ++---
>  drivers/gpu/drm/amd/amdgpu/dce_v11_0.c  |  7 ++---
>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c   |  3 +-
>  drivers/gpu/drm/amd/amdgpu/dce_v8_0.c   |  7 ++---
>  drivers/gpu/drm/drm_atomic.c|  7 +++--
>  drivers/gpu/drm/drm_crtc.c  |  7 +++--
>  drivers/gpu/drm/drm_fourcc.c| 12 +++-
>  drivers/gpu/drm/drm_framebuffer.c   |  7 +++--
>  drivers/gpu/drm/drm_modeset_helper.c|  7 +++--
>  drivers/gpu/drm/drm_plane.c |  7 +++--
>  drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c |  7 ++---
>  drivers/gpu/drm/i915/i915_debugfs.c |  8 ++---
>  drivers/gpu/drm/i915/intel_atomic_plane.c   |  8 ++---
>  drivers/gpu/drm/i915/intel_display.c| 41 
> ++---
>  drivers/gpu/drm/radeon/atombios_crtc.c  | 14 -
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c |  3 +-
>  include/drm/drm_fourcc.h|  3 +-
>  17 files changed, 71 insertions(+), 84 deletions(-)
>
> diff --git a/include/drm/drm_fourcc.h b/include/drm/drm_fourcc.h
> index dc0aafa..5a8cb4b 100644
> --- a/include/drm/drm_fourcc.h
> +++ b/include/drm/drm_fourcc.h
> @@ -54,6 +54,7 @@ int drm_format_horz_chroma_subsampling(uint32_t format);
>  int drm_format_vert_chroma_subsampling(uint32_t format);
>  int drm_format_plane_width(int width, uint32_t format, int plane);
>  int drm_format_plane_height(int height, uint32_t format, int plane);
> -char *drm_get_format_name(uint32_t format) __malloc;
> +typedef char drm_format_name_buf[32];
> +char *drm_get_format_name(uint32_t format, drm_format_name_buf buf);
>  
>  #endif /* __DRM_FOURCC_H__ */
> diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
> index cbb8b77..34ed520 100644
> --- a/drivers/gpu/drm/drm_fourcc.c
> +++ b/drivers/gpu/drm/drm_fourcc.c
> @@ -79,17 +79,13 @@ uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t 
> depth)
>  EXPORT_SYMBOL(drm_mode_legacy_fb_format);
>  
>  /**
> - * drm_get_format_name - return a string for drm fourcc format
> + * drm_get_format_name - fill a string with a drm fourcc format's name
>   * @format: format to compute name of
> + * @buf: caller-supplied buffer
> - *
> - * Note that the buffer returned by this function is owned by the caller
> - * and will need to be freed using kfree().
>   */
> -char *drm_get_format_name(uint32_t format)
> +char *drm_get_format_name(uint32_t format, drm_format_name_buf buf)
>  {
> - char *buf = kmalloc(32, GFP_KERNEL);
> -
> - snprintf(buf, 32,
> + snprintf(buf, sizeof(drm_format_name_buf),
>"%c%c%c%c %s-endian (0x%08x)",
>printable_char(format & 0xff),
>printable_char((format >> 8) & 0xff),
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> index 199d3f7..cefa3d8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
> @@ -2032,7 +2032,7 @@ static int dce_v10_0_crtc_do_set_base(struct drm_crtc 
> *crtc,
>   u32 tmp, viewport_w, viewport_h;
>   int r;
>   bool bypass_lut = false;
> - char *format_name;
> + drm_format_name_buf format_name;
>  
>   /* no fb bound */
>   if (!atomic && !crtc->primary->fb) {
> @@ -2144,9 +2144,8 @@ static int dce_v10_0_crtc_do_set_base(struct drm_crtc 
> *crtc,
>   bypass_lut = true;
>   break;
>   default:
> - format_name = drm_get_format_name(target_fb->pixel_format);
> - DRM_ERROR("Unsupported screen format %s\n", format_name);
> - kfree(format_name);
> + DRM_ERROR("Unsupported screen format %s\n",
> +   drm_get_format_name(target_fb->pixel_format, 
> format_name));
>   return -EINVAL;
>   }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> index ecd000e..462abb8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> @@ -2

Re: [PATCH 3.4 065/125] drm/ttm: Fixed a read/write lock imbalance

2016-10-12 Thread Thomas Hellstrom

Li,

IIRC This one goes hand in hand with a vmwgfx (the only user) patch.
Please don't apply until I've figured out whether that patch is also in 3.4.

Thanks,
Thomas


On 10/12/2016 02:33 PM, l...@kernel.org wrote:
> From: Thomas Hellstrom 
>
> 3.4.113-rc1 review patch.  If anyone has any objections, please let me know.
>
> --
>
>
> commit 025af189fb44250206dd8a32fa4a682392af3301 upstream.
>
> In ttm_write_lock(), the uninterruptible path should call
> __ttm_write_lock() not __ttm_read_lock().  This fixes a vmwgfx hang
> on F23 start up.
>
> syeh: Extracted this from one of Thomas' internal patches.
>
> Signed-off-by: Thomas Hellstrom 
> Reviewed-by: Sinclair Yeh 
> Signed-off-by: Zefan Li 
> ---
>  drivers/gpu/drm/ttm/ttm_lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_lock.c b/drivers/gpu/drm/ttm/ttm_lock.c
> index 075daf4..9934b4d 100644
> --- a/drivers/gpu/drm/ttm/ttm_lock.c
> +++ b/drivers/gpu/drm/ttm/ttm_lock.c
> @@ -180,7 +180,7 @@ int ttm_write_lock(struct ttm_lock *lock, bool 
> interruptible)
>   spin_unlock(&lock->lock);
>   }
>   } else
> - wait_event(lock->queue, __ttm_read_lock(lock));
> + wait_event(lock->queue, __ttm_write_lock(lock));
>  
>   return ret;
>  }

Patch for drm-next WAS Re: [PATCH] kref: prefer atomic_inc_not_zero to atomic_add_unless

2016-07-01 Thread Thomas Hellstrom

Dave,

Since kref_get_unless_zero() was brought in by drm, could we add this to
drm-next?

Thanks,
Thomas


On 06/30/2016 12:52 AM, Jason A. Donenfeld wrote:
> This was positively reviewed by maintainers but never picked up. Can
> someone queue this for 4.7 or 4.8?
>
> Thanks,
> Jason
>
> On Mon, Feb 1, 2016 at 10:53 PM, Jason A. Donenfeld  wrote:
>> This was positively reviewed but never picked up. Can someone queue
>> this for rc3?
>>
>> Thanks,
>> Jason
>>
>> On Sun, Oct 11, 2015 at 9:59 PM, Thomas Hellstrom  
>> wrote:
>>> Reviewed-by: Thomas Hellstrom 
>>>
>>>
>>> On 10/10/2015 12:56 PM, Jason A. Donenfeld wrote:
>>>> On most platforms, there exists this ifdef:
>>>>
>>>>  #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
>>>>
>>>> This makes this patch functionally useless. However, on PPC, there is
>>>> actually an explicit definition of atomic_inc_not_zero with its own
>>>> assembly that is slightly more optimized than atomic_add_unless. So,
>>>> this patch changes kref to use atomic_inc_not_zero instead, for PPC and
>>>> any future platforms that might provide an explicit implementation.
>>>>
>>>> This also puts this usage of kref more in line with a verbatim reading
>>>> of the examples in Paul McKenney's paper [1] in the section titled "2.4
>>>> Atomic Counting With Check and Release Memory Barrier", which uses
>>>> atomic_inc_not_zero.
>>>>
>>>> [1] 
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__open-2Dstd.org_jtc1_sc22_wg21_docs_papers_2007_n2167.pdf&d=BQIBAg&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vpukPkBtpoNQp2IUKuFviOmPNYWVKmen3Jeeu55zmEA&m=z5Nd9sYiJMKiphNjyZp6XT5CbayXMBlcb903f260pDY&s=HEHX3CuXRs2GRRQWuC4Vef6iJMwdilKVRkiZgJpjEpA&e=
>>>>
>>>> Signed-off-by: Jason A. Donenfeld 
>>>> ---
>>>>  include/linux/kref.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/kref.h b/include/linux/kref.h
>>>> index 484604d..83d1f94 100644
>>>> --- a/include/linux/kref.h
>>>> +++ b/include/linux/kref.h
>>>> @@ -166,6 +166,6 @@ static inline int kref_put_mutex(struct kref *kref,
>>>>   */
>>>>  static inline int __must_check kref_get_unless_zero(struct kref *kref)
>>>>  {
>>>> - return atomic_add_unless(&kref->refcount, 1, 0);
>>>> + return atomic_inc_not_zero(&kref->refcount);
>>>>  }
>>>>  #endif /* _KREF_H_ */

Re: [Pv-drivers] [PATCH] VMW_PVSCSI: Change to update maintainer details (name, email)

2016-06-17 Thread Thomas Hellstrom

Hi!

IMO we shouldn't have maintainer info mentioned in the source files,
since the information is redundant and moreover if we change maintainer
a patch touching multiple files would just look odd, and we'd leave a
lot of stale info in older kernels.

Having the linux git *master* MAINTAINERS file as the authoritative
information source for this makes most sense. If there are different
maintainers of different parts of a single module we should be able to
sort that out internally.

Thanks,
Thomas

On 06/17/2016 05:11 AM, Julian Calaby wrote:
> Hi Joe,
>
> On Fri, Jun 17, 2016 at 1:04 PM, Joe Perches  wrote:
>> On Fri, 2016-06-17 at 12:44 +1000, Julian Calaby wrote:
>>> Hi Joe,
>> rehi Julian.
> (I always put a salutation on my emails and always finish them with
> "Thanks," =) )
>
>>> On Fri, Jun 17, 2016 at 12:33 PM, Joe Perches  wrote:
>> []
 get_maintainer.pl also has a rarely used "--file-emails" option to
 scan for what appears to be email addresses in specific files.

 $ ./scripts/get_maintainer.pl  -f --file-emails drivers/scsi/vmw_pvscsi.c
 Arvind Kumar  (maintainer:VMware PVSCSI driver,in 
 file)
 VMware PV-Drivers  (maintainer:VMware PVSCSI driver)
 "James E.J. Bottomley"  (maintainer:SCSI 
 SUBSYSTEM)
 "Martin K. Petersen"  (maintainer:SCSI 
 SUBSYSTEM)
 linux-s...@vger.kernel.org (open list:VMware PVSCSI driver)
 linux-kernel@vger.kernel.org (open list)

 note the "in file" after Arvind's name
>>> Didn't know this, however my point stands: the maintainer line in the
>>> file is redundant if we find maintainers through MAINTAINERS or the
>>> get_maintainer.pl script.
>> Yes, I'm not suggesting anything else.  Jim's name
>> should appear in the MAINTAINERS file somewhere.
>>
>> The question to me is whether or not Jim Gill is
>> taking over the maintainership of the entire
>> VMware PVSCSI driver or just a few files of it.
> As I see it, he's taking over maintainership of all of it: it's only
> files are drivers/scsi/vmw_pvscsi.[ch] AFAIK.
>
> Thanks,
>

Re: [PATCH] drm/vmwgfx: use *_32_bits() macros

2016-06-15 Thread Thomas Hellstrom

On 06/15/2016 01:11 PM, Daniel Vetter wrote:
> On Wed, Jun 15, 2016 at 10:37:24AM +0200, Paul Bolle wrote:
>> [Added Sinclair, Thomas, and "VMware Graphics".]
>>
>> On do, 2016-04-14 at 07:34 -0700, Joe Perches wrote:
>>> On Thu, 2016-04-14 at 13:32 +0200, Paul Bolle wrote:
>>>> On do, 2016-03-03 at 11:26 +0100, Paul Bolle wrote:
>>>>> Use the upper_32_bits() macro instead of the four line equivalent that
>>>>> triggers a GCC warning on 32 bits x86:
>>>>> drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c: In function
>>>>> 'vmw_cmdbuf_header_submit':
>>>>> drivers/gpu/drm/vmwgfx/vmwgfx_cmdbuf.c:297:25: warning: right
>>>>> shift count >= width of type [-Wshift-count-overflow]
>>>>>val = (header->handle >> 32);
>>>>>  ^
>>>>>
>>>>> And use the lower_32_bits() macro instead of and-ing with a 32 bits
>>>>> mask.
>>>>>
>>>>> Signed-off-by: Paul Bolle 
>>>>> ---
>>>>> Note: compile tested only (I don't use any of vmware's products).
>>>> The warning can still be seen on v4.6-rc3 for 32 bits x86. This patch
>>>> applies cleanly to that rc.
>>>>
>>>> Has anyone had a chance to look at this patch, and perhaps even test
>>>> it?
>>> Test?  Nope.  Seems obviously correct.
>> This warning still shows up when building v4.7-rc3 for 32 bits x86.
>>
>> Since my previous message an entry for this driver showed up in
>> MAINTAINERS. So I'd guess Sinclair, Thomas, etc want me to resend this
>> small patch. Is that correct?
> Sounds more like maintainers asleep at the helm ;-)
>
> I've applied this to drm-misc, Sinclair can apply polish later on if he
> wants to.
>
> Thanks, Daniel

Thanks for applying this.
FWIW,
Reviewed-by: Thomas Hellstrom 

/Thomas

Re: [Pv-drivers] [PATCH net v2] vmxnet3: segCnt can be 1 for LRO packets

2016-06-08 Thread Thomas Hellstrom

On 06/08/2016 08:13 AM, Shrikrishna Khare wrote:
> The device emulation may send segCnt of 1 for LRO packets.
>
> Signed-off-by: Shrikrishna Khare 
> Signed-off-by: Jin Heo 
>
> ---
> v2: fix subject line
> ---
>  drivers/net/vmxnet3/vmxnet3_drv.c | 2 +-
>  drivers/net/vmxnet3/vmxnet3_int.h | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c 
> b/drivers/net/vmxnet3/vmxnet3_drv.c
> index db8022a..6f399b2 100644
> --- a/drivers/net/vmxnet3/vmxnet3_drv.c
> +++ b/drivers/net/vmxnet3/vmxnet3_drv.c
> @@ -1369,7 +1369,7 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
>   rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd;
>  
>   segCnt = rcdlro->segCnt;
> - BUG_ON(segCnt <= 1);
> + BUG_ON(segCnt == 0);

What will the vmxnet3 driver do otherwise if segCnt == 0? Please see below.


>   mss = rcdlro->mss;
>   if (unlikely(segCnt <= 1))
>   segCnt = 0;

Based on this code, it looks like it can handle the case without taking
down the kernel completely, so instead of a
"BUG_ON", would it make sense with a WARN_ON_ONCE() or WARN_ON()?

Thanks,
Thomas

Re: [PATCH 3.2 46/77] drm: Fix an unwanted master inheritance v2

2015-12-25 Thread Thomas Hellstrom

On 12/24/2015 04:37 PM, Ben Hutchings wrote:
> 3.2.75-rc1 review patch.  If anyone has any objections, please let me know.
>
> --
>
> From: Thomas Hellstrom 
>
> commit a0af2e538c80f3e47f1d6ddf120a153ad909e8ad upstream.
>
> A client calling drmSetMaster() using a file descriptor that was opened
> when another client was master would inherit the latter client's master
> object and all its authenticated clients.
>
> This is unwanted behaviour, and when this happens, instead allocate a
> brand new master object for the client calling drmSetMaster().
>
> Fixes a BUG() throw in vmw_master_set().
>
> Signed-off-by: Thomas Hellstrom 
> Signed-off-by: Dave Airlie 
> [bwh: Backported to 3.2:
>  - s/master_mutex/struct_mutex/
>  - drm_new_set_master() must drop struct_mutex while calling
>drm_driver::master_create
>  - Adjust filename, context, indentation]
> Signed-off-by: Ben Hutchings 
> ---
> --- a/drivers/gpu/drm/drm_stub.c
> +++ b/drivers/gpu/drm/drm_stub.c
> @@ -225,6 +225,10 @@ int drm_setmaster_ioctl(struct drm_devic
>   if (!file_priv->minor->master &&
>   file_priv->minor->master != file_priv->master) {
>   mutex_lock(&dev->struct_mutex);
> + if (!file_priv->allowed_master) {
> + ret = drm_new_set_master(dev, file_priv);
> + goto out_unlock;
> + }
>   file_priv->minor->master = drm_master_get(file_priv->master);
>   file_priv->is_master = 1;
>   if (dev->driver->master_set) {
> @@ -234,10 +238,11 @@ int drm_setmaster_ioctl(struct drm_devic
>   drm_master_put(&file_priv->minor->master);
>   }
>   }
> + out_unlock:
>   mutex_unlock(&dev->struct_mutex);
>   }
>  
> - return 0;
> + return ret;
>  }
>  
>  int drm_dropmaster_ioctl(struct drm_device *dev, void *data,
> --- a/drivers/gpu/drm/drm_fops.c
> +++ b/drivers/gpu/drm/drm_fops.c
> @@ -219,6 +219,62 @@ static int drm_cpu_valid(void)
>  }
>  
>  /**
> + * drm_new_set_master - Allocate a new master object and become master for 
> the
> + * associated master realm.
> + *
> + * @dev: The associated device.
> + * @fpriv: File private identifying the client.
> + *
> + * This function must be called with dev::struct_mutex held.
> + * Returns negative error code on failure. Zero on success.
> + */
> +int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv)
> +{
> + struct drm_master *old_master;
> + int ret;
> +
> + lockdep_assert_held_once(&dev->struct_mutex);
> +

Is lockdep_assert_held_once() backported into the 3.2 series? If not,
this line could probably be replaced by lockdep_assert_held() for stable
kernels or removed entirely.

Thanks,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the drm-misc tree with Linus' tree

2015-12-13 Thread Thomas Hellstrom

On 12/14/2015 02:12 AM, Stephen Rothwell wrote:
> Hi all,
>
> Today's linux-next merge of the drm-misc tree got conflicts in:
>
>   drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c
>   drivers/gpu/drm/vmwgfx/vmwgfx_scrn.c
>   drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c
>
> between commit:
>
>   8fbf9d92a7bc ("drm/vmwgfx: Implement the cursor_set2 callback v2")
>
> from Linus' tree and commit:
>
>   f80de66eca65 ("drm/vmwgfx: Drop dummy save/restore hooks")
>
> from the drm-misc tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).
>
FWIW, the fix looks correct to me.

/Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Linux-graphics-maintainer] [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros

2015-12-02 Thread Thomas Hellstrom

On 12/02/2015 06:26 PM, Dmitry Torokhov wrote:
> On Wed, Dec 02, 2015 at 07:31:24AM -0800, Greg Kroah-Hartman wrote:
>> On Tue, Dec 01, 2015 at 06:21:06PM -0800, Sinclair Yeh wrote:
>>> On Tue, Dec 01, 2015 at 04:04:08PM -0800, Greg Kroah-Hartman wrote:
 On Tue, Dec 01, 2015 at 02:54:20PM -0800, Sinclair Yeh wrote:
> Hi,
>
> On Tue, Dec 01, 2015 at 02:45:27PM -0800, Dmitry Torokhov wrote:
>> On Tue, Dec 1, 2015 at 2:32 PM, Sinclair Yeh  wrote:
>>> Hi,
>>>
> 
>
>   */
> -#define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4)  \
> -({ \
> -   unsigned long __dummy1, __dummy2;   \
> -   __asm__ __volatile__ ("inl %%dx" :  \
> -   "=a"(out1), \
> -   "=b"(out2), \
> -   "=c"(out3), \
> -   "=d"(out4), \
> -   "=S"(__dummy1), \
> -   "=D"(__dummy2) :\
> -   "a"(VMMOUSE_PROTO_MAGIC),   \
> -   "b"(in1),   \
> -   "c"(VMMOUSE_PROTO_CMD_##cmd),   \
> -   "d"(VMMOUSE_PROTO_PORT) :   \
> -   "memory");  \
> +#define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4)
>  \
> +({\
> +   unsigned long __dummy1 = 0, __dummy2 = 0;  \
 Why do we need to initialize dummies?
>>> Because for some commands those parameters to VMW_PORT() can be both
>>> input and outout.
>> The vmmouse commands do not use them as input though, so it seems we
>> are simply wasting CPU cycles setting them to 0 just because we are
>> using the new VMW_PORT here. Why do we need to switch? What is the
>> benefit of doing this?
> There are two reasons.  One is to make the code more readable and
> maintainable.  Rather than having mostly similar inline assembly
> code sprinkled across multiple modules, we can just use the macros
> and document that.
 But the macro is only used here, and the variables aren't used at all,
 so it makes no sense in this file.
>>> Maybe it's because I didn't CC you on the rest of the series.  I wasn't
>>> sure what the proper distribution list is for each part.
>> Use scripts/get_maintainer.pl, that's what it is there for.  A number of
>> those patches should go through me, if not all of them, if you want them
>> merged...
>>
>>> This new macro is also used in arch/x86/kernel/cpu/vmware.c and
>>> vmw_balloon.c
>> And it's used inconsistantly in those patches (you don't set the dummy
>> variables to 0 in all of them...)  Now maybe that's just how the asm
>> functions work, but it's not very obvious as to why this is at all.
>>
> The second reason is this organization makes some on-going future
> development easier.
 We don't plan for "future" development other than a single patch series,
 as we have no idea what that development is, nor if it will really
 happen.  You can always change this file later if you need to, nothing
 is keeping that from happening.
>>> So the intent of this series is to centralize similar lines of inline
>>> assembly code that are currently used by 3 different kernel modules
>>> to a central place.  The new vmware.h [patch 0/6] becomes the one header
>>> to include for common guest-host communication needs.
>> Why can't it go into vmw_vmci_defs.h instead, or your other .h file, why
>> create yet-another-.h-file for your bus?  You already have 2, this would
>> make it 3, which seems like a lot...
> Umm, you are not saying that vmmouse should include vmci header file(s),
> are you? Because the 2 are unrelated and vmci does not use the
> hypervisor port to communicate with host IIRC.

Also the platform setup code uses the hypervisor port, so it's a natural
place for the macro defines.

/Thomas


>
> Thanks.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Linux-graphics-maintainer] [PATCH 3/6] Input: Update vmmouse.c to use the common VMW_PORT macros

2015-12-01 Thread Thomas Hellstrom

On 12/02/2015 01:04 AM, Greg Kroah-Hartman wrote:
> On Tue, Dec 01, 2015 at 02:54:20PM -0800, Sinclair Yeh wrote:
>> Hi,
>>
>> On Tue, Dec 01, 2015 at 02:45:27PM -0800, Dmitry Torokhov wrote:
>>> On Tue, Dec 1, 2015 at 2:32 PM, Sinclair Yeh  wrote:
 Hi,

>> 
>>
>>   */
>> -#define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4)  \
>> -({ \
>> -   unsigned long __dummy1, __dummy2;   \
>> -   __asm__ __volatile__ ("inl %%dx" :  \
>> -   "=a"(out1), \
>> -   "=b"(out2), \
>> -   "=c"(out3), \
>> -   "=d"(out4), \
>> -   "=S"(__dummy1), \
>> -   "=D"(__dummy2) :\
>> -   "a"(VMMOUSE_PROTO_MAGIC),   \
>> -   "b"(in1),   \
>> -   "c"(VMMOUSE_PROTO_CMD_##cmd),   \
>> -   "d"(VMMOUSE_PROTO_PORT) :   \
>> -   "memory");  \
>> +#define VMMOUSE_CMD(cmd, in1, out1, out2, out3, out4) \
>> +({\
>> +   unsigned long __dummy1 = 0, __dummy2 = 0;  \
> Why do we need to initialize dummies?
 Because for some commands those parameters to VMW_PORT() can be both
 input and outout.
>>> The vmmouse commands do not use them as input though, so it seems we
>>> are simply wasting CPU cycles setting them to 0 just because we are
>>> using the new VMW_PORT here. Why do we need to switch? What is the
>>> benefit of doing this?
>> There are two reasons.  One is to make the code more readable and
>> maintainable.  Rather than having mostly similar inline assembly
>> code sprinkled across multiple modules, we can just use the macros
>> and document that.
> But the macro is only used here, and the variables aren't used at all,
> so it makes no sense in this file.
>

IMO, this makes a lot of sense because we now get a single definition of
VMW_PORT in the platform code that a developer can refer to to
understand things like 32-64 bit compatibilty, and usage conditions and
it also forces the developer to adopt the good practice of clearing
currently unused input variables rather than to leave them undefined. In
addition, if something needs to be changed we have one single place to
change rather than a lot of places scattered all over various kernel
modules.

Things that we (I) previously, for example, didn't get quite right in
the vmmouse module despite spending a considerable amount of time on the
subject.

Thanks,
Thomas


 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/7] drm/vmwgfx: fix a problematic usage of WARN_ON()

2015-11-26 Thread Thomas Hellstrom

Thanks for reporting!

This fix was already reported by Dan Carpenter and has already been
queued in vmwgfx-fixes-4.4

/Thomas


On 11/25/2015 02:12 PM, Geliang Tang wrote:
> WARN_ON() takes a condition rather than a format string. This patch
> converted WARN_ON() to WARN() instead.
>
> Signed-off-by: Geliang Tang 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
> index a8baf5f..b6a0806 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
> @@ -390,7 +390,7 @@ void *vmw_fifo_reserve_dx(struct vmw_private *dev_priv, 
> uint32_t bytes,
>   else if (ctx_id == SVGA3D_INVALID_ID)
>   ret = vmw_local_fifo_reserve(dev_priv, bytes);
>   else {
> - WARN_ON("Command buffer has not been allocated.\n");
> + WARN(1, "Command buffer has not been allocated.\n");
>   ret = NULL;
>   }
>   if (IS_ERR_OR_NULL(ret)) {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 3.19.y-ckt 13/86] drm/vmwgfx: Fix up user_dmabuf refcounting

2015-10-28 Thread Thomas Hellstrom

Kamal,

On 10/27/2015 10:29 PM, Kamal Mostafa wrote:
> 3.19.8-ckt9 -stable review patch.  If anyone has any objections, please let 
> me know.
>
> --
>
> From: Thomas Hellstrom 
>
> commit 54c12bc374408faddbff75dbf1a6167c19af39c4 upstream.
>

Unfortunately there was a regression introduced with this patch. The fix
for the regression is tiny and introduced upstream with patch
ed7d78b2da32198ca4c70172e3b63c6b3e2c570b, "drm/vmwgfx: Fix kernel NULL
pointer dereference on older hardware", also CC'd stable.
Please, if possible, make sure both these patches are pushed at the same
time.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] drm/vmwgfx: switch from ioremap_cache to memremap

2015-10-28 Thread Thomas Hellstrom

Dan,

On 10/19/2015 11:34 PM, Williams, Dan J wrote:
> On Tue, 2015-10-13 at 20:52 +0200, Thomas Hellstrom wrote:
>>> Ok, I'll make local read_fifo() and write_fifo() macros to make this
>>> explicit.  Are these names ok with you?
>> Sure.
>>
> So I ended up just leaving the __iomem annotation on mmio_virt for now
> until the implementation can be converted to use explicit barriers where
> necessary.

Thanks, I'll queue this on vmwgfx-next for 4.4 and carry on from there.

Reviewed-by: Thomas Hellstrom 


>
> 8<-
> Subject: drm/vmwgfx: switch from ioremap_cache to memremap
>
> From: Dan Williams 
>
> Per commit 2e586a7e017a "drm/vmwgfx: Map the fifo as cached" the driver
> expects the fifo registers to be cacheable.  In preparation for
> deprecating ioremap_cache() convert its usage in vmwgfx to memremap().
>
> Cc: David Airlie 
> Cc: Thomas Hellstrom 
> Cc: Sinclair Yeh 
> Cc: dri-de...@lists.freedesktop.org
> Signed-off-by: Dan Williams 
> ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |   14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 2c7a25c71af2..33e9eda77bad 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -752,8 +752,14 @@ static int vmw_driver_load(struct drm_device *dev, 
> unsigned long chipset)
>   ttm_lock_set_kill(&dev_priv->fbdev_master.lock, false, SIGTERM);
>   dev_priv->active_master = &dev_priv->fbdev_master;
>  
> - dev_priv->mmio_virt = ioremap_cache(dev_priv->mmio_start,
> - dev_priv->mmio_size);
> + /*
> +  * Force __iomem for this mapping until the implied compiler
> +  * barriers and {READ|WRITE}_ONCE semantics from the
> +  * io{read|write}32() accessors can be replaced with explicit
> +  * barriers.
> +  */
> + dev_priv->mmio_virt = (void __iomem *) memremap(dev_priv->mmio_start,
> + dev_priv->mmio_size, MEMREMAP_WB);
>  
>   if (unlikely(dev_priv->mmio_virt == NULL)) {
>   ret = -ENOMEM;
> @@ -907,7 +913,7 @@ out_no_irq:
>  out_no_device:
>   ttm_object_device_release(&dev_priv->tdev);
>  out_err4:
> - iounmap(dev_priv->mmio_virt);
> + memunmap((void __force *) dev_priv->mmio_virt);
>  out_err3:
>   vmw_ttm_global_release(dev_priv);
>  out_err0:
> @@ -958,7 +964,7 @@ static int vmw_driver_unload(struct drm_device *dev)
>   pci_release_regions(dev->pdev);
>  
>   ttm_object_device_release(&dev_priv->tdev);
> - iounmap(dev_priv->mmio_virt);
> + memunmap((void __force *) dev_priv->mmio_virt);
>   if (dev_priv->ctx.staged_bindings)
>   vmw_binding_state_free(dev_priv->ctx.staged_bindings);
>   vmw_ttm_global_release(dev_priv);
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] drm/vmwgfx: switch from ioremap_cache to memremap

2015-10-13 Thread Thomas Hellstrom

On 10/13/2015 08:48 PM, Dan Williams wrote:
> On Tue, Oct 13, 2015 at 11:37 AM, Thomas Hellstrom
>  wrote:
>> On 10/13/2015 06:35 PM, Dan Williams wrote:
>>> On Mon, Oct 12, 2015 at 10:18 PM, Thomas Hellstrom
>>>  wrote:
>>>> Hi!
>>>>
>>>> On 10/13/2015 12:35 AM, Dan Williams wrote:
>>>>> Per commit 2e586a7e017a "drm/vmwgfx: Map the fifo as cached" the driver
>>>>> expects the fifo registers to be cacheable.  In preparation for
>>>>> deprecating ioremap_cache() convert its usage in vmwgfx to memremap().
>>>>>
>>>>> Cc: David Airlie 
>>>>> Cc: Thomas Hellstrom 
>>>>> Cc: Sinclair Yeh 
>>>>> Cc: dri-de...@lists.freedesktop.org
>>>>> Signed-off-by: Dan Williams 
>>>> While I have nothing against the conversion, what's stopping the
>>>> compiler from reordering writes on a generic architecture and caching
>>>> and reordering reads on x86 in particular? At the very least it looks to
>>>> me like the memory accesses of the memremap'd memory needs to be
>>>> encapsulated within READ_ONCE and WRITE_ONCE.
>>> Hmm, currently the code is using ioread32/iowrite32 which only do
>>> volatile accesses, whereas READ_ONCE / WRITE_ONCE have a memory
>>> clobber on entry and exit.  So, I'm assuming all you need is the
>>> guarantee of "no compiler re-ordering" and not the stronger
>>> READ_ONCE/WRITE_ONCE guarantees, but that still seems broken compared
>>> to explicit fencing where it matters.
>> I'm not quite sure I follow you here, it looks to me like READ_ONCE()
>> and WRITE_ONCE() are implemented as
>> volatile accesses,
> Ah, sorry, I was looking at the default case...
>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lxr.free-2Delectrons.com_source_include_linux_compiler.h-23L215&d=BQIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vpukPkBtpoNQp2IUKuFviOmPNYWVKmen3Jeeu55zmEA&m=JRxebmjcR4J-yhD0wROjKrAKyto5OeetIvqt7MqV_WA&s=zn7YmnS74zjM3Sd5Dp9mZnSL27jqel6cwRHwYV6gU3U&e=
>>  
>>
>> just like ioread32 and iowrite32
>>
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__lxr.free-2Delectrons.com_source_include_asm-2Dgeneric_io.h-23L54&d=BQIBaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vpukPkBtpoNQp2IUKuFviOmPNYWVKmen3Jeeu55zmEA&m=JRxebmjcR4J-yhD0wROjKrAKyto5OeetIvqt7MqV_WA&s=y4dD2GUpcZVHljnThYugF-YLTgeP6En4JwoOnkaLg7A&e=
>>  
>>
>> which would minimize any potential impact of this change.
>> IMO optimizing the memory accesses can be done as a later step.
>>
> Ok, I'll make local read_fifo() and write_fifo() macros to make this
> explicit.  Are these names ok with you?

Sure.

Thanks,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] drm/vmwgfx: switch from ioremap_cache to memremap

2015-10-13 Thread Thomas Hellstrom

On 10/13/2015 06:35 PM, Dan Williams wrote:
> On Mon, Oct 12, 2015 at 10:18 PM, Thomas Hellstrom
>  wrote:
>> Hi!
>>
>> On 10/13/2015 12:35 AM, Dan Williams wrote:
>>> Per commit 2e586a7e017a "drm/vmwgfx: Map the fifo as cached" the driver
>>> expects the fifo registers to be cacheable.  In preparation for
>>> deprecating ioremap_cache() convert its usage in vmwgfx to memremap().
>>>
>>> Cc: David Airlie 
>>> Cc: Thomas Hellstrom 
>>> Cc: Sinclair Yeh 
>>> Cc: dri-de...@lists.freedesktop.org
>>> Signed-off-by: Dan Williams 
>> While I have nothing against the conversion, what's stopping the
>> compiler from reordering writes on a generic architecture and caching
>> and reordering reads on x86 in particular? At the very least it looks to
>> me like the memory accesses of the memremap'd memory needs to be
>> encapsulated within READ_ONCE and WRITE_ONCE.
> Hmm, currently the code is using ioread32/iowrite32 which only do
> volatile accesses, whereas READ_ONCE / WRITE_ONCE have a memory
> clobber on entry and exit.  So, I'm assuming all you need is the
> guarantee of "no compiler re-ordering" and not the stronger
> READ_ONCE/WRITE_ONCE guarantees, but that still seems broken compared
> to explicit fencing where it matters.

I'm not quite sure I follow you here, it looks to me like READ_ONCE()
and WRITE_ONCE() are implemented as
volatile accesses,

http://lxr.free-electrons.com/source/include/linux/compiler.h#L215

just like ioread32 and iowrite32

http://lxr.free-electrons.com/source/include/asm-generic/io.h#L54

which would minimize any potential impact of this change.
IMO optimizing the memory accesses can be done as a later step.

/Thomas





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] drm/vmwgfx: switch from ioremap_cache to memremap

2015-10-12 Thread Thomas Hellstrom

Hi!

On 10/13/2015 12:35 AM, Dan Williams wrote:
> Per commit 2e586a7e017a "drm/vmwgfx: Map the fifo as cached" the driver
> expects the fifo registers to be cacheable.  In preparation for
> deprecating ioremap_cache() convert its usage in vmwgfx to memremap().
>
> Cc: David Airlie 
> Cc: Thomas Hellstrom 
> Cc: Sinclair Yeh 
> Cc: dri-de...@lists.freedesktop.org
> Signed-off-by: Dan Williams 

While I have nothing against the conversion, what's stopping the
compiler from reordering writes on a generic architecture and caching
and reordering reads on x86 in particular? At the very least it looks to
me like the memory accesses of the memremap'd memory needs to be
encapsulated within READ_ONCE and WRITE_ONCE.

How is this handled in the other conversions?

Thanks,
Thomas





> ---
>
>  This is part of the tree-wide conversion of ioremap_cache() to
>  memremap() targeted for v4.4 [1]
>  
>  As noted in that cover letter feel free to apply or ack.  These
>  individual conversions can go in independently given the base memremap()
>  implementation is already upstream in v4.3-rc1.
>  
>  This passes a clean run of sparse, but I have not tried to run the
>  result.
>  
>  [1]: 
> https://urldefense.proofpoint.com/v2/url?u=https-3A__lkml.org_lkml_2015_10_9_699&d=BQICaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vpukPkBtpoNQp2IUKuFviOmPNYWVKmen3Jeeu55zmEA&m=XuVtQ3_28jin5hdK6wBcLigEiRc-1TuelYa3zm94m44&s=kN3-2XStWWU0f20wNGpmQiwZTTiBBzwD4LShvfokwkQ&e=
>  
>
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c   |8 +--
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h   |2 -
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c |   23 ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c  |  102 
> -
>  drivers/gpu/drm/vmwgfx/vmwgfx_ioctl.c |9 +--
>  drivers/gpu/drm/vmwgfx/vmwgfx_irq.c   |8 +--
>  drivers/gpu/drm/vmwgfx/vmwgfx_kms.c   |   24 
>  7 files changed, 84 insertions(+), 92 deletions(-)
>
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> index 2c7a25c71af2..d6152cd7c634 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
> @@ -752,8 +752,8 @@ static int vmw_driver_load(struct drm_device *dev, 
> unsigned long chipset)
>   ttm_lock_set_kill(&dev_priv->fbdev_master.lock, false, SIGTERM);
>   dev_priv->active_master = &dev_priv->fbdev_master;
>  
> - dev_priv->mmio_virt = ioremap_cache(dev_priv->mmio_start,
> - dev_priv->mmio_size);
> + dev_priv->mmio_virt = memremap(dev_priv->mmio_start,
> +dev_priv->mmio_size, MEMREMAP_WB);
>  
>   if (unlikely(dev_priv->mmio_virt == NULL)) {
>   ret = -ENOMEM;
> @@ -907,7 +907,7 @@ out_no_irq:
>  out_no_device:
>   ttm_object_device_release(&dev_priv->tdev);
>  out_err4:
> - iounmap(dev_priv->mmio_virt);
> + memunmap(dev_priv->mmio_virt);
>  out_err3:
>   vmw_ttm_global_release(dev_priv);
>  out_err0:
> @@ -958,7 +958,7 @@ static int vmw_driver_unload(struct drm_device *dev)
>   pci_release_regions(dev->pdev);
>  
>   ttm_object_device_release(&dev_priv->tdev);
> - iounmap(dev_priv->mmio_virt);
> + memunmap(dev_priv->mmio_virt);
>   if (dev_priv->ctx.staged_bindings)
>   vmw_binding_state_free(dev_priv->ctx.staged_bindings);
>   vmw_ttm_global_release(dev_priv);
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> index f19fd39b43e1..7ff1db23de80 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
> @@ -375,7 +375,7 @@ struct vmw_private {
>   uint32_t stdu_max_height;
>   uint32_t initial_width;
>   uint32_t initial_height;
> - u32 __iomem *mmio_virt;
> + u32 *mmio_virt;
>   uint32_t capabilities;
>   uint32_t max_gmr_ids;
>   uint32_t max_gmr_pages;
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c 
> b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> index 567ddede51d1..97f75adc080d 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fence.c
> @@ -141,9 +141,9 @@ static bool vmw_fence_enable_signaling(struct fence *f)
>  
>   struct vmw_fence_manager *fman = fman_from_fence(fence);
>   struct vmw_private *dev_priv = fman->dev_priv;
> + u32 *fifo_mem = dev_priv->mmio_virt;
> + u32 seqno = *(fifo_mem + SVGA_FIFO_FENCE);
>  
> - u32 __iomem *fifo_mem = dev_priv->mmio_virt;
> - u32 seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);
>   if (seqno - fence->base.seqno < VMW_FENCE_WRAP)
>   return false;
>  
>
Here, for example.

/Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] kref: prefer atomic_inc_not_zero to atomic_add_unless

2015-10-11 Thread Thomas Hellstrom

Reviewed-by: Thomas Hellstrom 


On 10/10/2015 12:56 PM, Jason A. Donenfeld wrote:
> On most platforms, there exists this ifdef:
>
>  #define atomic_inc_not_zero(v) atomic_add_unless((v), 1, 0)
>
> This makes this patch functionally useless. However, on PPC, there is
> actually an explicit definition of atomic_inc_not_zero with its own
> assembly that is slightly more optimized than atomic_add_unless. So,
> this patch changes kref to use atomic_inc_not_zero instead, for PPC and
> any future platforms that might provide an explicit implementation.
>
> This also puts this usage of kref more in line with a verbatim reading
> of the examples in Paul McKenney's paper [1] in the section titled "2.4
> Atomic Counting With Check and Release Memory Barrier", which uses
> atomic_inc_not_zero.
>
> [1] 
> https://urldefense.proofpoint.com/v2/url?u=http-3A__open-2Dstd.org_jtc1_sc22_wg21_docs_papers_2007_n2167.pdf&d=BQIBAg&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vpukPkBtpoNQp2IUKuFviOmPNYWVKmen3Jeeu55zmEA&m=z5Nd9sYiJMKiphNjyZp6XT5CbayXMBlcb903f260pDY&s=HEHX3CuXRs2GRRQWuC4Vef6iJMwdilKVRkiZgJpjEpA&e=
>  
>
> Signed-off-by: Jason A. Donenfeld 
> ---
>  include/linux/kref.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/linux/kref.h b/include/linux/kref.h
> index 484604d..83d1f94 100644
> --- a/include/linux/kref.h
> +++ b/include/linux/kref.h
> @@ -166,6 +166,6 @@ static inline int kref_put_mutex(struct kref *kref,
>   */
>  static inline int __must_check kref_get_unless_zero(struct kref *kref)
>  {
> - return atomic_add_unless(&kref->refcount, 1, 0);
> + return atomic_inc_not_zero(&kref->refcount);
>  }
>  #endif /* _KREF_H_ */

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [3.18+] Can't boot with commit bd809af1 ("x86: Enable PAT to use cache mode translation tables")

2014-12-16 Thread Thomas Hellstrom

Thanks,

>From what I understand, there is indeed a virtual processor bug. It's
fixed in HardWare Version 11, so that the PAT registers return the
correct value.

Thanks,
Thomas


On 12/17/2014 03:46 AM, Jongman Heo wrote:
> Hi,
>
> I'm using VMWare workstation, version 10.0.3 build-1895310, on Windows 7 
> 64-bit.
> Guest is Fedora 21.
>
> --- Original Message ---
> Sender : Thomas Hellstrom
> Date : 2014-12-17 00:12 (GMT+09:00)
> Title : Re: [3.18+] Can't boot with commit bd809af1 ("x86: Enable PAT to use 
> cache mode translation tables")
>
> Jongman, what product (player, ws, esx) and version are you using?
>
> Thanks,
> Thomas
>
>
> On 12/16/2014 02:08 PM, Peter Hurley wrote:
>> VMware guys probably already know this but just in case
>>
>> [ +cc Thomas Hellstrom ]
>>
>> Jongman - you need to fix your mailer to use plaintext and not base64.
>>
>> On 12/16/2014 01:46 AM, Jongman Heo wrote:
>>>> Sender : Juergen Gross
>>>> On 12/16/2014 07:29 AM, Jongman Heo wrote:
>>>>>> Sender : Juergen Gross
>>>>>> On 12/16/2014 05:40 AM, Jongman Heo wrote:
>>>>>>>> Sender : Juergen Gross
>>>>>>>> On 12/15/2014 08:52 AM, Jongman Heo wrote:
>>>>>>>>>> Sender : Juergen Gross
>>>>>>>>>> On 12/14/2014 06:07 AM, 허종만 wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> My Linux virtual machine on (Windows) VMWare workstation 10 can't 
>>>>>>>>>>> boot with following commit.
>>>>>>>>>>>
>>>>>>>>>>> commit bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
>>>>>>>>>>> Author: Juergen Gross
>>>>>>>>>>> Date:   Mon Nov 3 14:02:03 2014 +0100
>>>>>>>>>>>
>>>>>>>>>>> x86: Enable PAT to use cache mode translation tables
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately I can't see any console log.
>>>>>>>>>> Hmm, weird. Could you provide some more information?
>>>>>>>>>>
>>>>>>>>>> Kernel config, hardware used, /proc/cpuinfo of working kernel?
>>>>>>>>>> Anything you see with earlyprintk enabled?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Juergen
>>>>>>>>> (Sorry for resending this email, previous one bounced from mailing 
>>>>>>>>> list due to HTML format)
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm using Fedora 21, with custom built kernel.
>>>>>>>>> Host PC is windows 7 64-bit, and running VMWare workstation 10 for 
>>>>>>>>> guest Fedora Linux.
>>>>>>>>>
>>>>>>>>> With earlyprintk, just following message is printed.
>>>>>>>>>
>>>>>>>>>  early console in setup code
>>>>>>>>>
>>>>>>>>> and nothing more...
>>>>>>>> Can you try attached diagnostic patch, please? I suspect a problem
>>>>>>>> regarding VMWares PAT emulation...
>>>>>>>>
>>>>>>>>
>>>>>>>> Juergen
>>>>>>> Hi,
>>>>>>>
>>>>>>> With the commit reverted, the patch doesn't apply.
>>>>>> Sure.
>>>>>>
>>>>>>> Without revert, kernel (patch applied) doesn't boot and I can't see any 
>>>>>>> message.
>>>>>> What are your kernel parameters? There must be some message with the
>>>>>> diagnostic patch, as the first pr_info() is called before any other
>>>>>> part of the critical patch is becoming active. Could it be you have
>>>>>> instructed the kernel to be "quiet"? I'd recommend:
>>>>>>
>>>>>> earlyprintk=vga ignore_loglevel
>>>>>>
>>>>>> and no quiet. I don't know VMWare settings, so may be you can use
>>>>>> earlyprintk=ttyS0 instead of vga.
>>>>>>
>&g

Re: [3.18+] Can't boot with commit bd809af1 ("x86: Enable PAT to use cache mode translation tables")

2014-12-16 Thread Thomas Hellstrom

Jongman, what product (player, ws, esx) and version are you using?

Thanks,
Thomas


On 12/16/2014 02:08 PM, Peter Hurley wrote:
> VMware guys probably already know this but just in case
>
> [ +cc Thomas Hellstrom ]
>
> Jongman - you need to fix your mailer to use plaintext and not base64.
>
> On 12/16/2014 01:46 AM, Jongman Heo wrote:
>>> Sender : Juergen Gross
>>> On 12/16/2014 07:29 AM, Jongman Heo wrote:
>>>>> Sender : Juergen Gross
>>>>> On 12/16/2014 05:40 AM, Jongman Heo wrote:
>>>>>>> Sender : Juergen Gross
>>>>>>> On 12/15/2014 08:52 AM, Jongman Heo wrote:
>>>>>>>>> Sender : Juergen Gross
>>>>>>>>> On 12/14/2014 06:07 AM, 허종만 wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> My Linux virtual machine on (Windows) VMWare workstation 10 can't 
>>>>>>>>>> boot with following commit.
>>>>>>>>>>
>>>>>>>>>> commit bd809af16e3ab1f8d55b3e2928c47c67e2a865d2
>>>>>>>>>> Author: Juergen Gross
>>>>>>>>>> Date:   Mon Nov 3 14:02:03 2014 +0100
>>>>>>>>>>
>>>>>>>>>> x86: Enable PAT to use cache mode translation tables
>>>>>>>>>>
>>>>>>>>>> Unfortunately I can't see any console log.
>>>>>>>>> Hmm, weird. Could you provide some more information?
>>>>>>>>>
>>>>>>>>> Kernel config, hardware used, /proc/cpuinfo of working kernel?
>>>>>>>>> Anything you see with earlyprintk enabled?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Juergen
>>>>>>>> (Sorry for resending this email, previous one bounced from mailing 
>>>>>>>> list due to HTML format)
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm using Fedora 21, with custom built kernel.
>>>>>>>> Host PC is windows 7 64-bit, and running VMWare workstation 10 for 
>>>>>>>> guest Fedora Linux.
>>>>>>>>
>>>>>>>> With earlyprintk, just following message is printed.
>>>>>>>>
>>>>>>>>  early console in setup code
>>>>>>>>
>>>>>>>> and nothing more...
>>>>>>> Can you try attached diagnostic patch, please? I suspect a problem
>>>>>>> regarding VMWares PAT emulation...
>>>>>>>
>>>>>>>
>>>>>>> Juergen
>>>>>> Hi,
>>>>>>
>>>>>> With the commit reverted, the patch doesn't apply.
>>>>> Sure.
>>>>>
>>>>>> Without revert, kernel (patch applied) doesn't boot and I can't see any 
>>>>>> message.
>>>>> What are your kernel parameters? There must be some message with the
>>>>> diagnostic patch, as the first pr_info() is called before any other
>>>>> part of the critical patch is becoming active. Could it be you have
>>>>> instructed the kernel to be "quiet"? I'd recommend:
>>>>>
>>>>> earlyprintk=vga ignore_loglevel
>>>>>
>>>>> and no quiet. I don't know VMWare settings, so may be you can use
>>>>> earlyprintk=ttyS0 instead of vga.
>>>>>
>>>>>> Let me show you my PAT values (the commit reverted)
>>>>>>
>>>>>> # dmesg | grep PAT
>>>>>> [0.00] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
>>>>>> [0.314631] x86 PAT enabled: cpu 3, old 0x0, new 0x7010600070106
>>>>>> [0.314703] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
>>>>>> [0.314780] x86 PAT enabled: cpu 2, old 0x0, new 0x7010600070106
>>>>>> [0.314852] x86 PAT enabled: cpu 4, old 0x0, new 0x7010600070106
>>>>>> [0.314923] x86 PAT enabled: cpu 0, old 0x0, new 0x7010600070106
>>>>>> [0.314997] x86 PAT enabled: cpu 6, old 0x0, new 0x7010600070106
>>>>>> [0.315069] x86 PAT enabled: cpu 7, old 0x0, new 0x7010600070106
>>>>>> [0.315142] x86 PA

Re: [git pull] drm for 3.19-rc1

2014-12-15 Thread Thomas Hellstrom

On 12/16/2014 07:18 AM, Thomas Hellstrom wrote:
> On 12/16/2014 01:35 AM, Linus Torvalds wrote:
>> On Sun, Dec 14, 2014 at 11:17 PM, Dave Airlie  wrote:
>>> i915:
>>> Initial Skylake (SKL) support
>>> gen3/4 reset work
>>> start of dri1/ums removal
>>> infoframe tracking
>>> fixes for lots of things.
>> So I'm not sure how happy I am about this. It seems to work, but on
>> the very first boot I get this:
>>
>>
> So this is my fault.
>
> I posted the patch and there was a discussion with Chris Wilson at Intel
> on dri-devel following the post concluding that the intel user-space
> driver was type-casting dumb buffers, and that that was legitimate since
> it didn't take place in generic user-space, but in a driver that had
> detailed knowledge of the driver.

...detailed knowledge of the buffer.

>
> So I should have explicitly have withdrawn the patch. I'm sorry about this.
>
> Thanks,
> Thomas
>
>
/Thomas


>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [git pull] drm for 3.19-rc1

2014-12-15 Thread Thomas Hellstrom

On 12/16/2014 01:35 AM, Linus Torvalds wrote:
> On Sun, Dec 14, 2014 at 11:17 PM, Dave Airlie  wrote:
>> i915:
>> Initial Skylake (SKL) support
>> gen3/4 reset work
>> start of dri1/ums removal
>> infoframe tracking
>> fixes for lots of things.
> So I'm not sure how happy I am about this. It seems to work, but on
> the very first boot I get this:
>
>

So this is my fault.

I posted the patch and there was a discussion with Chris Wilson at Intel
on dri-devel following the post concluding that the intel user-space
driver was type-casting dumb buffers, and that that was legitimate since
it didn't take place in generic user-space, but in a driver that had
detailed knowledge of the driver.

So I should have explicitly have withdrawn the patch. I'm sorry about this.

Thanks,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: current mainline doesn't boot on VWware platform

2014-10-27 Thread Thomas Hellstrom

Hi!

Thanks for forwarding. There is a bug on redhat bugzilla

https://bugzilla.redhat.com/show_bug.cgi?id=1155825

that might be related, but from what I can tell, the possible deadlock
reported there is only hypothetical.

/Thomas


On 10/24/2014 06:13 PM, Dmitry Torokhov wrote:
> Hi Steve,
>
> On Sat, Oct 18, 2014 at 09:02:15PM -0500, Steve French wrote:
>> Interesting to see that Fedora has similar problems with current
>> kernel (I pulled mainline again a few hours ago).
>>
>> Fedora 20 x86_64 in vmware boots to logon prompt but no mouse, no
>> keyboard, system hung.  Ctl-alt-f1 doesn't do anything.  Even with
>> verbose on kernel boot line, the extra debug messages have all been
>> cleared by the time the hang occurs so nothing useful displayed.
>>
>> To be as safe as possible I used their default .config
>> /boot/config-3.16.6-200.fc20.x86_64 to do a clean build of mainline.
>> Once again, 3.16 and 3.17 work but not current mainline.
> Let's add a few VMware people. I am pretty sure they'd be interested in
> why it does not work.
>
> Guys, could you please route as appropriate.
>
> Thanks.
>
>> On Sat, Oct 18, 2014 at 4:52 PM, Steve French  wrote:
>>> Same thing happened after pulling newly updated mainline kernel
>>> changes a few minutes ago.
>>>
>>> Black screen on boot just when you would expect x to be starting.
>>> Hung. Ctl-alt-F1 doesn't do anything.
>>>
>>> arch is x86_64.
>>>
>>> 3.17.1 works.  When I get time I will see if I can figure out more
>>> useful info but this looks like a pretty boring setup:
>>>
>>> sfrench@ubuntu:~$ glxinfo | grep -i "vendor"
>>> server glx vendor string: SGI
>>> client glx vendor string: Mesa Project and SGI
>>> OpenGL vendor string: VMware, Inc.
>>> sfrench@ubuntu:~$ grep LoadModule /var/log/Xorg.0.log
>>> [18.271] (II) LoadModule: "glx"
>>> [18.288] (II) LoadModule: "vmware"
>>> [18.352] (II) LoadModule: "modesetting"
>>> [18.355] (II) LoadModule: "fbdev"
>>> [18.360] (II) LoadModule: "vesa"
>>> [18.369] (II) LoadModule: "fbdevhw"
>>> [18.372] (II) LoadModule: "fb"
>>> [18.379] (II) LoadModule: "dri2"
>>> [18.479] (II) LoadModule: "evdev"
>>> [18.484] (II) LoadModule: "vmmouse"
>>>
>>> On Fri, Oct 17, 2014 at 6:52 PM, Steve French  wrote:
 This is vmware guest

 sfrench@ubuntu:~/xfstests$ cat /proc/cpuinfo
 processor: 0
 vendor_id: GenuineIntel
 cpu family: 6
 model: 70
 model name: Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz

 (and using 64 bit kernel build) ...

 I don't have the Ubuntu config for their mainline kernel test builds
 any more (deleted the kernel since it didn't work) but they post their
 .config diffs at
 https://urldefense.proofpoint.com/v1/url?u=http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/current/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=wp7Vk7bzPRuHbjuXgvt6xD%2F5rYqFnHiHkWIbhmFC%2FM8%3D%0A&s=4e9068baa1dd8756d9843bd41f3568505198d273fe6d15fa2cf666c844f601cf

 I could not get to a console so assumed it died in kernel as x was
 coming up.  I will have to reinstall current mainline to see if I can
 hook up a serial console (I had maximum kernel boot verbose in the
 grub.cfg but nothing interesting before the black screen)

 On Fri, Oct 17, 2014 at 6:44 PM, Felipe Balbi  wrote:
> Hi,
>
> On Fri, Oct 17, 2014 at 05:44:02PM -0500, Steve French wrote:
>> Anyone know a workaround for the problem booting current mainline?
>> 3.17 works fine for me, but recent mainline since 3.17 goes to a black
>> screen near the end of boot as X is about to start.
>>
>> I also tried the Ubuntu build of the day (which also is based on
>> current mainline but with the Ubuntu config just in case it was some
>> driver issue) which fails the same way.
> you need to provide more information, here are a few queries which might
> help:
>
> 1. which architecture ?
> 2. Can you get a serial console to get some dmesg output ?
> 3. after screen is black, can you ctrl + alt + F1 and get a console ?
> 4. where's you .config ?
>
> --
> balbi


 --
 Thanks,

 Steve
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Steve
>>
>>
>> -- 
>> Thanks,
>>
>> Steve
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  
>> https://urldefense.proofpoint.com/v1/url?u=http://vger.kernel.org/majordomo-info.html&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=wp7Vk7bzPRuHbjuXgvt6xD%2F5rYqFnHiHkWIbhmFC%2FM8%3D%0A&s=21dbd5ec2e156ce4853d5ce9152289dc4475e1ea039790635ea0cd635312c820
>> Please read the FAQ at  
>> https://urldefense.proofpoint.com/v1/url?u=http://www.tux.org/lkml/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3Cx

Re: page allocator bug in 3.16?

2014-09-26 Thread Thomas Hellstrom

On 09/26/2014 02:28 PM, Rob Clark wrote:
> On Fri, Sep 26, 2014 at 6:45 AM, Thomas Hellstrom  
> wrote:
>> On 09/26/2014 12:40 PM, Chuck Ebbert wrote:
>>> On Fri, 26 Sep 2014 09:15:57 +0200
>>> Thomas Hellstrom  wrote:
>>>
>>>> On 09/26/2014 01:52 AM, Peter Hurley wrote:
>>>>> On 09/25/2014 03:35 PM, Chuck Ebbert wrote:
>>>>>> There are six ttm patches queued for 3.16.4:
>>>>>>
>>>>>> drm-ttm-choose-a-pool-to-shrink-correctly-in-ttm_dma_pool_shrink_scan.patch
>>>>>> drm-ttm-fix-handling-of-ttm_pl_flag_topdown-v2.patch
>>>>>> drm-ttm-fix-possible-division-by-0-in-ttm_dma_pool_shrink_scan.patch
>>>>>> drm-ttm-fix-possible-stack-overflow-by-recursive-shrinker-calls.patch
>>>>>> drm-ttm-pass-gfp-flags-in-order-to-avoid-deadlock.patch
>>>>>> drm-ttm-use-mutex_trylock-to-avoid-deadlock-inside-shrinker-functions.patch
>>>>> Thanks for info, Chuck.
>>>>>
>>>>> Unfortunately, none of these fix TTM dma allocation doing CMA dma 
>>>>> allocation,
>>>>> which is the root problem.
>>>>>
>>>>> Regards,
>>>>> Peter Hurley
>>>> The problem is not really in TTM but in CMA, There was a guy offering to
>>>> fix this in the CMA code but I guess he didn't probably because he
>>>> didn't receive any feedback.
>>>>
>>> Yeah, the "solution" to this problem seems to be "don't enable CMA on
>>> x86". Maybe it should even be disabled in the config system.
>> Or, as previously suggested, don't use CMA for order 0 (single page)
>> allocations
> On devices that actually need CMA pools to arrange for memory to be in
> certain ranges, I think you probably do want to have order 0 pages
> come from the CMA pool.

But can the DMA subsystem or more specifically dma_alloc_coherent()
really guarantee such things? Isn't it better for such devices to use
CMA directly?

/Thomas


>
> Seems like disabling CMA on x86 (where it should be unneeded) is the
> better way, IMO
>
> BR,
> -R
>
>
>> /Thomas
>>
>> ___
>> dri-devel mailing list
>> dri-de...@lists.freedesktop.org
>> https://urldefense.proofpoint.com/v1/url?u=http://lists.freedesktop.org/mailman/listinfo/dri-devel&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=l5Ago9ekmVFZ3c4M6eauqrJWGwjf6fTb%2BP3CxbBFkVM%3D%0A&m=Uz7JXDXYXp4RlLs7G6qxMQlhOOT0trW3l78xpKg6Ass%3D%0A&s=50d6b7b3bfd093c93a228437a3d4414e49b4de817657c49c35154a115a5c2188

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

1 2 3 >

1 - 100 of 203 matches

Mail list logo