Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only] Hi Christian, You mean that amdgpu_bo_validate_size() returning false is expected when GTT < request < VRAM, even if the VRAM size can meet the requirement, right? Thanks! Rico From: Christian König Sent:

Re: [PATCH hmm 4/6] mm/hmm: remove HMM_FAULT_SNAPSHOT

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:03PM -0300, Jason Gunthorpe wrote: > + struct hmm_range *range = hmm_vma_walk->range; > unsigned int required_fault = 0; > unsigned long i; > > - if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT) > + /* > + * If there is no way for valid to

Re: [PATCH hmm 2/6] mm/hmm: return the fault type from hmm_pte_need_fault()

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:01PM -0300, Jason Gunthorpe wrote: > +enum { > + NEED_FAULT = 1 << 0, > + NEED_WRITE_FAULT = 1 << 1, > +}; Maybe add a HMM_ prefix? > for (i = 0; i < npages; ++i) { > + required_fault |= > +
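For illustration only, the prefixed naming suggested above could look like the sketch below. The HMM_-prefixed constants and the helper demo_required_fault() are hypothetical names chosen for this example and are not taken from the patch; the loop just ORs per-page flags together in the way the quoted hunk accumulates required_fault.

```c
#include <stddef.h>

/* Hypothetical prefixed names; the quoted patch uses NEED_FAULT and NEED_WRITE_FAULT. */
enum {
	HMM_NEED_FAULT       = 1 << 0,
	HMM_NEED_WRITE_FAULT = 1 << 1,
};

/*
 * Illustrative helper (not a kernel function): OR the per-page fault
 * requirements together so the caller gets one combined mask.
 */
static unsigned int demo_required_fault(const unsigned int *per_page, size_t npages)
{
	unsigned int required_fault = 0;
	size_t i;

	for (i = 0; i < npages; ++i)
		required_fault |= per_page[i];

	return required_fault;
}
```

A caller could then test the combined mask with required_fault & HMM_NEED_WRITE_FAULT to decide whether a write fault is needed.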

Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Koenig, Christian
Correct, yes. For example if you have a 16GB VRAM Vega10 in a system with just 4GB RAM you can only allocate < 4GB VRAM (actually more like ~3GB) in a single BO. Otherwise we wouldn't be able to evacuate VRAM to system memory and disk during suspend/resume or during memory pressure. Regards,
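The rule described above can be sketched as a small standalone check. This is a simplified illustration, not the actual amdgpu_bo_validate_size() code: demo_bo_size_ok() and its parameters are hypothetical, and it assumes a single BO must be smaller than both VRAM and the system memory available for eviction.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1024ULL * 1024ULL * 1024ULL)

/*
 * Hypothetical helper: a single buffer object is accepted only if it fits
 * in VRAM and could also be evacuated to system memory (for example during
 * suspend/resume or under memory pressure).
 */
static bool demo_bo_size_ok(uint64_t request, uint64_t vram_size,
			    uint64_t sysmem_size)
{
	return request < vram_size && request < sysmem_size;
}
```

With the numbers from the example, demo_bo_size_ok(8 * GiB, 16 * GiB, 4 * GiB) is false even though VRAM alone could hold the buffer, while a 2 GiB request passes.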

Re: [PATCH hmm 3/6] mm/hmm: remove unused code and tidy comments

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:02PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > Delete several functions that are never called, fix some desync between > comments and structure content, remove an unused ret, and move one > function only used by hmm.c into hmm.c This looks good:

Re: [PATCH 4/4] mm: check the device private page owner in hmm_range_fault

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 10:41:09AM -0300, Jason Gunthorpe wrote: > Thinking about this some more, does the locking work out here? > > hmm_range_fault() runs with mmap_sem in read, and does not lock any of > the page table levels. > > So it relies on accessing stale pte data being safe, and here

Re: [PATCH 3/4] mm: simplify device private page handling in hmm_range_fault

2020-03-21 Thread Christoph Hellwig
On Thu, Mar 19, 2020 at 09:03:45PM -0300, Jason Gunthorpe wrote: > > Should tests enable the feature or the feature enable the test? > > IMHO, if the feature is being compiled into the kernel, that should > > enable the menu item for the test. If the feature isn't selected, > > no need to test it

Re: [PATCH hmm 1/6] mm/hmm: remove pgmap checking for devmap pages

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:00PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > The checking boils down to some racy check if the pagemap is still > available or not. Instead of checking this, rely entirely on the > notifiers, if a pagemap is destroyed then all pages that belong to

Re: [PATCH hmm 6/6] mm/hmm: use device_private_entry_to_pfn()

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:05PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > swp_offset() should not be called directly, the wrappers are supposed to > abstract away the encoding of the device_private specific information in > the swap entry. > > Signed-off-by: Jason Gunthorpe

Re: [PATCH hmm 5/6] mm/hmm: remove the CONFIG_TRANSPARENT_HUGEPAGE #ifdef

2020-03-21 Thread Christoph Hellwig
On Fri, Mar 20, 2020 at 01:49:04PM -0300, Jason Gunthorpe wrote: > From: Jason Gunthorpe > > This code can be compiled when CONFIG_TRANSPARENT_HUGEPAGE is off, so > remove the ifdef. It can compile, but will the compiler optimize it away? Seems like both pmd_trans_huge and pmd_devmap are stubs
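As a rough illustration of the stub question raised here: when a predicate is a static inline stub that always returns false, the branch guarded by it is dead code that a compiler can normally drop. The names below (DEMO_CONFIG_THP, struct demo_pmd, demo_pmd_trans_huge, demo_walk) are hypothetical stand-ins for this sketch, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a Kconfig option; left undefined, so the stub is used. */
/* #define DEMO_CONFIG_THP 1 */

struct demo_pmd {
	unsigned long val;
};

#ifdef DEMO_CONFIG_THP
static inline bool demo_pmd_trans_huge(struct demo_pmd pmd)
{
	return pmd.val & 1UL;
}
#else
/* Stub: constant false, so callers' huge-page branches become dead code. */
static inline bool demo_pmd_trans_huge(struct demo_pmd pmd)
{
	(void)pmd;
	return false;
}
#endif

static int demo_walk(struct demo_pmd pmd)
{
	if (demo_pmd_trans_huge(pmd))
		return 1;	/* huge-page path; unreachable with the stub */

	return 0;		/* regular path */
}
```

Whether the real code is optimized away in the same manner depends on how the kernel's own stubs are defined for the configuration in question.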

Re: [PATCH] drm/amdgpu: fix size validation failure in large buffer creation

2020-03-21 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only] I see, thanks for your explanation. Regards, Rico From: Koenig, Christian Sent: Saturday, March 21, 2020 16:44 To: Yin, Tianci (Rico) Cc: amd-gfx@lists.freedesktop.org ; Xu, Feifei ; Li, Pauline ; Long, Gang ;

Re: Possibility of RX570 responsible for spontaneous reboots (MCE) with Ryzen 3700x?

2020-03-21 Thread Clemens Eisserer
Hi John, > >I know RX570 (polaris) should stay at PCI3 as far as I know. > > Yep... thought I remembered you mentioning having a 5700XT though... is that > in a different system ? I am using an RX570; the guy from reddit changed from an R600 to a 5700XT and it seems it did solve his reboot