CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 30.05.2014 13:46, Grigori Goronzy wrote:
> On 30.05.2014 13:30, Marek Olšák wrote:
>> Grigori,
>>
>> you can git-checkout the commit before and after the memory management
>> changes, compile both and test them.
>>
>
> I was trying to revert the changes, but it looks like too much changed
> in the meantime. The suitable commits to check out should be 0bc490a8
> (before) and 19dff56a (after), right?
>

Turns out these changes weren't the problem. Instead it's the page table
rework (commit 6d2f2944), which also seems to cause a bunch of other
issues. The latest drm-fixes code doesn't change that, either.

According to my (not very scientific) testing with radeontop and the 
"time" utility, this appears to be a CPU overhead problem. The "sys" 
duration reported by time for a Xonotic benchmark run is over 3x as long 
after the regression, and radeontop seems to report about 10% reduced 
GPU load on average.
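
For anyone who wants comparable numbers without the shell's "time", the child CPU counters can be read around a benchmark run. This is only a rough sketch: child_cpu_times and sys_ratio are made-up helper names, and the command you pass in is whatever benchmark invocation you normally use.

```python
import os
import subprocess
import sys


def child_cpu_times(cmd):
    """Run cmd to completion and return its real/user/sys seconds.

    os.times() only accounts for child processes that have already been
    waited for, which subprocess.run() takes care of.
    """
    before = os.times()
    subprocess.run(cmd, check=True)
    after = os.times()
    return {
        "real": after.elapsed - before.elapsed,
        "user": after.children_user - before.children_user,
        "sys": after.children_system - before.children_system,
    }


def sys_ratio(good, bad):
    """How many times more kernel (sys) time the bad run used."""
    return bad["sys"] / good["sys"]
```

Running child_cpu_times once per kernel and feeding both results to sys_ratio gives the kind of "sys is N times longer" figure quoted above. It says nothing about where in the kernel the time goes.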

Best regards
Grigori

> Best regards
> Grigori
>
>> Marek
>>
>> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy 
>> wrote:
>>> On 13.05.2014 22:27, Marek Olšák wrote:

 I applied these two patches Christian sent to dri-devel:

 drm/radeon: fix page directory update size estimation
 drm/radeon: fix buffer placement under memory pressure v2

 on top of torvalds's master branch.

>>>
>>> With latest kernel master (a991639c) I still see a regression,
>>> compared to
>>> 3.13 or 3.14, which have similar performance. Xonotic is about 7%
>>> slower.
>>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>>> record accurate numbers.
>>>
>>> Maybe the improved memory management has some overhead, but this is not
>>> acceptable IMHO. I'll try to investigate further.
>>>
>>> Best regards
>>>
>>> Grigori
>>>
 Marek

 On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy 
 wrote:
>
> On 13.05.2014 21:50, Marek Olšák wrote:
>>
>>
>> Hi Christian,
>>
>> The performance regression I saw with piglit seems to be fixed with
>> latest kernel git. It's difficult to bisect the kernel, because there
>> are only merges between 3.14 and 3.15 and the merged commits are
>> actually based on 3.14-rc1 and 3.14-rc4.
>>
>> All seems to be fine with your fixes.
>>
>
> Which fixes have you applied? There are quite a few pending patches on
> dri-devel, that aren't yet part of drm-fixes-3.15.
>
> Grigori
>
>
>> Marek
>>
>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>  wrote:
>>>
>>>
>>> Is the performance regression caused by the page table changes or
>>> something else?
>>>
>>> I did some tests with Xonotic while developing it and they didn't show
>>> anything obvious, but I didn't test on different systems.
>>>
>>> Christian.
>>>
>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>
 Your latest patches fix the regression.

 The performance regression can also be reproduced with piglit "-t
 texelFetch.fs".

 Kernel 3.14:
   real    0m17.724s
   user    0m41.905s
   sys     0m11.299s

 The problematic commit checked out + your fixes (without the PTE patch,
 I think):
   real    0m23.474s
   user    1m1.008s
   sys     0m13.812s
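
To put the two runs side by side, the timings can be converted to seconds and divided. A small sketch, assuming the values are in the usual bash "time" format; parse_time and slowdown are illustrative names only.

```python
import re


def parse_time(value):
    """Convert a bash `time` value such as '1m1.008s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Numbers copied from the two piglit runs quoted in this message.
KERNEL_3_14 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
WITH_FIXES = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}


def slowdown(before, after):
    """Ratio after/before for each of real, user and sys."""
    return {k: parse_time(after[k]) / parse_time(before[k]) for k in before}


for key, ratio in slowdown(KERNEL_3_14, WITH_FIXES).items():
    print(f"{key}: {ratio:.2f}x")
```

For these numbers that works out to roughly 1.32x real, 1.46x user and 1.22x sys, so in this piglit run the extra time is not only kernel time.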

 Marek


 On Tue, May 13, 2014 at 3:57 PM, Christian König
  wrote:
>
>
>
> On 13.05.2014 15:22, Alex Deucher wrote:
>
>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>> 
>> wrote:
>>>
>>>
>>>
>>> I can confirm this fixes it for me, too.
>>>
>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>> what's going on.
>>
>>
>>
>> Allocation overhead?
>
>
>
>
> Unlikely. Xonotic just allocates a single page table at start, which
> then gets extended at a certain rate until it no longer needs more
> address space, and then it is done.
>
> Grigori, can you bisect and/or try to figure out what's wrong
> here?
>
> Christian.
>
>
>>
>>> Grigori
>>>
>>>
>>> On 12.05.2014 14:50, Christian König wrote:



 I could reproduce the problem with Xonotic and I think I've found the
 issue.

 Please test the attached patch.

 Thanks,
 

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Christian König
Well, the good news is that when I use the CP DMA instead of the SDMA,
everything seems to work fine.

Unfortunately, using the CP DMA has completely different timing
(because of the additional sync needed), so I'm not sure whether it's
really fixed or just masked.

Christian.

On 29.05.2014 18:52, Alex Deucher wrote:
> On Thu, May 29, 2014 at 12:30 PM, Christian König
>  wrote:
>> Hi Marek & Alex,
>>
>> I've found the reason why forcefully evicting page tables sometimes crashes
>> the box.
>>
>> This is a hexdump of a typical page table before it is moved to GART:
>> 000117f000  02914061 
>> 000117f008  02915061 
>> 000117f010  02916061 
>> 000117f018  02917061 
>> 000117f020  02918061 
>>
>> And it looks like this when it comes back:
>> 0006102000   
>> *
>>
>> Ideas? I don't really have an explanation for this. Moving buffers around
>> otherwise seems to work perfectly fine.
> Nothing I can think of off hand.  Might be worth trying CP DMA rather
> than SDMA for BO moves to see if we can narrow it down a bit more.
> Might also try the other SDMA ring.
>
> Alex
>
>> Thanks,
>> Christian.
>>
>> On 28.05.2014 12:38, Christian König wrote:
>>
>>> I already tried a similar patch as well, without any more noticeable
>>> crashes. But going to give this another round with your patch and openarena.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> On 27.05.2014 23:55, Marek Olšák wrote:
 Hi Christian,

 I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
 fixed yet. They are very rare and very random. Therefore, I have come
 up with a patch which evicts page tables between IBs. See the
 attachment. With that patch applied, the system starts fine, compiz
 and glxgears work, but once I start playing openarena, it locks up
 pretty quickly.

 The patch shouldn't do anything in theory, because pages are moved
 back to VRAM immediately after that. However, the VRAM address of page
 tables may end up being different from before, which might be the root
 cause.

 Marek

 On Wed, May 14, 2014 at 2:11 PM, Christian König
  wrote:
> Crap, any chance you can narrow it down a bit more?
>
> I've just tried a piglit quick test on my Bonaire and it seems to work
> perfectly fine.
>
> What hw do you test on?
>
> Regards,
> Christian.
>
> On 13.05.2014 23:21, Marek Olšák wrote:
>
>> Hi Christian,
>>
>> Even though some regressions are fixed by these patches:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> and indeed, the texelFetch tests no longer hang, there is one more
>> hang which needs to be fixed. :( All I know is the exact same commit
>> causes it and it can only be reproduced by running whole piglit with
>> concurrency enabled.
>>
>> My kernel git log:
>>
>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>> (10 hours ago) 
>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>> hours ago) 
>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>> months ago) 
>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>> months ago) 
>>
>> fa68834 doesn't hang, but 2ba22c8 hangs, which means that either 6d2f294
>> or one of the two fixes is the first bad commit.
>>
>> Marek
>>
>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák wrote:
>>> Hi Christian,
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>
>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>> Author: Christian König
>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>
>>>     drm/radeon: use normal BOs for the page tables v4
>>>
>>>     No need to make it more complicated than necessary,
>>>     just allocate the page tables as normal BO and
>>>     flush whenever the address change.
>>>
>>>     v2: update comments and function name
>>>     v3: squash bug fixes, page directory and tables patch
>>>     v4: rebased on Mareks changes
>>>
>>>     Signed-off-by: Christian König
>>>
>>>
>>> Reverting the commit gives me a lot of merge conflicts.
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>>> Any idea what is wrong with it?
>>>
>>> Thanks,
>>>
>>> Marek
>



CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Marek Olšák
That's right.

Also, you probably want to enable the menuconfig option that automatically
appends the git sha1 to the kernel version. That way you can have several
kernels with the same version but different sha1s installed side by side.
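
A quick way to check whether a given tree already has this turned on is to look at its .config. The sketch below assumes the option in question is CONFIG_LOCALVERSION_AUTO (the "Automatically append version information to the version string" entry under General setup). The message does not name it, so treat that as an assumption, and localversion_auto_enabled is just an illustrative helper.

```python
def localversion_auto_enabled(config_text):
    """Return True if the .config text sets CONFIG_LOCALVERSION_AUTO=y."""
    return any(
        line.strip() == "CONFIG_LOCALVERSION_AUTO=y"
        for line in config_text.splitlines()
    )


# Two tiny hand-written .config fragments for illustration.
enabled = 'CONFIG_LOCALVERSION=""\nCONFIG_LOCALVERSION_AUTO=y\n'
disabled = 'CONFIG_LOCALVERSION=""\n# CONFIG_LOCALVERSION_AUTO is not set\n'

print(localversion_auto_enabled(enabled))
print(localversion_auto_enabled(disabled))
```

In practice you would pass in the contents of the .config in the kernel build directory.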

Marek

On Fri, May 30, 2014 at 1:46 PM, Grigori Goronzy  wrote:
> On 30.05.2014 13:30, Marek Olšák wrote:
>>
>> Grigori,
>>
>> you can git-checkout the commit before and after the memory management
>> changes, compile both and test them.
>>
>
> I was trying to revert the changes, but it looks like too much changed in
> the meantime. The suitable commits to check out should be 0bc490a8 (before)
> and 19dff56a (after), right?
>
>
> Best regards
> Grigori
>
>> Marek
>>
>> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy 
>> wrote:
>>>
>>> On 13.05.2014 22:27, Marek Olšák wrote:


 I applied these two patches Christian sent to dri-devel:

 drm/radeon: fix page directory update size estimation
 drm/radeon: fix buffer placement under memory pressure v2

 on top of torvalds's master branch.

>>>
>>> With latest kernel master (a991639c) I still see a regression, compared
>>> to
>>> 3.13 or 3.14, which have similar performance. Xonotic is about 7% slower.
>>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>>> record accurate numbers.
>>>
>>> Maybe the improved memory management has some overhead, but this is not
>>> acceptable IMHO. I'll try to investigate further.
>>>
>>> Best regards
>>>
>>> Grigori
>>>
 Marek

 On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy 
 wrote:
>
>
> On 13.05.2014 21:50, Marek Olšák wrote:
>>
>>
>>
>> Hi Christian,
>>
>> The performance regression I saw with piglit seems to be fixed with
>> latest kernel git. It's difficult to bisect the kernel, because there
>> are only merges between 3.14 and 3.15 and the merged commits are
>> actually based on 3.14-rc1 and 3.14-rc4.
>>
>> All seems to be fine with your fixes.
>>
>
> Which fixes have you applied? There are quite a few pending patches on
> dri-devel, that aren't yet part of drm-fixes-3.15.
>
> Grigori
>
>
>> Marek
>>
>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>  wrote:
>>>
>>>
>>>
>>> Is the performance regression caused by the page table changes or
>>> something else?
>>>
>>> I did some tests with Xonotic while developing it and they didn't show
>>> anything obvious, but I didn't test on different systems.
>>>
>>> Christian.
>>>
>>> On 13.05.2014 17:19, Marek Olšák wrote:
>>>
 Your latest patches fix the regression.

 The performance regression can also be reproduced with piglit "-t
 texelFetch.fs".

 Kernel 3.14:
   real    0m17.724s
   user    0m41.905s
   sys     0m11.299s

 The problematic commit checked out + your fixes (without the PTE patch,
 I think):
   real    0m23.474s
   user    1m1.008s
   sys     0m13.812s

 Marek


 On Tue, May 13, 2014 at 3:57 PM, Christian König
  wrote:
>
>
>
>
> On 13.05.2014 15:22, Alex Deucher wrote:
>
>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
>> 
>> wrote:
>>>
>>>
>>>
>>>
>>> I can confirm this fixes it for me, too.
>>>
>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>> what's going on.
>>
>>
>>
>>
>> Allocation overhead?
>
>
>
>
>
> Unlikely. Xonotic just allocates a single page table at start, which
> then gets extended at a certain rate until it no longer needs more
> address space, and then it is done.
>
> Grigori, can you bisect and/or try to figure out what's wrong here?
>
> Christian.
>
>
>>
>>> Grigori
>>>
>>>
>>> On 12.05.2014 14:50, Christian König wrote:




 I could reproduce the problem with Xonotic and I think I've found the
 issue.

 Please test the attached patch.

 Thanks,
 Christian.

 On 11.05.2014 11:06, Christian König wrote:
>>
>>
>>
>>
>> I have tested it and it doesn't fix the hangs.

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 30.05.2014 13:30, Marek Olšák wrote:
> Grigori,
>
> you can git-checkout the commit before and after the memory management
> changes, compile both and test them.
>

I was trying to revert the changes, but it looks like too much changed 
in the meantime. The suitable commits to check out should be 0bc490a8 
(before) and 19dff56a (after), right?
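
For reference, the before/after comparison could be scripted roughly like this. It is a sketch under assumptions: the two hashes are the ones mentioned above, the build step is a plain "make -jN" in an already configured tree, and build_commands / build_commit are hypothetical helper names.

```python
import subprocess

# Candidate commits from this message.
COMMITS = {"before": "0bc490a8", "after": "19dff56a"}


def build_commands(sha, jobs=4):
    """Commands to check out one commit and build it.

    The build step is an assumption; adjust it to your usual workflow.
    """
    return [
        ["git", "checkout", sha],
        ["make", f"-j{jobs}"],
    ]


def build_commit(repo_dir, sha, jobs=4):
    """Run the commands inside an existing kernel tree at repo_dir."""
    for cmd in build_commands(sha, jobs):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Calling build_commit once per entry in COMMITS, then installing and benchmarking each kernel, gives the two data points to compare.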

Best regards
Grigori

> Marek
>
> On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy  wrote:
>> On 13.05.2014 22:27, Marek Olšák wrote:
>>>
>>> I applied these two patches Christian sent to dri-devel:
>>>
>>> drm/radeon: fix page directory update size estimation
>>> drm/radeon: fix buffer placement under memory pressure v2
>>>
>>> on top of torvalds's master branch.
>>>
>>
>> With latest kernel master (a991639c) I still see a regression, compared to
>> 3.13 or 3.14, which have similar performance. Xonotic is about 7% slower.
>> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
>> record accurate numbers.
>>
>> Maybe the improved memory management has some overhead, but this is not
>> acceptable IMHO. I'll try to investigate further.
>>
>> Best regards
>>
>> Grigori
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy 
>>> wrote:

 On 13.05.2014 21:50, Marek Olšák wrote:
>
>
> Hi Christian,
>
> The performance regression I saw with piglit seems to be fixed with
> latest kernel git. It's difficult to bisect the kernel, because there
> are only merges between 3.14 and 3.15 and the merged commits are
> actually based on 3.14-rc1 and 3.14-rc4.
>
> All seems to be fine with your fixes.
>

 Which fixes have you applied? There are quite a few pending patches on
 dri-devel, that aren't yet part of drm-fixes-3.15.

 Grigori


> Marek
>
> On Tue, May 13, 2014 at 5:31 PM, Christian König
>  wrote:
>>
>>
>> Is the performance regression caused by the page table changes or
>> something else?
>>
>> I did some tests with Xonotic while developing it and they didn't show
>> anything obvious, but I didn't test on different systems.
>>
>> Christian.
>>
>> On 13.05.2014 17:19, Marek Olšák wrote:
>>
>>> Your latest patches fix the regression.
>>>
>>> The performance regression can also be reproduced with piglit "-t
>>> texelFetch.fs".
>>>
>>> Kernel 3.14:
>>>   real    0m17.724s
>>>   user    0m41.905s
>>>   sys     0m11.299s
>>>
>>> The problematic commit checked out + your fixes (without the PTE patch,
>>> I think):
>>>   real    0m23.474s
>>>   user    1m1.008s
>>>   sys     0m13.812s
>>>
>>> Marek
>>>
>>>
>>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>>  wrote:



 On 13.05.2014 15:22, Alex Deucher wrote:

> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy
> wrote:
>>
>>
>>
>> I can confirm this fixes it for me, too.
>>
>> 3.15 with these fixes and the large PTE patches actually ends up being
>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>> what's going on.
>
>
>
> Allocation overhead?




 Unlikely. Xonotic just allocates a single page table at start, which
 then gets extended at a certain rate until it no longer needs more
 address space, and then it is done.

 Grigori, can you bisect and/or try to figure out what's wrong here?

 Christian.


>
>> Grigori
>>
>>
>> On 12.05.2014 14:50, Christian König wrote:
>>>
>>>
>>>
>>> I could reproduce the problem with Xonotic and I think I've found the
>>> issue.
>>>
>>> Please test the attached patch.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> On 11.05.2014 11:06, Christian König wrote:
>
>
>
> I have tested it and it doesn't fix the hangs.



 Yeah, thought so. Well it was just a guess.

> (Also, I don't like the patch, because it reverts the behavior I
> added
> for userspace buffers.)



 Actually it shouldn't affect that. The alternative domain always
 contains GART even when userspace only specified VRAM as placement
 (as long as it is technically possible to do so).

 So what should happen is that TTM sees the current 

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Marek Olšák
Grigori,

you can git-checkout the commit before and after the memory management
changes, compile both and test them.

Marek

On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy  wrote:
> On 13.05.2014 22:27, Marek Olšák wrote:
>>
>> I applied these two patches Christian sent to dri-devel:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> on top of torvalds's master branch.
>>
>
> With latest kernel master (a991639c) I still see a regression, compared to
> 3.13 or 3.14, which have similar performance. Xonotic is about 7% slower.
> OpenArena and Unigine Tropics are also noticeably slower, but I didn't
> record accurate numbers.
>
> Maybe the improved memory management has some overhead, but this is not
> acceptable IMHO. I'll try to investigate further.
>
> Best regards
>
> Grigori
>
>> Marek
>>
>> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy 
>> wrote:
>>>
>>> On 13.05.2014 21:50, Marek Olšák wrote:


 Hi Christian,

 The performance regression I saw with piglit seems to be fixed with
 latest kernel git. It's difficult to bisect the kernel, because there
 are only merges between 3.14 and 3.15 and the merged commits are
 actually based on 3.14-rc1 and 3.14-rc4.

 All seems to be fine with your fixes.

>>>
>>> Which fixes have you applied? There are quite a few pending patches on
>>> dri-devel, that aren't yet part of drm-fixes-3.15.
>>>
>>> Grigori
>>>
>>>
 Marek

 On Tue, May 13, 2014 at 5:31 PM, Christian König
  wrote:
>
>
> Is the performance regression caused by the page table changes or
> something else?
>
> I did some tests with Xonotic while developing it and they didn't show
> anything obvious, but I didn't test on different systems.
>
> Christian.
>
> On 13.05.2014 17:19, Marek Olšák wrote:
>
>> Your latest patches fix the regression.
>>
>> The performance regression can also be reproduced with piglit "-t
>> texelFetch.fs".
>>
>> Kernel 3.14:
>>   real    0m17.724s
>>   user    0m41.905s
>>   sys     0m11.299s
>>
>> The problematic commit checked out + your fixes (without the PTE patch,
>> I think):
>>   real    0m23.474s
>>   user    1m1.008s
>>   sys     0m13.812s
>>
>> Marek
>>
>>
>> On Tue, May 13, 2014 at 3:57 PM, Christian König
>>  wrote:
>>>
>>>
>>>
>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>
 On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy 
 wrote:
>
>
>
> I can confirm this fixes it for me, too.
>
> 3.15 with these fixes and the large PTE patches actually ends up being
> noticeably slower than earlier kernels with Xonotic, though. I wonder
> what's going on.



 Allocation overhead?
>>>
>>>
>>>
>>>
>>> Unlikely. Xonotic just allocates a single page table at start, which
>>> then gets extended at a certain rate until it no longer needs more
>>> address space, and then it is done.
>>>
>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>
>>> Christian.
>>>
>>>

> Grigori
>
>
> On 12.05.2014 14:50, Christian König wrote:
>>
>>
>>
>> I could reproduce the problem with Xonotic and I think I've found the
>> issue.
>>
>> Please test the attached patch.
>>
>> Thanks,
>> Christian.
>>
>> On 11.05.2014 11:06, Christian König wrote:



 I have tested it and it doesn't fix the hangs.
>>>
>>>
>>>
>>> Yeah, thought so. Well it was just a guess.
>>>
 (Also, I don't like the patch, because it reverts the behavior I
 added
 for userspace buffers.)
>>>
>>>
>>>
>>> Actually it shouldn't affect that. The alternative domain always
>>> contains GART even when userspace only specified VRAM as placement
>>> (as long as it is technically possible to do so).
>>>
>>> So what should happen is that TTM sees the current placement, matches
>>> that with the desired placement and should find that it doesn't need
>>> to move the buffer (we should just test if this behavior really works
>>> as expected).
>>>
>>> Christian.
>>>
>>> On 10.05.2014 23:38, Marek Olšák wrote:



 Hi Christian,

 I have tested it 

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 13.05.2014 22:27, Marek Olšák wrote:
> I applied these two patches Christian sent to dri-devel:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> on top of torvalds's master branch.
>

With latest kernel master (a991639c) I still see a regression, compared 
to 3.13 or 3.14, which have similar performance. Xonotic is about 7% 
slower. OpenArena and Unigine Tropics are also noticeably slower, but I 
didn't record accurate numbers.

Maybe the improved memory management has some overhead, but this is not 
acceptable IMHO. I'll try to investigate further.

Best regards
Grigori

> Marek
>
> On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy  
> wrote:
>> On 13.05.2014 21:50, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> The performance regression I saw with piglit seems to be fixed with
>>> latest kernel git. It's difficult to bisect the kernel, because there
>>> are only merges between 3.14 and 3.15 and the merged commits are
>>> actually based on 3.14-rc1 and 3.14-rc4.
>>>
>>> All seems to be fine with your fixes.
>>>
>>
>> Which fixes have you applied? There are quite a few pending patches on
>> dri-devel, that aren't yet part of drm-fixes-3.15.
>>
>> Grigori
>>
>>
>>> Marek
>>>
>>> On Tue, May 13, 2014 at 5:31 PM, Christian König
>>>  wrote:

 Is the performance regression caused by the page table changes or
 something else?

 I did some tests with Xonotic while developing it and they didn't show
 anything obvious, but I didn't test on different systems.

 Christian.

 On 13.05.2014 17:19, Marek Olšák wrote:

> Your latest patches fix the regression.
>
> The performance regression can also be reproduced with piglit "-t
> texelFetch.fs".
>
> Kernel 3.14:
>   real    0m17.724s
>   user    0m41.905s
>   sys     0m11.299s
>
> The problematic commit checked out + your fixes (without the PTE patch I
> think):
>   real    0m23.474s
>   user    1m1.008s
>   sys     0m13.812s
>
> Marek
>
>
> On Tue, May 13, 2014 at 3:57 PM, Christian König
>  wrote:
>>
>>
>> On 13.05.2014 15:22, Alex Deucher wrote:
>>
>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy 
>>> wrote:


 I can confirm this fixes it for me, too.

 3.15 with these fixes and the large PTE patches actually ends up being
 noticeably slower than earlier kernels with Xonotic, though. I wonder
 what's going on.
>>>
>>>
>>> Allocation overhead?
>>
>>
>>
>> Unlikely. Xonotic just allocates a single page table at start, which
>> then gets extended at a certain rate until it no longer needs more
>> address space, and then it is done.
>>
>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>
>> Christian.
>>
>>
>>>
 Grigori


 On 12.05.2014 14:50, Christian König wrote:
>
>
> I could reproduce the problem with Xonotic and I think I've found the
> issue.
>
> Please test the attached patch.
>
> Thanks,
> Christian.
>
> On 11.05.2014 11:06, Christian König wrote:
>>>
>>>
>>> I have tested it and it doesn't fix the hangs.
>>
>>
>> Yeah, thought so. Well it was just a guess.
>>
>>> (Also, I don't like the patch, because it reverts the behavior I
>>> added
>>> for userspace buffers.)
>>
>>
>> Actually it shouldn't affect that. The alternative domain always
>> contains GART even when userspace only specified VRAM as placement
>> (as long as it is technically possible to do so).
>>
>> So what should happen is that TTM sees the current placement, matches
>> that with the desired placement and should find that it doesn't need
>> to move the buffer (we should just test if this behavior really works
>> as expected).
>>
>> Christian.
>>
>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>
>>>
>>> Hi Christian,
>>>
>>> I have tested it and it doesn't fix the hangs.
>>>
>>> (Also, I don't like the patch, because it reverts the behavior I
>>> added
>>> for userspace buffers.)
>>>
>>> Marek
>>>
>>>
>>>
>>> On Sat, May 10, 2014 at 6:34 PM, Christian König
>>>  wrote:


 Couldn't reproduce the issue so far. So the attached patch is just a
 complete shot in the dark found by 

CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Christian König
Yeah, that will work around it for now.

But the general problem is that we have memory corruption here. We just
didn't notice it earlier, because clearing a texture or vectors with
zero only results in random misrendering.

Only when it hits a shader or, as in this case, a page table does it
really manifest as a bad crash.

Going to dig deeper into this,
Christian.

On 29.05.2014 18:51, Marek Olšák wrote:
> Can you disable evictions for page tables, e.g. by removing them from the LRU
> list?
>
> Marek
>
> On Thu, May 29, 2014 at 6:30 PM, Christian König
>  wrote:
>> Hi Marek & Alex,
>>
>> I've found the reason why forcefully evicting page tables sometimes crashes
>> the box.
>>
>> This is a hexdump of a typical page table before it is moved to GART:
>> 000117f000  02914061 
>> 000117f008  02915061 
>> 000117f010  02916061 
>> 000117f018  02917061 
>> 000117f020  02918061 
>>
>> And it looks like this when it comes back:
>> 0006102000   
>> *
>>
>> Ideas? I don't really have an explanation for this. Moving buffers around
>> otherwise seems to work perfectly fine.
>>
>> Thanks,
>> Christian.
>>
>> On 28.05.2014 12:38, Christian König wrote:
>>
>>> I already tried a similar patch as well, without any more noticeable
>>> crashes. But going to give this another round with your patch and openarena.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> On 27.05.2014 23:55, Marek Olšák wrote:
 Hi Christian,

 I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
 fixed yet. They are very rare and very random. Therefore, I have come
 up with a patch which evicts page tables between IBs. See the
 attachment. With that patch applied, the system starts fine, compiz
 and glxgears work, but once I start playing openarena, it locks up
 pretty quickly.

 The patch shouldn't do anything in theory, because pages are moved
 back to VRAM immediately after that. However, the VRAM address of page
 tables may end up being different from before, which might be the root
 cause.

 Marek

 On Wed, May 14, 2014 at 2:11 PM, Christian König
  wrote:
> Crap, any chance you can narrow it down a bit more?
>
> I've just tried a piglit quick test on my Bonaire and it seems to work
> perfectly fine.
>
> What hw do you test on?
>
> Regards,
> Christian.
>
> On 13.05.2014 23:21, Marek Olšák wrote:
>
>> Hi Christian,
>>
>> Even though some regressions are fixed by these patches:
>>
>> drm/radeon: fix page directory update size estimation
>> drm/radeon: fix buffer placement under memory pressure v2
>>
>> and indeed, the texelFetch tests no longer hang, there is one more
>> hang which needs to be fixed. :( All I know is the exact same commit
>> causes it and it can only be reproduced by running whole piglit with
>> concurrency enabled.
>>
>> My kernel git log:
>>
>> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
>> (10 hours ago) 
>> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
>> hours ago) 
>> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
>> months ago) 
>> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
>> months ago) 
>>
>> fa68834 doesn't hang, but 2ba22c8 hangs, which means that either 6d2f294
>> or one of the two fixes is the first bad commit.
>>
>> Marek
>>
>> On Fri, May 9, 2014 at 8:03 PM, Marek Olšák wrote:
>>> Hi Christian,
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>
>>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>>> Author: Christian König
>>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>>
>>>     drm/radeon: use normal BOs for the page tables v4
>>>
>>>     No need to make it more complicated than necessary,
>>>     just allocate the page tables as normal BO and
>>>     flush whenever the address change.
>>>
>>>     v2: update comments and function name
>>>     v3: squash bug fixes, page directory and tables patch
>>>     v4: rebased on Mareks changes
>>>
>>>     Signed-off-by: Christian König
>>>
>>>
>>> Reverting the commit gives me a lot of merge conflicts.
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>>> Any idea what is wrong with it?
>>>
>>> Thanks,
>>>
>>> Marek
>



CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Marek Olšák
Can you disable evictions for page tables, e.g. by removing them from the LRU list?

Marek

On Thu, May 29, 2014 at 6:30 PM, Christian König
 wrote:
> Hi Marek & Alex,
>
> I've found the reason why forcefully evicting page tables sometimes crashes
> the box.
>
> This is a hexdump of a typical page table before it is moved to GART:
> 000117f000  02914061 
> 000117f008  02915061 
> 000117f010  02916061 
> 000117f018  02917061 
> 000117f020  02918061 
>
> And it looks like this when it comes back:
> 0006102000   
> *
>
> Ideas? I don't really have an explanation for this. Moving buffers around
> otherwise seems to work perfectly fine.
>
> Thanks,
> Christian.
>
> On 28.05.2014 12:38, Christian König wrote:
>
>> I already tried a similar patch as well, without any more noticeable
>> crashes. But going to give this another round with your patch and openarena.
>>
>> Thanks,
>> Christian.
>>
>> On 27.05.2014 23:55, Marek Olšák wrote:
>>>
>>> Hi Christian,
>>>
>>> I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
>>> fixed yet. They are very rare and very random. Therefore, I have come
>>> up with a patch which evicts page tables between IBs. See the
>>> attachment. With that patch applied, the system starts fine, compiz
>>> and glxgears work, but once I start playing openarena, it locks up
>>> pretty quickly.
>>>
>>> The patch shouldn't do anything in theory, because pages are moved
>>> back to VRAM immediately after that. However, the VRAM address of page
>>> tables may end up being different from before, which might be the root
>>> cause.
>>>
>>> Marek
>>>
>>> On Wed, May 14, 2014 at 2:11 PM, Christian König
>>>  wrote:

 Crap, any chance you can narrow it down a bit more?

 I've just tried a piglit quick test on my Bonaire and it seems to work
 perfectly fine.

 What hw do you test on?

 Regards,
 Christian.

 On 13.05.2014 23:21, Marek Olšák wrote:

> Hi Christian,
>
> Even though some regressions are fixed by these patches:
>
> drm/radeon: fix page directory update size estimation
> drm/radeon: fix buffer placement under memory pressure v2
>
> and indeed, the texelFetch tests no longer hang, there is one more
> hang which needs to be fixed. :( All I know is the exact same commit
> causes it and it can only be reproduced by running whole piglit with
> concurrency enabled.
>
> My kernel git log:
>
> * 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2
> (10 hours ago) 
> * 3af91e5 - drm/radeon: fix page directory update size estimation (21
> hours ago) 
> * 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2
> months ago) 
> * fa68834 - drm/radeon: further cleanup vm flushing & fencing (2
> months ago) 
>
> fa68834 doesn't hang, but 2ba22c8 hangs, which means that either 6d2f294
> or one of the two fixes is the first bad commit.
>
> Marek
>
> On Fri, May 9, 2014 at 8:03 PM, Marek Ol??k  wrote:
>>
>> Hi Christian,
>>
>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>
>> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
>> Author: Christian K?nig 
>> Date:   Thu Feb 20 13:42:17 2014 +0100
>>
>>   drm/radeon: use normal BOs for the page tables v4
>>
>>   No need to make it more complicated than necessary,
>>   just allocate the page tables as normal BO and
>>   flush whenever the address change.
>>
>>   v2: update comments and function name
>>   v3: squash bug fixes, page directory and tables patch
>>   v4: rebased on Mareks changes
>>
>>   Signed-off-by: Christian K?nig 
>>
>>
>> Reverting the commit gives me a lot of merge conflicts.
>>
>> The simplest way to reproduce the hangs is to run piglit with these
>> parameters:
>> -t texelFetch.fs
>>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
>>
>> Any idea what is wrong with it?
>>
>> Thanks,
>>
>> Marek


>>
>


CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Christian König
Hi Marek & Alex,

I've found out why forcibly evicting page tables sometimes
crashes the box.

Well, this is a typical hexdump of a page table before it is moved to GART:
000117f000  02914061 
000117f008  02915061 
000117f010  02916061 
000117f018  02917061 
000117f020  02918061 

And it looks like this when it comes back:
0006102000   
*

Ideas? I don't really have an explanation for this. Moving buffers 
around otherwise seems to work perfectly fine.

Thanks,
Christian.
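
A comparison like this can be scripted when checking dumps by hand gets tedious. The following is a minimal Python sketch, not driver code: `lost_entries` is a hypothetical helper, the `before` values are the first column of the dump above, and the `after_move` values are purely an assumption, because the archived dump does not show what the table contained when it came back.

```python
def lost_entries(before, after):
    """Return the indices of page table entries that changed across a move."""
    if len(before) != len(after):
        raise ValueError("dumps must cover the same number of entries")
    return [i for i, (b, a) in enumerate(zip(before, after)) if b != a]


# First column of the "before" dump quoted above.
before = [0x02914061, 0x02915061, 0x02916061, 0x02917061, 0x02918061]

# Assumption for illustration only: the table reads back as identical
# filler entries. The real values are not visible in the archive.
after_move = [0x0] * len(before)

print(lost_entries(before, before))      # a round trip that kept every entry
print(lost_entries(before, after_move))  # a round trip that lost every entry
```

An empty result means the buffer move preserved the table; any index in the list marks an entry the move did not carry over.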




CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Alex Deucher
On Thu, May 29, 2014 at 12:30 PM, Christian König wrote:
> Hi Marek & Alex,
>
> I've found out why forcibly evicting page tables sometimes crashes
> the box.
>
> Well, this is a typical hexdump of a page table before it is moved to GART:
> 000117f000  02914061 
> 000117f008  02915061 
> 000117f010  02916061 
> 000117f018  02917061 
> 000117f020  02918061 
>
> And it looks like this when it comes back:
> 0006102000   
> *
>
> Ideas? I don't really have an explanation for this. Moving buffers around
> otherwise seems to work perfectly fine.

Nothing I can think of off hand.  Might be worth trying CP DMA rather
than SDMA for BO moves to see if we can narrow it down a bit more.
Might also try the other SDMA ring.

Alex



CIK hangs with kernel 3.15, bisected

2014-05-28 Thread Christian König
I already tried a similar patch as well, without any more noticeable 
crashes. But going to give this another round with your patch and openarena.

Thanks,
Christian.




CIK hangs with kernel 3.15, bisected

2014-05-28 Thread Marek Olšák
Hi Christian,

I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not
fixed yet. They are very rare and very random. Therefore, I have come
up with a patch which evicts page tables between IBs. See the
attachment. With that patch applied, the system starts fine, compiz
and glxgears work, but once I start playing openarena, it locks up
pretty quickly.

The patch shouldn't do anything in theory, because pages are moved
back to VRAM immediately after that. However, the VRAM address of page
tables may end up being different from before, which might be the root
cause.

Marek

-- next part --
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index d9ab99f..365e36f 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -116,6 +116,19 @@ void radeon_vm_manager_fini(struct radeon_device *rdev)
 	rdev->vm_manager.enabled = false;
 }
 
+static void force_gtt(struct radeon_bo *bo)
+{
+	if (radeon_bo_reserve(bo, false))
+		return;
+
+	radeon_ttm_placement_from_domain(bo, RADEON_GEM_DOMAIN_GTT);
+
+	if (ttm_bo_validate(&bo->tbo, &bo->placement, true, false)) {
+		DRM_ERROR("failed to force a GTT placement\n");
+	}
+	radeon_bo_unreserve(bo);
+}
+
 /**
  * radeon_vm_get_bos - add the vm BOs to a validation list
  *
@@ -147,6 +160,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 	list[0].handle = 0;
 	list_add(&list[0].tv.head, head);
 
+	force_gtt(vm->page_directory);
+
 	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
 		if (!vm->page_tables[i].bo)
 			continue;
@@ -159,6 +174,8 @@ struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
 		list[idx].tiling_flags = 0;
 		list[idx].handle = 0;
 		list_add(&list[idx++].tv.head, head);
+
+		force_gtt(vm->page_tables[i].bo);
 	}
 
 	return list;
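
The patch above pushes the page directory and every page table to GTT each time radeon_vm_get_bos() builds the validation list, so the buffers have to be moved back to VRAM afterwards and may land at a different address. As a rough illustration of the bookkeeping this stresses, here is a toy model in Python. It is not the driver's logic: `ToyVM`, its methods, and all addresses are hypothetical and chosen only to show the "flush whenever the address changes" rule from the commit message.

```python
class ToyVM:
    """Toy model: tracks where a page table lives and what was last flushed."""

    def __init__(self, vram_addr):
        self.domain = "VRAM"
        self.addr = vram_addr
        self.last_flushed_addr = vram_addr

    def force_gtt(self, gtt_addr):
        # Same intent as force_gtt() in the patch: evict the table to GTT.
        self.domain, self.addr = "GTT", gtt_addr

    def validate(self, vram_addr):
        # Validation moves the table back to VRAM, possibly at a new address.
        self.domain, self.addr = "VRAM", vram_addr

    def needs_flush(self):
        # A flush is pending whenever the table is not where the last
        # flush saw it.
        return self.addr != self.last_flushed_addr

    def flush(self):
        self.last_flushed_addr = self.addr


vm = ToyVM(vram_addr=0x1000)         # arbitrary illustrative addresses
vm.force_gtt(gtt_addr=0x8000)
vm.validate(vram_addr=0x2000)        # table returns at a different address
print(vm.needs_flush())
vm.flush()
print(vm.needs_flush())
```

If the table happens to return to its old address, the model reports no pending flush, which is why the patch "shouldn't do anything in theory" unless the address or the contents change along the way.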


CIK hangs with kernel 3.15, bisected

2014-05-14 Thread Christian König
Crap, any chance you can narrow it down a bit more?

I've just tried a piglit quick test on my Bonaire and it seems to work 
perfectly fine.

What hw do you test on?

Regards,
Christian.




CIK hangs with kernel 3.15, bisected

2014-05-14 Thread Marek Olšák
Hi Christian,

Even though some regressions are fixed by these patches:

drm/radeon: fix page directory update size estimation
drm/radeon: fix buffer placement under memory pressure v2

and indeed, the texelFetch tests no longer hang, there is one more
hang which needs to be fixed. :( All I know is that the exact same commit
causes it and that it can only be reproduced by running the whole piglit
suite with concurrency enabled.

My kernel git log:

* 2ba22c8 - drm/radeon: fix buffer placement under memory pressure v2 (10 hours ago)
* 3af91e5 - drm/radeon: fix page directory update size estimation (21 hours ago)
* 6d2f294 - drm/radeon: use normal BOs for the page tables v4 (2 months ago)
* fa68834 - drm/radeon: further cleanup vm flushing & fencing (2 months ago)

fa68834 doesn't hang, but 2ba22c8 hangs, which means 6d2f294 or either
of the two fixes is the first bad commit.

Marek
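
The narrowing step above can be written down as a small Python sketch. It assumes the four commits form a linear history in the order of the git log (oldest last in the log, oldest first in the list below); `first_bad_candidates` is a hypothetical helper, not a git command.

```python
def first_bad_candidates(history, last_good, first_known_bad):
    """Return the commits that could be the first bad one.

    `history` is ordered oldest to newest. Everything after the last good
    commit, up to and including the known-bad commit, is a candidate.
    """
    good = history.index(last_good)
    bad = history.index(first_known_bad)
    if good >= bad:
        raise ValueError("the good commit must be older than the bad one")
    return history[good + 1 : bad + 1]


# Short hashes from the git log above, oldest first.
history = ["fa68834", "6d2f294", "3af91e5", "2ba22c8"]

print(first_bad_candidates(history, "fa68834", "2ba22c8"))
```

With fa68834 good and 2ba22c8 bad, the candidates are 6d2f294 and the two fixes on top of it, which matches the conclusion in the message. Testing 6d2f294 or 3af91e5 directly would narrow it further.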

On Fri, May 9, 2014 at 8:03 PM, Marek Olšák wrote:
> Hi Christian,
>
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>
> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
> Author: Christian König
> Date:   Thu Feb 20 13:42:17 2014 +0100
>
>     drm/radeon: use normal BOs for the page tables v4
>
>     No need to make it more complicated than necessary,
>     just allocate the page tables as normal BO and
>     flush whenever the address change.
>
>     v2: update comments and function name
>     v3: squash bug fixes, page directory and tables patch
>     v4: rebased on Mareks changes
>
>     Signed-off-by: Christian König
>
>
> Reverting the commit gives me a lot of merge conflicts.
>
> The simplest way to reproduce the hangs is to run piglit with these 
> parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>
> Any idea what is wrong with it?
>
> Thanks,
>
> Marek


CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
I applied these two patches Christian sent to dri-devel:

drm/radeon: fix page directory update size estimation
drm/radeon: fix buffer placement under memory pressure v2

on top of torvalds's master branch.

Marek

On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy wrote:
> On 13.05.2014 21:50, Marek Olšák wrote:
>>
>> Hi Christian,
>>
>> The performance regression I saw with piglit seems to be fixed with
>> latest kernel git. It's difficult to bisect the kernel, because there
>> are only merges between 3.14 and 3.15 and the merged commits are
>> actually based on 3.14-rc1 and 3.14-rc4.
>>
>> All seems to be fine with your fixes.
>>
>
> Which fixes have you applied? There are quite a few pending patches on
> dri-devel that aren't yet part of drm-fixes-3.15.
>
> Grigori

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Grigori Goronzy
On 13.05.2014 21:50, Marek Olšák wrote:
> Hi Christian,
>
> The performance regression I saw with piglit seems to be fixed with
> latest kernel git. It's difficult to bisect the kernel, because there
> are only merges between 3.14 and 3.15 and the merged commits are
> actually based on 3.14-rc1 and 3.14-rc4.
>
> All seems to be fine with your fixes.
>

Which fixes have you applied? There are quite a few pending patches on
dri-devel that aren't yet part of drm-fixes-3.15.

Grigori


CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
Hi Christian,

The performance regression I saw with piglit seems to be fixed with
latest kernel git. It's difficult to bisect the kernel, because there
are only merges between 3.14 and 3.15 and the merged commits are
actually based on 3.14-rc1 and 3.14-rc4.

All seems to be fine with your fixes.

Marek
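
For scale, the piglit timings quoted below (kernel 3.14 versus the problematic commit plus fixes) can be turned into slowdown ratios. This is a small Python sketch for that arithmetic only; `to_seconds` and `slowdown` are hypothetical helper names, and the input strings are copied from the quoted `time` output.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style duration such as '1m1.008s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised duration: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def slowdown(before, after):
    """Return after/before for each of the real, user and sys durations."""
    return {key: to_seconds(after[key]) / to_seconds(before[key]) for key in before}


kernel_314 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
with_fixes = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}

for name, ratio in slowdown(kernel_314, with_fixes).items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the wall-clock time grows by roughly a third, and the user time grows more than the sys time, so this particular run does not by itself point at kernel-side overhead.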

On Tue, May 13, 2014 at 5:31 PM, Christian König wrote:
> Is the performance regression caused by the page table changes or
> something else?
>
> I did some tests with Xonotic while developing it and they didn't show
> anything obvious, but I didn't test on different systems.
>
> Christian.
>
> On 13.05.2014 17:19, Marek Olšák wrote:
>
>> Your latest patches fix the regression.
>>
>> The performance regression can also be reproduced with piglit "-t
>> texelFetch.fs".
>>
>> Kernel 3.14:
>> real    0m17.724s
>> user    0m41.905s
>> sys     0m11.299s
>>
>> The problematic commit checked out + your fixes (without the PTE patch I
>> think):
>> real    0m23.474s
>> user    1m1.008s
>> sys     0m13.812s
>>
>> Marek
>>
>>
>> On Tue, May 13, 2014 at 3:57 PM, Christian König wrote:
>>>
>>> On 13.05.2014 15:22, Alex Deucher wrote:
>>>
>>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy wrote:
>>>>>
>>>>> I can confirm this fixes it for me, too.
>>>>>
>>>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>>>> what's going on.
>>>>
>>>> Allocation overhead?
>>>
>>>
>>> Unlikely, Xonotic just allocates a single page table at start, which then
>>> gets extended to a certain rate until they no longer need more address
>>> space and are done with it.
>>>
>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>
>>> Christian.
>>>
>>>
>>>>
>>>>> Grigori
>>>>>
>>>>>
>>>>> On 12.05.2014 14:50, Christian König wrote:
>>>>>>
>>>>>> I could reproduce the problem with Xonotic and I think I've found the
>>>>>> issue.
>>>>>>
>>>>>> Please test the attached patch.
>>>>>>
>>>>>> Thanks,
>>>>>> Christian.
>>>>>>
>>>>>> On 11.05.2014 11:06, Christian König wrote:
>>>>>>>>
>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>
>>>>>>> Yeah, thought so. Well, it was just a guess.
>>>>>>>
>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>> added for userspace buffers.)
>>>>>>>
>>>>>>> Actually it shouldn't affect that. The alternative domain always
>>>>>>> contains GART even when userspace only specified VRAM as placement
>>>>>>> (as long as it is technically possible to do so).
>>>>>>>
>>>>>>> So what should happen is that TTM sees the current placement, matches
>>>>>>> that with the desired placement and should find that it doesn't need
>>>>>>> to move the buffer (we should just test if this behavior really works
>>>>>>> as expected).
>>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>> On 10.05.2014 23:38, Marek Olšák wrote:
>>>>>>>>
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> I have tested it and it doesn't fix the hangs.
>>>>>>>>
>>>>>>>> (Also, I don't like the patch, because it reverts the behavior I
>>>>>>>> added for userspace buffers.)
>>>>>>>>
>>>>>>>> Marek
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 10, 2014 at 6:34 PM, Christian König wrote:
>>>>>>>>>
>>>>>>>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>>>>>>>> complete shot in the dark found by rereading the code, but it might
>>>>>>>>> actually be the problem.
>>>>>>>>>
>>>>>>>>> Please give it a try.
>>>>>>>>>
>>>>>>>>> Going to keep testing in the meantime,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> On 10.05.2014 10:23, Christian König wrote:
>>>>>>>>>
>>>>>>>>>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>>>>>>>>>>> I boot with radeon.vramlimit=256 and then run Xonotic timedemo with
>>>>>>>>>>> high settings.
>>>>>>>>>>> I haven't had a chance to bisect it yet, but it might be a similar
>>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> Sounds like the same issue to me. Thx for the good test case.
>>>>>>>>>>
>>>>>>>>>>> Any idea what is wrong with it?
>>>>>>>>>>
>>>>>>>>>> Actually I was already surprised that it went so smoothly without any
>>>>>>>>>> regression so far, I hadn't noticed the bug in bugzilla.kernel.org yet.
>>>>>>>>>>
>>>>>>>>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>>>>>>>>> run in parallel, which creates a lot of memory pressure and probably
>>>>>>>>>>> causes buffer evictions.
>>>>>>>>>>
>>>>>>>>>> Sounds like the underlying problem to me. We probably evict some
>>>>>>>>>> part of a page table without updating the page directory. Going to
>>>>>>>>>> dig into it today, it's probably just a one-liner missing somewhere
>>>>>>>>>> in the VM code.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>> Am 09.05.2014 23:39, schrieb

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
I think it's caused by something else. I'll continue testing and bisecting.

Marek

On Tue, May 13, 2014 at 5:31 PM, Christian König
 wrote:
> Is the performance regression caused by the page table changes or
> something else?
>
> I ran some tests with Xonotic while developing it and they didn't show
> anything obvious, but I didn't test on different systems.
>
> Christian.
>
> Am 13.05.2014 17:19, schrieb Marek Ol??k:
>
>> Your latest patches fix the regression.
>>
>> The performance regression can also be reproduced with piglit "-t
>> texelFetch.fs".
>>
>> Kernel 3.14:
>> real    0m17.724s
>> user    0m41.905s
>> sys     0m11.299s
>>
>> The problematic commit checked out + your fixes (without the PTE patch I
>> think):
>> real    0m23.474s
>> user    1m1.008s
>> sys     0m13.812s
>>
>> Marek
>>
>>
>> On Tue, May 13, 2014 at 3:57 PM, Christian K?nig
>>  wrote:
>>>
>>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>>
 On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy 
 wrote:
>
> I can confirm this fixes it for me, too.
>
> 3.15 with these fixes and the large PTE patches actually ends up being
> noticeably slower than earlier kernels with Xonotic, though. I wonder
> what's
> going on.

 Allocation overhead?
>>>
>>>
>>> Unlikely, Xonotic just allocates a single page table at start, which then
>>> gets extended to a certain rate until they no longer need more address
>>> space
>>> and are done with it.
>>>
>>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>>
>>> Christian.
>>>
>>>

> Grigori
>
>
> On 12.05.2014 14:50, Christian K?nig wrote:
>>
>> I could reproduce the problem with xonotic and I think I've found the
>> issue.
>>
>> Please test the attached patch.
>>
>> Thanks,
>> Christian.
>>
>> Am 11.05.2014 11:06, schrieb Christian K?nig:

 I have tested it and it doesn't fix the hangs.
>>>
>>> Yeah, thought so. Well it was just a guess.
>>>
 (Also, I don't like the patch, because it reverts the behavior I
 added
 for userspace buffers.)
>>>
>>> Actually it shouldn't affect that. The alternative domain always
>>> contains GART even when userspace only specified VRAM as placement
>>> (as
>>> long as it is technical possible to do so).
>>>
>>> So what should happen is that TTM sees the current placement, matches
>>> that with the desired placement and should find that it doesn't need
>>> to move the buffer (we should just test if this behavior really works
>>> as expected).
>>>
>>> Christian.
>>>
>>> Am 10.05.2014 23:38, schrieb Marek Ol??k:

 Hi Christian,

 I have tested it and it doesn't fix the hangs.

 (Also, I don't like the patch, because it reverts the behavior I
 added
 for userspace buffers.)

 Marek



 On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
  wrote:
>
> Couldn't reproduce the issue so far. So the attached patch is just
> a
> complete shoot into the dark found by rereading the code, but it
> might
> actually be the problem.
>
> Please give it a try.
>
> Going to keep testing in the meantime,
> Christian.
>
> Am 10.05.2014 10:23, schrieb Christian K?nig:
>
>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g.
>>> if
>>> I boot
>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>> settings.
>>> I haven't had a chance to bisect it yet, but it might be a
>>> similar
>>> problem.
>>
>> Sounds like the same issue to me. Thx for the good test case.
>>
>>> Any idea what is wrong with it?
>>
>> Actually I already wondered that it went so smooth without any
>> regression
>> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>>
>>> Some of the tests allocate a lot of MSAA textures and the tests
>>> also
>>> run in parallel, which creates a lot of memory pressure and
>>> probably
>>> causes buffer evictions.
>>
>> Sounds like the underlying problem to me. We probably evict some
>> part of a
>> page table without updating the page directory. Going to dig into
>> it today,
>> it's probably just a one liner missing somewhere in the VM code.
>>
>> Christian.
>>
>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>
>>> On 09.05.2014 20:03, Marek Ol??k wrote:


 This commit which first appeared in 3.15-rc1 causes hangs on
 Bonaire:
 [...]

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Christian König
Is the performance regression caused by the page table changes or
something else?

I ran some tests with Xonotic while developing it and they didn't show
anything obvious, but I didn't test on different systems.

Christian.

On 13.05.2014 17:19, Marek Olšák wrote:
> Your latest patches fix the regression.
>
> The performance regression can also be reproduced with piglit "-t
> texelFetch.fs".
>
> Kernel 3.14:
> real    0m17.724s
> user    0m41.905s
> sys     0m11.299s
>
> The problematic commit checked out + your fixes (without the PTE patch I
> think):
> real    0m23.474s
> user    1m1.008s
> sys     0m13.812s
>
> Marek
>
>
> On Tue, May 13, 2014 at 3:57 PM, Christian K?nig
>  wrote:
>> Am 13.05.2014 15:22, schrieb Alex Deucher:
>>
>>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy 
>>> wrote:
 I can confirm this fixes it for me, too.

 3.15 with these fixes and the large PTE patches actually ends up being
 noticeably slower than earlier kernels with Xonotic, though. I wonder
 what's
 going on.
>>> Allocation overhead?
>>
>> Unlikely, Xonotic just allocates a single page table at start, which then
>> gets extended to a certain rate until they no longer need more address space
>> and are done with it.
>>
>> Grigori, can you bisect and/or try to figure out what's wrong here?
>>
>> Christian.
>>
>>
>>>
 Grigori


 On 12.05.2014 14:50, Christian K?nig wrote:
> I could reproduce the problem with xonotic and I think I've found the
> issue.
>
> Please test the attached patch.
>
> Thanks,
> Christian.
>
> Am 11.05.2014 11:06, schrieb Christian K?nig:
>>> I have tested it and it doesn't fix the hangs.
>> Yeah, thought so. Well it was just a guess.
>>
>>> (Also, I don't like the patch, because it reverts the behavior I added
>>> for userspace buffers.)
>> Actually it shouldn't affect that. The alternative domain always
>> contains GART even when userspace only specified VRAM as placement (as
>> long as it is technical possible to do so).
>>
>> So what should happen is that TTM sees the current placement, matches
>> that with the desired placement and should find that it doesn't need
>> to move the buffer (we should just test if this behavior really works
>> as expected).
>>
>> Christian.
>>
>> Am 10.05.2014 23:38, schrieb Marek Ol??k:
>>> Hi Christian,
>>>
>>> I have tested it and it doesn't fix the hangs.
>>>
>>> (Also, I don't like the patch, because it reverts the behavior I added
>>> for userspace buffers.)
>>>
>>> Marek
>>>
>>>
>>>
>>> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>>>  wrote:
 Couldn't reproduce the issue so far. So the attached patch is just a
 complete shoot into the dark found by rereading the code, but it
 might
 actually be the problem.

 Please give it a try.

 Going to keep testing in the meantime,
 Christian.

 Am 10.05.2014 10:23, schrieb Christian K?nig:

>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>> I boot
>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>> settings.
>> I haven't had a chance to bisect it yet, but it might be a similar
>> problem.
> Sounds like the same issue to me. Thx for the good test case.
>
>> Any idea what is wrong with it?
> Actually I already wondered that it went so smooth without any
> regression
> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>
>> Some of the tests allocate a lot of MSAA textures and the tests
>> also
>> run in parallel, which creates a lot of memory pressure and
>> probably
>> causes buffer evictions.
> Sounds like the underlying problem to me. We probably evict some
> part of a
> page table without updating the page directory. Going to dig into
> it today,
> it's probably just a one liner missing somewhere in the VM code.
>
> Christian.
>
> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>> On 09.05.2014 20:03, Marek Ol??k wrote:
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>> Bonaire:
>>> [...]
>>>
>>> The simplest way to reproduce the hangs is to run piglit with
>>> these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests
>>> also
>>> run in parallel, which creates a lot of memory pressure and
>>> probably
>>> causes buffer evictions.
>>>
>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. 

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
Your latest patches fix the regression.

The performance regression can also be reproduced with piglit "-t
texelFetch.fs".

Kernel 3.14:
   real    0m17.724s
   user    0m41.905s
   sys     0m11.299s

The problematic commit checked out + your fixes (without the PTE patch I think):
   real    0m23.474s
   user    1m1.008s
   sys     0m13.812s

Marek
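
[Annotation] The two `time` results quoted above can be compared directly. The following is a minimal sketch that converts the stamps to seconds and reports the relative slowdown per field. The helper names (`to_seconds`, `slowdown_percent`) are illustrative and not part of piglit or any kernel tooling; only the six timing values come from the message.

```python
# Timings copied from the message above (output of the `time` utility).
KERNEL_3_14 = {"real": "0m17.724s", "user": "0m41.905s", "sys": "0m11.299s"}
REGRESSED = {"real": "0m23.474s", "user": "1m1.008s", "sys": "0m13.812s"}


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '1m1.008s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def slowdown_percent(before: str, after: str) -> float:
    """Relative increase of `after` over `before`, in percent."""
    b = to_seconds(before)
    a = to_seconds(after)
    return (a - b) / b * 100.0


if __name__ == "__main__":
    for key in ("real", "user", "sys"):
        change = slowdown_percent(KERNEL_3_14[key], REGRESSED[key])
        print(f"{key}: {change:+.1f}%")
```

With these inputs the wall-clock time is roughly 32% longer, user time roughly 46% longer, and sys time roughly 22% longer than on kernel 3.14.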


On Tue, May 13, 2014 at 3:57 PM, Christian König
 wrote:
> Am 13.05.2014 15:22, schrieb Alex Deucher:
>
>> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy 
>> wrote:
>>>
>>> I can confirm this fixes it for me, too.
>>>
>>> 3.15 with these fixes and the large PTE patches actually ends up being
>>> noticeably slower than earlier kernels with Xonotic, though. I wonder
>>> what's
>>> going on.
>>
>> Allocation overhead?
>
>
> Unlikely, Xonotic just allocates a single page table at start, which then
> gets extended to a certain rate until they no longer need more address space
> and are done with it.
>
> Grigori, can you bisect and/or try to figure out what's wrong here?
>
> Christian.
>
>
>>
>>
>>> Grigori
>>>
>>>
>>> On 12.05.2014 14:50, Christian K?nig wrote:

 I could reproduce the problem with xonotic and I think I've found the
 issue.

 Please test the attached patch.

 Thanks,
 Christian.

 Am 11.05.2014 11:06, schrieb Christian K?nig:
>>
>> I have tested it and it doesn't fix the hangs.
>
> Yeah, thought so. Well it was just a guess.
>
>> (Also, I don't like the patch, because it reverts the behavior I added
>> for userspace buffers.)
>
> Actually it shouldn't affect that. The alternative domain always
> contains GART even when userspace only specified VRAM as placement (as
> long as it is technical possible to do so).
>
> So what should happen is that TTM sees the current placement, matches
> that with the desired placement and should find that it doesn't need
> to move the buffer (we should just test if this behavior really works
> as expected).
>
> Christian.
>
> Am 10.05.2014 23:38, schrieb Marek Ol??k:
>>
>> Hi Christian,
>>
>> I have tested it and it doesn't fix the hangs.
>>
>> (Also, I don't like the patch, because it reverts the behavior I added
>> for userspace buffers.)
>>
>> Marek
>>
>>
>>
>> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>>  wrote:
>>>
>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>> complete shoot into the dark found by rereading the code, but it
>>> might
>>> actually be the problem.
>>>
>>> Please give it a try.
>>>
>>> Going to keep testing in the meantime,
>>> Christian.
>>>
>>> Am 10.05.2014 10:23, schrieb Christian K?nig:
>>>
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
> I boot
> with radeon.vramlimit=256 and then run Xonotic timedemo with high
> settings.
> I haven't had a chance to bisect it yet, but it might be a similar
> problem.

 Sounds like the same issue to me. Thx for the good test case.

> Any idea what is wrong with it?

 Actually I already wondered that it went so smooth without any
 regression
 so far, didn't noticed the bug in bugzilla.kernel.org yet.

> Some of the tests allocate a lot of MSAA textures and the tests
> also
> run in parallel, which creates a lot of memory pressure and
> probably
> causes buffer evictions.

 Sounds like the underlying problem to me. We probably evict some
 part of a
 page table without updating the page directory. Going to dig into
 it today,
 it's probably just a one liner missing somewhere in the VM code.

 Christian.

 Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>
> On 09.05.2014 20:03, Marek Ol??k wrote:
>>
>>
>> This commit which first appeared in 3.15-rc1 causes hangs on
>> Bonaire:
>> [...]
>>
>> The simplest way to reproduce the hangs is to run piglit with
>> these
>> parameters:
>> -t texelFetch.fs
>>
>> Some of the tests allocate a lot of MSAA textures and the tests
>> also
>> run in parallel, which creates a lot of memory pressure and
>> probably
>> causes buffer evictions.
>>
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
> I boot
> with radeon.vramlimit=256 and then run Xonotic timedemo with high
> settings.
> I haven't had a chance to bisect it yet, but it might be a similar
> problem.
>
> Grigori


>>> ___
>>> dri-devel mailing list
>>> dri-devel at 

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Christian König
On 13.05.2014 15:22, Alex Deucher wrote:
> On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy  wrote:
>> I can confirm this fixes it for me, too.
>>
>> 3.15 with these fixes and the large PTE patches actually ends up being
>> noticeably slower than earlier kernels with Xonotic, though. I wonder what's
>> going on.
> Allocation overhead?

Unlikely. Xonotic just allocates a single page table at start, which
then gets extended at a certain rate until no more address space is
needed, and that's it.

Grigori, can you bisect and/or try to figure out what's wrong here?

Christian.
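
[Annotation] Bisecting a performance regression, as requested above, amounts to a binary search over the commit range with a "too slow" predicate instead of a pass/fail test. The sketch below shows only that search logic. The commit names and run times are made-up placeholders, the 10% threshold is an assumption, and in practice `git bisect` does the bookkeeping while each step requires building and booting a kernel.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history: placeholder names with made-up benchmark times (s).
RUN_SECONDS = {"c0": 10.0, "c1": 10.0, "c2": 10.0, "c3": 10.0,
               "c4": 10.0, "c5": 13.0, "c6": 13.0, "c7": 13.0}
BASELINE = RUN_SECONDS["c0"]


def too_slow(commit: str) -> bool:
    """Treat anything more than 10% over the baseline as regressed."""
    return RUN_SECONDS[commit] > BASELINE * 1.10


if __name__ == "__main__":
    print(first_bad(list(RUN_SECONDS), too_slow))
```

A threshold well above the run-to-run noise matters here: a benchmark that varies by a few percent between runs can send the search down the wrong half.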

>
>
>> Grigori
>>
>>
>> On 12.05.2014 14:50, Christian K?nig wrote:
>>> I could reproduce the problem with xonotic and I think I've found the
>>> issue.
>>>
>>> Please test the attached patch.
>>>
>>> Thanks,
>>> Christian.
>>>
>>> Am 11.05.2014 11:06, schrieb Christian K?nig:
> I have tested it and it doesn't fix the hangs.
 Yeah, thought so. Well it was just a guess.

> (Also, I don't like the patch, because it reverts the behavior I added
> for userspace buffers.)
 Actually it shouldn't affect that. The alternative domain always
 contains GART even when userspace only specified VRAM as placement (as
 long as it is technical possible to do so).

 So what should happen is that TTM sees the current placement, matches
 that with the desired placement and should find that it doesn't need
 to move the buffer (we should just test if this behavior really works
 as expected).

 Christian.

 Am 10.05.2014 23:38, schrieb Marek Ol??k:
> Hi Christian,
>
> I have tested it and it doesn't fix the hangs.
>
> (Also, I don't like the patch, because it reverts the behavior I added
> for userspace buffers.)
>
> Marek
>
>
>
> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>  wrote:
>> Couldn't reproduce the issue so far. So the attached patch is just a
>> complete shoot into the dark found by rereading the code, but it might
>> actually be the problem.
>>
>> Please give it a try.
>>
>> Going to keep testing in the meantime,
>> Christian.
>>
>> Am 10.05.2014 10:23, schrieb Christian K?nig:
>>
 I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
 I boot
 with radeon.vramlimit=256 and then run Xonotic timedemo with high
 settings.
 I haven't had a chance to bisect it yet, but it might be a similar
 problem.
>>> Sounds like the same issue to me. Thx for the good test case.
>>>
 Any idea what is wrong with it?
>>> Actually I already wondered that it went so smooth without any
>>> regression
>>> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>>>
 Some of the tests allocate a lot of MSAA textures and the tests also
 run in parallel, which creates a lot of memory pressure and probably
 causes buffer evictions.
>>> Sounds like the underlying problem to me. We probably evict some
>>> part of a
>>> page table without updating the page directory. Going to dig into
>>> it today,
>>> it's probably just a one liner missing somewhere in the VM code.
>>>
>>> Christian.
>>>
>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
 On 09.05.2014 20:03, Marek Ol??k wrote:
>
> This commit which first appeared in 3.15-rc1 causes hangs on
> Bonaire:
> [...]
>
> The simplest way to reproduce the hangs is to run piglit with these
> parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>
 I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
 I boot
 with radeon.vramlimit=256 and then run Xonotic timedemo with high
 settings.
 I haven't had a chance to bisect it yet, but it might be a similar
 problem.

 Grigori
>>>
>> ___
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel



CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Alex Deucher
On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy  wrote:
> I can confirm this fixes it for me, too.
>
> 3.15 with these fixes and the large PTE patches actually ends up being
> noticeably slower than earlier kernels with Xonotic, though. I wonder what's
> going on.

Allocation overhead?


>
> Grigori
>
>
> On 12.05.2014 14:50, Christian K?nig wrote:
>>
>> I could reproduce the problem with xonotic and I think I've found the
>> issue.
>>
>> Please test the attached patch.
>>
>> Thanks,
>> Christian.
>>
>> Am 11.05.2014 11:06, schrieb Christian K?nig:

 I have tested it and it doesn't fix the hangs.
>>>
>>> Yeah, thought so. Well it was just a guess.
>>>
 (Also, I don't like the patch, because it reverts the behavior I added
 for userspace buffers.)
>>>
>>> Actually it shouldn't affect that. The alternative domain always
>>> contains GART even when userspace only specified VRAM as placement (as
>>> long as it is technical possible to do so).
>>>
>>> So what should happen is that TTM sees the current placement, matches
>>> that with the desired placement and should find that it doesn't need
>>> to move the buffer (we should just test if this behavior really works
>>> as expected).
>>>
>>> Christian.
>>>
>>> Am 10.05.2014 23:38, schrieb Marek Ol??k:

 Hi Christian,

 I have tested it and it doesn't fix the hangs.

 (Also, I don't like the patch, because it reverts the behavior I added
 for userspace buffers.)

 Marek



 On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
  wrote:
>
> Couldn't reproduce the issue so far. So the attached patch is just a
> complete shoot into the dark found by rereading the code, but it might
> actually be the problem.
>
> Please give it a try.
>
> Going to keep testing in the meantime,
> Christian.
>
> Am 10.05.2014 10:23, schrieb Christian K?nig:
>
>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>>> I boot
>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>> settings.
>>> I haven't had a chance to bisect it yet, but it might be a similar
>>> problem.
>>
>> Sounds like the same issue to me. Thx for the good test case.
>>
>>> Any idea what is wrong with it?
>>
>> Actually I already wondered that it went so smooth without any
>> regression
>> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>
>> Sounds like the underlying problem to me. We probably evict some
>> part of a
>> page table without updating the page directory. Going to dig into
>> it today,
>> it's probably just a one liner missing somewhere in the VM code.
>>
>> Christian.
>>
>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>>>
>>> On 09.05.2014 20:03, Marek Ol??k wrote:


 This commit which first appeared in 3.15-rc1 causes hangs on
 Bonaire:
 [...]

 The simplest way to reproduce the hangs is to run piglit with these
 parameters:
 -t texelFetch.fs

 Some of the tests allocate a lot of MSAA textures and the tests also
 run in parallel, which creates a lot of memory pressure and probably
 causes buffer evictions.

>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>>> I boot
>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>>> settings.
>>> I haven't had a chance to bisect it yet, but it might be a similar
>>> problem.
>>>
>>> Grigori
>>
>>
>>>
>>
>


CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Grigori Goronzy
I can confirm this fixes it for me, too.

3.15 with these fixes and the large PTE patches actually ends up being 
noticeably slower than earlier kernels with Xonotic, though. I wonder 
what's going on.

Grigori

On 12.05.2014 14:50, Christian König wrote:
> I could reproduce the problem with xonotic and I think I've found the
> issue.
>
> Please test the attached patch.
>
> Thanks,
> Christian.
>
> Am 11.05.2014 11:06, schrieb Christian K?nig:
>>> I have tested it and it doesn't fix the hangs.
>> Yeah, thought so. Well it was just a guess.
>>
>>> (Also, I don't like the patch, because it reverts the behavior I added
>>> for userspace buffers.)
>> Actually it shouldn't affect that. The alternative domain always
>> contains GART even when userspace only specified VRAM as placement (as
>> long as it is technical possible to do so).
>>
>> So what should happen is that TTM sees the current placement, matches
>> that with the desired placement and should find that it doesn't need
>> to move the buffer (we should just test if this behavior really works
>> as expected).
>>
>> Christian.
>>
>> Am 10.05.2014 23:38, schrieb Marek Ol??k:
>>> Hi Christian,
>>>
>>> I have tested it and it doesn't fix the hangs.
>>>
>>> (Also, I don't like the patch, because it reverts the behavior I added
>>> for userspace buffers.)
>>>
>>> Marek
>>>
>>>
>>>
>>> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>>>  wrote:
 Couldn't reproduce the issue so far. So the attached patch is just a
 complete shoot into the dark found by rereading the code, but it might
 actually be the problem.

 Please give it a try.

 Going to keep testing in the meantime,
 Christian.

 Am 10.05.2014 10:23, schrieb Christian K?nig:

>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>> I boot
>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>> settings.
>> I haven't had a chance to bisect it yet, but it might be a similar
>> problem.
> Sounds like the same issue to me. Thx for the good test case.
>
>> Any idea what is wrong with it?
> Actually I already wondered that it went so smooth without any
> regression
> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
> Sounds like the underlying problem to me. We probably evict some
> part of a
> page table without updating the page directory. Going to dig into
> it today,
> it's probably just a one liner missing somewhere in the VM code.
>
> Christian.
>
> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
>> On 09.05.2014 20:03, Marek Ol??k wrote:
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on
>>> Bonaire:
>>> [...]
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if
>> I boot
>> with radeon.vramlimit=256 and then run Xonotic timedemo with high
>> settings.
>> I haven't had a chance to bisect it yet, but it might be a similar
>> problem.
>>
>> Grigori
>
>>
>



CIK hangs with kernel 3.15, bisected

2014-05-12 Thread Christian König
I could reproduce the problem with xonotic and I think I've found the issue.

Please test the attached patch.

Thanks,
Christian.

On 11.05.2014 11:06, Christian König wrote:
>> I have tested it and it doesn't fix the hangs.
> Yeah, thought so. Well it was just a guess.
>
>> (Also, I don't like the patch, because it reverts the behavior I added
>> for userspace buffers.)
> Actually it shouldn't affect that. The alternative domain always 
> contains GART even when userspace only specified VRAM as placement (as 
> long as it is technical possible to do so).
>
> So what should happen is that TTM sees the current placement, matches 
> that with the desired placement and should find that it doesn't need 
> to move the buffer (we should just test if this behavior really works 
> as expected).
>
> Christian.
>
> Am 10.05.2014 23:38, schrieb Marek Ol??k:
>> Hi Christian,
>>
>> I have tested it and it doesn't fix the hangs.
>>
>> (Also, I don't like the patch, because it reverts the behavior I added
>> for userspace buffers.)
>>
>> Marek
>>
>>
>>
>> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>>  wrote:
>>> Couldn't reproduce the issue so far. So the attached patch is just a
>>> complete shoot into the dark found by rereading the code, but it might
>>> actually be the problem.
>>>
>>> Please give it a try.
>>>
>>> Going to keep testing in the meantime,
>>> Christian.
>>>
>>> Am 10.05.2014 10:23, schrieb Christian K?nig:
>>>
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if 
> I boot
> with radeon.vramlimit=256 and then run Xonotic timedemo with high 
> settings.
> I haven't had a chance to bisect it yet, but it might be a similar 
> problem.
 Sounds like the same issue to me. Thx for the good test case.

> Any idea what is wrong with it?
 Actually I already wondered that it went so smooth without any 
 regression
 so far, didn't noticed the bug in bugzilla.kernel.org yet.

> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
 Sounds like the underlying problem to me. We probably evict some 
 part of a
 page table without updating the page directory. Going to dig into 
 it today,
 it's probably just a one liner missing somewhere in the VM code.

 Christian.

 Am 09.05.2014 23:39, schrieb Grigori Goronzy:
> On 09.05.2014 20:03, Marek Ol??k wrote:
>>
>> This commit which first appeared in 3.15-rc1 causes hangs on 
>> Bonaire:
>> [...]
>>
>> The simplest way to reproduce the hangs is to run piglit with these
>> parameters:
>> -t texelFetch.fs
>>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
>>
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if 
> I boot
> with radeon.vramlimit=256 and then run Xonotic timedemo with high 
> settings.
> I haven't had a chance to bisect it yet, but it might be a similar 
> problem.
>
> Grigori

>

-- next part --
A non-text attachment was scrubbed...
Name: 0001-drm-radeon-fix-page-directory-update-size-estimation.patch
Type: text/x-diff
Size: 986 bytes



CIK hangs with kernel 3.15, bisected

2014-05-11 Thread Christian König
> I have tested it and it doesn't fix the hangs.
Yeah, thought so. Well it was just a guess.

> (Also, I don't like the patch, because it reverts the behavior I added
> for userspace buffers.)
Actually it shouldn't affect that. The alternative domain always
contains GART even when userspace only specified VRAM as placement (as
long as it is technically possible to do so).

So what should happen is that TTM sees the current placement, matches 
that with the desired placement and should find that it doesn't need to 
move the buffer (we should just test if this behavior really works as 
expected).

Christian.
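
[Annotation] The placement check described above can be modelled in a few lines. This is a toy sketch of the reasoning only, not radeon or TTM code: `Domain`, `allowed_domains` and `needs_move` are hypothetical names, and the `gart_possible` flag stands in for the "technically possible" condition mentioned in the message.

```python
from enum import Flag, auto


class Domain(Flag):
    """Memory domains a buffer can be placed in (toy model)."""
    VRAM = auto()
    GART = auto()


def allowed_domains(requested: Domain, gart_possible: bool = True) -> Domain:
    """Widen the requested placement with GART whenever that is possible."""
    return (requested | Domain.GART) if gart_possible else requested


def needs_move(current: Domain, requested: Domain,
               gart_possible: bool = True) -> bool:
    """A buffer only has to move if its current domain is not allowed."""
    return not (current & allowed_domains(requested, gart_possible))


if __name__ == "__main__":
    # Userspace asked for VRAM only, but the buffer currently sits in GART.
    print(needs_move(Domain.GART, Domain.VRAM))
    # Same situation when GART is not an acceptable alternative.
    print(needs_move(Domain.GART, Domain.VRAM, gart_possible=False))
```

In this model a buffer that already sits in GART stays put even though only VRAM was requested, which is the behavior the message expects and suggests verifying.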

On 10.05.2014 23:38, Marek Olšák wrote:
> Hi Christian,
>
> I have tested it and it doesn't fix the hangs.
>
> (Also, I don't like the patch, because it reverts the behavior I added
> for userspace buffers.)
>
> Marek
>
>
>
> On Sat, May 10, 2014 at 6:34 PM, Christian K?nig
>  wrote:
>> Couldn't reproduce the issue so far. So the attached patch is just a
>> complete shoot into the dark found by rereading the code, but it might
>> actually be the problem.
>>
>> Please give it a try.
>>
>> Going to keep testing in the meantime,
>> Christian.
>>
>> Am 10.05.2014 10:23, schrieb Christian K?nig:
>>
 I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
 with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
 I haven't had a chance to bisect it yet, but it might be a similar problem.
>>> Sounds like the same issue to me. Thx for the good test case.
>>>
 Any idea what is wrong with it?
>>> Actually I already wondered that it went so smooth without any regression
>>> so far, didn't noticed the bug in bugzilla.kernel.org yet.
>>>
 Some of the tests allocate a lot of MSAA textures and the tests also
 run in parallel, which creates a lot of memory pressure and probably
 causes buffer evictions.
>>> Sounds like the underlying problem to me. We probably evict some part of a
>>> page table without updating the page directory. Going to dig into it today,
>>> it's probably just a one liner missing somewhere in the VM code.
>>>
>>> Christian.
>>>
>>> Am 09.05.2014 23:39, schrieb Grigori Goronzy:
 On 09.05.2014 20:03, Marek Ol??k wrote:
>
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
> [...]
>
> The simplest way to reproduce the hangs is to run piglit with these
> parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>
 I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
 with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
 I haven't had a chance to bisect it yet, but it might be a similar problem.

 Grigori
>>>



CIK hangs with kernel 3.15, bisected

2014-05-11 Thread Marek Olšák
Hi Christian,

I have tested it and it doesn't fix the hangs.

(Also, I don't like the patch, because it reverts the behavior I added
for userspace buffers.)

Marek



On Sat, May 10, 2014 at 6:34 PM, Christian König
 wrote:
> Couldn't reproduce the issue so far. So the attached patch is just a
> complete shoot into the dark found by rereading the code, but it might
> actually be the problem.
>
> Please give it a try.
>
> Going to keep testing in the meantime,
> Christian.
>
> Am 10.05.2014 10:23, schrieb Christian K?nig:
>
>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>
>> Sounds like the same issue to me. Thx for the good test case.
>>
>>> Any idea what is wrong with it?
>>
>> Actually, I was already surprised that it went so smoothly without any regression
>> so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>
>> Sounds like the underlying problem to me. We probably evict some part of a
>> page table without updating the page directory. Going to dig into it today;
>> it's probably just a one-liner missing somewhere in the VM code.
>>
>> Christian.
>>
>> On 09.05.2014 23:39, Grigori Goronzy wrote:
>>>
>>> On 09.05.2014 20:03, Marek Olšák wrote:


>>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>>> [...]
>>>>
>>>> The simplest way to reproduce the hangs is to run piglit with these
>>>> parameters:
>>>> -t texelFetch.fs
>>>>
>>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>>> run in parallel, which creates a lot of memory pressure and probably
>>>> causes buffer evictions.

>>>
>>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I boot
>>> with radeon.vramlimit=256 and then run Xonotic timedemo with high settings.
>>> I haven't had a chance to bisect it yet, but it might be a similar problem.
>>>
>>> Grigori
>>
>>
>


CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Christian König
Couldn't reproduce the issue so far. So the attached patch is just a 
complete shot in the dark, found by rereading the code, but it might 
actually fix the problem.

Please give it a try.

Going to keep testing in the meantime,
Christian.

On 10.05.2014 10:23, Christian König wrote:
>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with 
>> high settings. I haven't had a chance to bisect it yet, but it might 
>> be a similar problem.
> Sounds like the same issue to me. Thx for the good test case.
>
>> Any idea what is wrong with it?
> Actually, I was already surprised that it went so smoothly without any 
> regression so far; I hadn't noticed the bug in bugzilla.kernel.org yet.
>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
> Sounds like the underlying problem to me. We probably evict some part 
> of a page table without updating the page directory. Going to dig into 
> it today; it's probably just a one-liner missing somewhere in the VM 
> code.
>
> Christian.
>
> On 09.05.2014 23:39, Grigori Goronzy wrote:
>> On 09.05.2014 20:03, Marek Olšák wrote:
>>>
>>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>>> [...]
>>>
>>> The simplest way to reproduce the hangs is to run piglit with these 
>>> parameters:
>>> -t texelFetch.fs
>>>
>>> Some of the tests allocate a lot of MSAA textures and the tests also
>>> run in parallel, which creates a lot of memory pressure and probably
>>> causes buffer evictions.
>>>
>>
>> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
>> boot with radeon.vramlimit=256 and then run Xonotic timedemo with 
>> high settings. I haven't had a chance to bisect it yet, but it might 
>> be a similar problem.
>>
>> Grigori
>

-- next part --
A non-text attachment was scrubbed...
Name: 0001-drm-radeon-fix-buffer-placement-under-memory-pressur.patch
Type: text/x-diff
Size: 1892 bytes



CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Christian König
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high 
> settings. I haven't had a chance to bisect it yet, but it might be a 
> similar problem.
Sounds like the same issue to me. Thx for the good test case.

> Any idea what is wrong with it?
Actually, I was already surprised that it went so smoothly without any 
regression so far; I hadn't noticed the bug in bugzilla.kernel.org yet.

> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
Sounds like the underlying problem to me. We probably evict some part of 
a page table without updating the page directory. Going to dig into it 
today; it's probably just a one-liner missing somewhere in the VM code.

Christian.

On 09.05.2014 23:39, Grigori Goronzy wrote:
> On 09.05.2014 20:03, Marek Olšák wrote:
>>
>> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>> [...]
>>
>> The simplest way to reproduce the hangs is to run piglit with these 
>> parameters:
>> -t texelFetch.fs
>>
>> Some of the tests allocate a lot of MSAA textures and the tests also
>> run in parallel, which creates a lot of memory pressure and probably
>> causes buffer evictions.
>>
>
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
> boot with radeon.vramlimit=256 and then run Xonotic timedemo with high 
> settings. I haven't had a chance to bisect it yet, but it might be a 
> similar problem.
>
> Grigori
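
The failure mode suspected above (a page table gets evicted and moved, but the page directory is not updated to match) can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not the radeon driver's code, and the class, method names, and addresses below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Toy two-level GPU address space; NOT the radeon driver's structures."""
    # Where each page table actually lives right now (hypothetical addresses).
    pt_location: dict = field(default_factory=dict)
    # Where the page directory believes each page table lives.
    pd_entries: dict = field(default_factory=dict)

    def place_page_table(self, index, address):
        """Put a page table at `address` and record it in the page directory."""
        self.pt_location[index] = address
        self.pd_entries[index] = address

    def evict_page_table(self, index, new_address, update_directory):
        """Move a page table, as eviction under memory pressure would.

        With update_directory=False this models the suspected bug: the table
        moves, but the directory keeps pointing at the old address.
        """
        self.pt_location[index] = new_address
        if update_directory:
            self.pd_entries[index] = new_address

    def stale_entries(self):
        """Return the indices whose directory entry no longer matches."""
        return sorted(i for i, addr in self.pt_location.items()
                      if self.pd_entries.get(i) != addr)


vm = ToyVM()
vm.place_page_table(0, 0x1000)
vm.place_page_table(1, 0x2000)
vm.evict_page_table(1, 0x8000, update_directory=False)
print(vm.stale_entries())  # prints [1]
```

In this model a stale entry is harmless until the GPU walks the directory, which would fit hangs that only show up once memory pressure forces evictions.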



CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Grigori Goronzy
On 09.05.2014 20:03, Marek Olšák wrote:
>
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
> [...]
>
> The simplest way to reproduce the hangs is to run piglit with these 
> parameters:
> -t texelFetch.fs
>
> Some of the tests allocate a lot of MSAA textures and the tests also
> run in parallel, which creates a lot of memory pressure and probably
> causes buffer evictions.
>

I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I 
boot with radeon.vramlimit=256 and then run Xonotic timedemo with high 
settings. I haven't had a chance to bisect it yet, but it might be a 
similar problem.

Grigori


CIK hangs with kernel 3.15, bisected

2014-05-09 Thread Rafał Miłecki
On 9 May 2014 20:03, Marek Olšák wrote:
> This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:
>
> commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
> Author: Christian König
> Date:   Thu Feb 20 13:42:17 2014 +0100
>
>     drm/radeon: use normal BOs for the page tables v4

Also reported in:
https://bugzilla.kernel.org/show_bug.cgi?id=75651


CIK hangs with kernel 3.15, bisected

2014-05-09 Thread Marek Olšák
Hi Christian,

This commit which first appeared in 3.15-rc1 causes hangs on Bonaire:

commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592
Author: Christian König
Date:   Thu Feb 20 13:42:17 2014 +0100

    drm/radeon: use normal BOs for the page tables v4

    No need to make it more complicated than necessary,
    just allocate the page tables as normal BO and
    flush whenever the address change.

    v2: update comments and function name
    v3: squash bug fixes, page directory and tables patch
    v4: rebased on Mareks changes

    Signed-off-by: Christian König


Reverting the commit gives me a lot of merge conflicts.

The simplest way to reproduce the hangs is to run piglit with these parameters:
-t texelFetch.fs

Some of the tests allocate a lot of MSAA textures and the tests also
run in parallel, which creates a lot of memory pressure and probably
causes buffer evictions.

Any idea what is wrong with it?

Thanks,

Marek
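
For readers unfamiliar with bisection, the search that narrows a regression down to a single commit can be sketched as a binary search over an ordered history. This is a minimal illustration, not git's actual implementation; apart from the commit ID 6d2f2944 taken from this thread, the history entries and the `hangs` predicate are hypothetical stand-ins for "build this kernel and run the piglit reproducer".

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    `commits` is ordered oldest to newest. commits[0] is assumed good and
    commits[-1] is assumed bad, the same precondition a bisection needs.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return commits[bad]


# Hypothetical linear history; only 6d2f2944 is a real ID from this thread.
history = ["v3.14", "c1", "c2", "6d2f2944", "c4", "c5", "v3.15-rc1"]


def hangs(commit):
    """Stand-in for building the kernel at `commit` and running the test."""
    return history.index(commit) >= history.index("6d2f2944")


print(first_bad_commit(history, hangs))  # prints 6d2f2944
```

Each step halves the remaining range, so even a few thousand commits between two releases need only around a dozen build-and-test cycles.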