CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 30.05.2014 13:46, Grigori Goronzy wrote: > On 30.05.2014 13:30, Marek Olšák wrote: >> Grigori, >> >> you can git-checkout the commit before and after the memory management >> changes, compile both and test them. >> > > I was trying to revert the changes, but it looks like too much changed > in

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Christian König
Well the good news is that when I use the CP DMA instead of the SDMA everything seems to work fine. Unfortunately using the CP DMA has a completely different timing (because of the additional sync needed) and so I'm not sure if it's really fixed or just masked. Christian. Am 29.05.2014

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Marek Olšák
That's right. Also, you probably want to enable automatic addition of the git-sha1 to the kernel version in menuconfig, there is an option for it, so that you can have several kernels with the same version but different sha1 installed. Marek On Fri, May 30, 2014 at 1:46 PM, Grigori Goronzy

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 30.05.2014 13:30, Marek Olšák wrote: > Grigori, > > you can git-checkout the commit before and after the memory management > changes, compile both and test them. > I was trying to revert the changes, but it looks like too much changed in the meantime. The suitable commits to check out should

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Marek Olšák
Grigori, you can git-checkout the commit before and after the memory management changes, compile both and test them. Marek On Fri, May 30, 2014 at 2:30 AM, Grigori Goronzy wrote: > On 13.05.2014 22:27, Marek Olšák wrote: >> >> I applied these two patches Christian sent to dri-devel: >> >>

CIK hangs with kernel 3.15, bisected

2014-05-30 Thread Grigori Goronzy
On 13.05.2014 22:27, Marek Olšák wrote: > I applied these two patches Christian sent to dri-devel: > > drm/radeon: fix page directory update size estimation > drm/radeon: fix buffer placement under memory pressure v2 > > on top of torvalds's master branch. > With latest kernel master (a991639c) I

CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Christian König
Yeah, that will work around it for now. But the general problem is that we have a memory corruption here, we just didn't notice it earlier because clearing a texture or vectors with zero only results in random misrendering. Only when you hit a shader or in this case a page table it really

CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Marek Olšák
Can disable evictions for page tables, e.g. by removing them from the LRU list? Marek On Thu, May 29, 2014 at 6:30 PM, Christian König wrote: > Hi Marek & Alex, > > I've found the issue why forcefully evicting page tables sometimes crashes > the box. > > Well this is a typical hexdump page

CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Christian König
Hi Marek & Alex, I've found the issue why forcefully evicting page tables sometimes crashes the box. Well this is a typical hexdump page table before it is moved to GART: 000117f000 02914061 000117f008 02915061 000117f010 02916061 000117f018 02917061
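The quoted dump alternates between an offset and a value. As a rough sketch for reading it (not the method used in the thread), the Python below parses those pairs and splits each value into a page-aligned part and its low bits. The names `parse_dump` and `split_entry` are hypothetical, and treating the low 12 bits as flags on a 4 KiB-aligned address is an assumption; the snippet does not say how the dump was produced or why the offsets advance by 8 while only 8 hex digits per value are shown.

```python
# Hex dump tokens copied from the message above: offset, value, offset, value, ...
DUMP = "000117f000 02914061 000117f008 02915061 000117f010 02916061 000117f018 02917061"

# Assumption (not stated in the thread): the low 12 bits of each value are
# flag bits and the remaining bits are a 4 KiB-aligned address.
LOW_BITS_MASK = 0xFFF


def parse_dump(text):
    """Return a list of (offset, value) integer pairs from alternating hex tokens."""
    tokens = text.split()
    return [(int(off, 16), int(val, 16)) for off, val in zip(tokens[0::2], tokens[1::2])]


def split_entry(value):
    """Split a value into (aligned_part, low_bits) under the assumption above."""
    return value & ~LOW_BITS_MASK, value & LOW_BITS_MASK


entries = parse_dump(DUMP)
for offset, value in entries:
    aligned, low = split_entry(value)
    print(f"{offset:#012x}: aligned={aligned:#010x} low_bits={low:#05x}")
```

Read this way, the four entries shown sit 8 bytes apart and their values differ by 0x1000 each, with identical low bits, which is what a run of consecutive pages would look like under the stated assumption.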

CIK hangs with kernel 3.15, bisected

2014-05-29 Thread Alex Deucher
On Thu, May 29, 2014 at 12:30 PM, Christian König wrote: > Hi Marek & Alex, > > I've found the issue why forcefully evicting page tables sometimes crashes > the box. > > Well this is a typical hexdump page table before it is moved to GART: > 000117f000 02914061 > 000117f008 02915061

CIK hangs with kernel 3.15, bisected

2014-05-28 Thread Christian König
I already tried a similar patch as well, without any more noticeable crashes. But going to give this another round with your patch and openarena. Thanks, Christian. On 27.05.2014 23:55, Marek Olšák wrote: > Hi Christian, > > I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are

CIK hangs with kernel 3.15, bisected

2014-05-28 Thread Marek Olšák
Hi Christian, I test on Bonaire (ChipID = 0x665c). Unfortunately, the hangs are not fixed yet. They are very rare and very random. Therefore, I have come up with a patch which evicts page tables between IBs. See the attachment. With that patch applied, the system starts fine, compiz and glxgears

CIK hangs with kernel 3.15, bisected

2014-05-14 Thread Christian König
Crap, any chance you can narrow it down a bit more? I've just tried a piglit quick test on my Bonaire and it seems to work perfectly fine. What hw do you test on? Regards, Christian. On 13.05.2014 23:21, Marek Olšák wrote: > Hi Christian, > > Even though some regressions are fixed by these

CIK hangs with kernel 3.15, bisected

2014-05-14 Thread Marek Olšák
Hi Christian, Even though some regressions are fixed by these patches: drm/radeon: fix page directory update size estimation drm/radeon: fix buffer placement under memory pressure v2 and indeed, the texelFetch tests no longer hang, there is one more hang which needs to be fixed. :( All I know

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
I applied these two patches Christian sent to dri-devel: drm/radeon: fix page directory update size estimation drm/radeon: fix buffer placement under memory pressure v2 on top of torvalds's master branch. Marek On Tue, May 13, 2014 at 10:19 PM, Grigori Goronzy wrote: > On 13.05.2014 21:50,

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Grigori Goronzy
On 13.05.2014 21:50, Marek Olšák wrote: > Hi Christian, > > The performance regression I saw with piglit seems to be fixed with > latest kernel git. It's difficult to bisect the kernel, because there > are only merges between 3.14 and 3.15 and the merged commits are > actually based on 3.14-rc1

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
Hi Christian, The performance regression I saw with piglit seems to be fixed with latest kernel git. It's difficult to bisect the kernel, because there are only merges between 3.14 and 3.15 and the merged commits are actually based on 3.14-rc1 and 3.14-rc4. All seems to be fine with your fixes.

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
I think it's caused by something else. I'll continue testing and bisecting. Marek On Tue, May 13, 2014 at 5:31 PM, Christian König wrote: > Is the performance regression caused by the page table changes or > something else? > > I did some tests with xonotic while developing it

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Christian König
Is the performance regression caused by the page table changes or something else? I did some tests with xonotic while developing it and it didn't show anything obvious, but I didn't test on different systems. Christian. On 13.05.2014 17:19, Marek Olšák wrote: > Your

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Marek Olšák
Your latest patches fix the regression. The performance regression can also be reproduced with piglit "-t texelFetch.fs". Kernel 3.14: real 0m17.724s user 0m41.905s sys 0m11.299s The problematic commit checked out + your fixes (without the PTE patch I think): real

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Christian König
On 13.05.2014 15:22, Alex Deucher wrote: > On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy wrote: >> I can confirm this fixes it for me, too. >> >> 3.15 with these fixes and the large PTE patches actually ends up being >> noticeably slower than earlier kernels with Xonotic, though. I wonder

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Alex Deucher
On Mon, May 12, 2014 at 7:38 PM, Grigori Goronzy wrote: > I can confirm this fixes it for me, too. > > 3.15 with these fixes and the large PTE patches actually ends up being > noticeably slower than earlier kernels with Xonotic, though. I wonder what's > going on. Allocation overhead? > >

CIK hangs with kernel 3.15, bisected

2014-05-13 Thread Grigori Goronzy
I can confirm this fixes it for me, too. 3.15 with these fixes and the large PTE patches actually ends up being noticeably slower than earlier kernels with Xonotic, though. I wonder what's going on. Grigori On 12.05.2014 14:50, Christian König wrote: > I could reproduce the problem with

CIK hangs with kernel 3.15, bisected

2014-05-12 Thread Christian König
I could reproduce the problem with xonotic and I think I've found the issue. Please test the attached patch. Thanks, Christian. On 11.05.2014 11:06, Christian König wrote: >> I have tested it and it doesn't fix the hangs. > Yeah, thought so. Well it was just a guess. > >> (Also, I don't like

CIK hangs with kernel 3.15, bisected

2014-05-11 Thread Christian König
> I have tested it and it doesn't fix the hangs. Yeah, thought so. Well it was just a guess. > (Also, I don't like the patch, because it reverts the behavior I added > for userspace buffers.) Actually it shouldn't affect that. The alternative domain always contains GART even when userspace only

CIK hangs with kernel 3.15, bisected

2014-05-11 Thread Marek Olšák
Hi Christian, I have tested it and it doesn't fix the hangs. (Also, I don't like the patch, because it reverts the behavior I added for userspace buffers.) Marek On Sat, May 10, 2014 at 6:34 PM, Christian König wrote: > Couldn't reproduce the issue so far. So the attached patch is just a >

CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Christian König
Couldn't reproduce the issue so far. So the attached patch is just a complete shot in the dark found by rereading the code, but it might actually be the problem. Please give it a try. Going to keep testing in the meantime, Christian. On 10.05.2014 10:23, Christian König wrote: >> I see

CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Christian König
> I see hangs with kernel 3.15 and SI under memory pressure, e.g. if I > boot with radeon.vramlimit=256 and then run Xonotic timedemo with high > settings. I haven't had a chance to bisect it yet, but it might be a > similar problem. Sounds like the same issue to me. Thx for the good test case.

CIK hangs with kernel 3.15, bisected

2014-05-10 Thread Grigori Goronzy
On 09.05.2014 20:03, Marek Olšák wrote: > > This commit which first appeared in 3.15-rc1 causes hangs on Bonaire: >[...] > > The simplest way to reproduce the hangs is to run piglit with these > parameters: > -t texelFetch.fs > > Some of the tests allocate a lot of MSAA textures and the tests

CIK hangs with kernel 3.15, bisected

2014-05-09 Thread Rafał Miłecki
On 9 May 2014 20:03, Marek Olšák wrote: > This commit which first appeared in 3.15-rc1 causes hangs on Bonaire: > > commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592 > Author: Christian König > Date: Thu Feb 20 13:42:17 2014 +0100 > > drm/radeon: use normal BOs for the page tables v4 Also

CIK hangs with kernel 3.15, bisected

2014-05-09 Thread Marek Olšák
Hi Christian, This commit which first appeared in 3.15-rc1 causes hangs on Bonaire: commit 6d2f2944e95e504a7d33385eeeb9bb7fcca72592 Author: Christian König Date: Thu Feb 20 13:42:17 2014 +0100 drm/radeon: use normal BOs for the page tables v4 No need to make it more complicated than