Re: Screen corruption using radeon kernel driver

2022-12-11 Thread Mikhail Krylov
On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:
> >
> > On 2022-11-30 14:28, Alex Deucher wrote:
> > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:
> > >>
> > >> On 2022-11-29 17:11, Mikhail Krylov wrote:
> > >>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> >  On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  
> >  wrote:
> > >
> > > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > >> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  
> > >> wrote:
> > >>>
> > >>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > >>>
> > >> [excessive quoting removed]
> > >>>
> > > So, is there any progress on this issue? I do understand it's not 
> > > a high
> > > priority one, and today I've checked it on 6.0 kernel, and
> > > unfortunately, it still persists...
> > >
> > > I'm considering writing a patch that will allow user to override
> > > need_dma32/dma_bits setting with a module parameter. I'll have 
> > > some time
> > > after the New Year for that.
> > >
> > > Is it at all possible that such a patch will be merged into 
> > > kernel?
> > >
> >  On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> >  wrote:
> >  Unless someone familiar with HIMEM can figure out what is going 
> >  wrong
> >  we should just revert the patch.
> > 
> >  Alex
> > >>>
> > >>>
> > >>> Okay, I was suggesting that mostly because
> > >>>
> > >>> a) it works for me with dma_bits = 40 (I understand that's what it 
> > >>> is
> > >>> without the original patch applied);
> > >>>
> > >>> b) there's a hint of uncertainty on this line
> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > >>> saying that for AGP dma_bits = 32 is the safest option, so 
> > >>> apparently there are
> > >>> setups, unlike mine, where dma_bits = 32 is better than 40.
> > >>>
> > >>> But I'm in no position to argue, just wanted to make myself clear.
> > >>> I'm okay with rebuilding the kernel for my machine until the 
> > >>> original
> > >>> patch is reverted or any other fix is applied.
> > >>
> > >> What GPU do you have and is it AGP?  If it is AGP, does setting
> > >> radeon.agpmode=-1 also fix it?
> > >>
> > >> Alex
> > >
> > > That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 
> > > doesn't
> > > help, it just makes 3D acceleration in games such as OpenArena stop
> > > working.
> > 
> >  Just to confirm, is the board AGP or PCIe?
> > 
> >  Alex
> > >>>
> > >>> It is AGP. That's an old machine.
> > >>
> > >> Can you check whether dma_addressing_limited() is actually returning the
> > >> expected result at the point of radeon_ttm_init()? Disabling highmem is
> > >> presumably just hiding whatever problem exists, by throwing away all
> > >>   >32-bit RAM such that use_dma32 doesn't matter.
> > >
> > > The device in question only supports a 32 bit DMA mask so
> > > dma_addressing_limited() should return true.  Bounce buffers are not
> > > really usable on GPUs because they map so much memory.  If
> > > dma_addressing_limited() returns false, that would explain it.
> >
> > Right, it appears to be the only part of the offending commit that
> > *could* reasonably make any difference, so I'm primarily wondering if
> > dma_get_required_mask() somehow gets confused.
> 
> Mikhail,
> 
> Can you see what dma_addressing_limited() and dma_get_required_mask()
> return in this case?
> 
> Alex
> 
> 
> >
> > Thanks,
> > Robin.

Hello again, I was able to confirm, by adding printk() calls to the functions
and recompiling the kernel, that dma_addressing_limited() returns
*false* on the kernel with the bug.

And dma_get_required_mask() returns 0x7fffffff, as I said before.




Re: Screen corruption using radeon kernel driver

2022-12-10 Thread Luben Tuikov
On 2022-12-10 10:32, Mikhail Krylov wrote:
> On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
>> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:
>>>
>>> On 2022-11-30 14:28, Alex Deucher wrote:
 On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:
>
> On 2022-11-29 17:11, Mikhail Krylov wrote:
>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
>>> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  
>>> wrote:

 On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  
> wrote:
>>
>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
>>
> [excessive quoting removed]
>>
 So, is there any progress on this issue? I do understand it's not 
 a high
 priority one, and today I've checked it on 6.0 kernel, and
 unfortunately, it still persists...

 I'm considering writing a patch that will allow user to override
 need_dma32/dma_bits setting with a module parameter. I'll have 
 some time
 after the New Year for that.

 Is it at all possible that such a patch will be merged into kernel?

>>> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
>>> wrote:
>>> Unless someone familiar with HIMEM can figure out what is going 
>>> wrong
>>> we should just revert the patch.
>>>
>>> Alex
>>
>>
>> Okay, I was suggesting that mostly because
>>
>> a) it works for me with dma_bits = 40 (I understand that's what it is
>> without the original patch applied);
>>
>> b) there's a hint of uncertainty on this line
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
>> saying that for AGP dma_bits = 32 is the safest option, so 
>> apparently there are
>> setups, unlike mine, where dma_bits = 32 is better than 40.
>>
>> But I'm in no position to argue, just wanted to make myself clear.
>> I'm okay with rebuilding the kernel for my machine until the original
>> patch is reverted or any other fix is applied.
>
> What GPU do you have and is it AGP?  If it is AGP, does setting
> radeon.agpmode=-1 also fix it?
>
> Alex

 That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
 help, it just makes 3D acceleration in games such as OpenArena stop
 working.
>>>
>>> Just to confirm, is the board AGP or PCIe?
>>>
>>> Alex
>>
>> It is AGP. That's an old machine.
>
> Can you check whether dma_addressing_limited() is actually returning the
> expected result at the point of radeon_ttm_init()? Disabling highmem is
> presumably just hiding whatever problem exists, by throwing away all
>   >32-bit RAM such that use_dma32 doesn't matter.

 The device in question only supports a 32 bit DMA mask so
 dma_addressing_limited() should return true.  Bounce buffers are not
 really usable on GPUs because they map so much memory.  If
 dma_addressing_limited() returns false, that would explain it.
>>>
>>> Right, it appears to be the only part of the offending commit that
>>> *could* reasonably make any difference, so I'm primarily wondering if
>>> dma_get_required_mask() somehow gets confused.
>>
>> Mikhail,
>>
>> Can you see what dma_addressing_limited() and dma_get_required_mask()
>> return in this case?
>>
>> Alex
>>
>>
>>>
>>> Thanks,
>>> Robin.
> 
> Hello again, I was able to confirm by adding printk() to the functions
> and recompiling the kernel that dma_addressing_limited() returns
> *false* on the kernel with the bug. 
> 
> And dma_get_required_mask() returns 0x7fffffff, as I said before.

Yes, dma_addressing_limited() evaluates to "false" in your case,
and this is the correct answer according to the function's comment:
"Return %true if the devices DMA mask is too small to address all
 memory in the system, else %false."

In this case the device's DMA mask is 0xFFFFFFFF and the mask
for the 1.5 GiB memory is 0x7FFFFFFF, so the static inline
returns "false". (dma_direct_get_required_mask() returns this
for your memory size.)
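
As a rough illustration of why the answer is "false" here, the comparison
can be modelled in userspace C. This is only a simplified sketch:
model_addressing_limited() is a made-up name, not the kernel function, it
ignores any bus-level limit, and the mask values used with it are the ones
assumed for a 32-bit device (0xFFFFFFFF) and 1.5 GiB of RAM (0x7FFFFFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model of the check discussed above (hypothetical helper,
 * not kernel code): the device counts as "limited" only when the mask
 * it can address is smaller than the mask needed to cover all RAM.
 */
static bool model_addressing_limited(uint64_t dev_dma_mask,
                                     uint64_t required_mask)
{
    assert(required_mask != 0); /* a zero mask would mean "no RAM" */
    return dev_dma_mask < required_mask;
}
```

With the values from this thread, model_addressing_limited(0xFFFFFFFF,
0x7FFFFFFF) is false, because all of RAM is already reachable with 32 bits.
It would only become true once the required mask grows past 32 bits.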

It would appear that dma_addressing_limited() isn't answering the question
which the last parameter to ttm_device_init(), "use GFP_DMA32", wants
answered. Perhaps we should use another method to make sure that that
parameter is set in the scenario in question.
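
One possible shape for such a method, sketched in userspace C purely for
illustration: decide from the DMA width the driver selected for the device,
in addition to the comparison against installed RAM. The struct and function
names below are hypothetical, and this is not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-device state. */
struct model_gpu {
    unsigned int dma_bits; /* DMA address width the driver selected */
};

/*
 * Returns true when page allocations should be restricted to the
 * 32-bit zone: either RAM exceeds what the device can address, or
 * the device was explicitly configured for 32-bit DMA.
 */
static bool model_use_dma32(const struct model_gpu *gpu,
                            bool addressing_limited)
{
    assert(gpu != NULL);
    return addressing_limited || gpu->dma_bits <= 32;
}
```

Under this assumption, a device configured with dma_bits = 32 would get the
"use GFP_DMA32" behaviour even on a machine with only 1.5 GiB of RAM, while
a 40-bit device would keep the current behaviour.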

Regards,
Luben




Re: Screen corruption using radeon kernel driver

2022-12-02 Thread Mikhail Krylov
On Thu, Dec 01, 2022 at 02:00:58PM +0000, Robin Murphy wrote:
> On 2022-11-30 19:59, Mikhail Krylov wrote:
> > On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
> > > On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  
> > > wrote:
> > > > 
> > > > On 2022-11-30 14:28, Alex Deucher wrote:
> > > > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  
> > > > > wrote:
> > > > > > 
> > > > > > On 2022-11-29 17:11, Mikhail Krylov wrote:
> > > > > > > On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> > > > > > > > On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov 
> > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > > > > > > > > > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov 
> > > > > > > > > >  wrote:
> > > > > > > > > > > 
> > > > > > > > > > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher 
> > > > > > > > > > > wrote:
> > > > > > > > > > > 
> > > > > > > > > > > > > > [excessive quoting removed]
> > > > > > > > > > > 
> > > > > > > > > > > > > So, is there any progress on this issue? I do 
> > > > > > > > > > > > > understand it's not a high
> > > > > > > > > > > > > priority one, and today I've checked it on 6.0 
> > > > > > > > > > > > > kernel, and
> > > > > > > > > > > > > unfortunately, it still persists...
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I'm considering writing a patch that will allow user 
> > > > > > > > > > > > > to override
> > > > > > > > > > > > > need_dma32/dma_bits setting with a module parameter. 
> > > > > > > > > > > > > I'll have some time
> > > > > > > > > > > > > after the New Year for that.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Is it at all possible that such a patch will be 
> > > > > > > > > > > > > merged into kernel?
> > > > > > > > > > > > > 
> > > > > > > > > > > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > Unless someone familiar with HIMEM can figure out what 
> > > > > > > > > > > > is going wrong
> > > > > > > > > > > > we should just revert the patch.
> > > > > > > > > > > > 
> > > > > > > > > > > > Alex
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Okay, I was suggesting that mostly because
> > > > > > > > > > > 
> > > > > > > > > > > a) it works for me with dma_bits = 40 (I understand 
> > > > > > > > > > > that's what it is
> > > > > > > > > > > without the original patch applied);
> > > > > > > > > > > 
> > > > > > > > > > > > b) there's a hint of uncertainty on this line
> > > > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > > > > > > > > > > saying that for AGP dma_bits = 32 is the safest option, 
> > > > > > > > > > > so apparently there are
> > > > > > > > > > > setups, unlike mine, where dma_bits = 32 is better than 
> > > > > > > > > > > 40.
> > > > > > > > > > > 
> > > > > > > > > > > But I'm in no position to argue, just wanted to make 
> > > > > > > > > > > myself clear.
> > > > > > > > > > > I'm okay with rebuilding the kernel for my machine until 
> > > > > > > > > > > the original
> > > > > > > > > > > patch is reverted or any other fix is applied.
> > > > > > > > > > 
> > > > > > > > > > What GPU do you have and is it AGP?  If it is AGP, does 
> > > > > > > > > > setting
> > > > > > > > > > radeon.agpmode=-1 also fix it?
> > > > > > > > > > 
> > > > > > > > > > Alex
> > > > > > > > > 
> > > > > > > > > That is ATI Radeon X1950, and, unfortunately, 
> > > > > > > > > radeon.agpmode=-1 doesn't
> > > > > > > > > help, it just makes 3D acceleration in games such as 
> > > > > > > > > OpenArena stop
> > > > > > > > > working.
> > > > > > > > 
> > > > > > > > Just to confirm, is the board AGP or PCIe?
> > > > > > > > 
> > > > > > > > Alex
> > > > > > > 
> > > > > > > It is AGP. That's an old machine.
> > > > > > 
> > > > > > Can you check whether dma_addressing_limited() is actually 
> > > > > > returning the
> > > > > > expected result at the point of radeon_ttm_init()? Disabling 
> > > > > > highmem is
> > > > > > presumably just hiding whatever problem exists, by throwing away all
> > > > > > >   >32-bit RAM such that use_dma32 doesn't matter.
> > > > > 
> > > > > The device in question only supports a 32 bit DMA mask so
> > > > > dma_addressing_limited() should return true.  Bounce buffers are not
> > > > > really usable on GPUs because they map so much memory.  If
> > > > > dma_addressing_limited() returns false, that would explain it.
> > > > 
> > > > Right, it appears to be the only part of the offending commit that
> > > > *could* reasonably make any difference, so I'm primarily wondering if
> > > > dma_get_required_mask() somehow gets confused.
> > > 
> > > Mikhail,
> > > 
> > > > Can you see what dma_addressing_limited() and dma_get_required_mask()
> > > return in this case?
> > > 
> > > Alex
> > > 
> > > 
> > > > 
> > > > Thanks,
> > > > Robin.
> > 
> > 

Re: Screen corruption using radeon kernel driver

2022-12-01 Thread Alex Deucher
On Thu, Dec 1, 2022 at 9:01 AM Robin Murphy  wrote:
>
> On 2022-11-30 19:59, Mikhail Krylov wrote:
> > On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
> >> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:
> >>>
> >>> On 2022-11-30 14:28, Alex Deucher wrote:
>  On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  
>  wrote:
> >
> > On 2022-11-29 17:11, Mikhail Krylov wrote:
> >> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> >>> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  
> >>> wrote:
> 
>  On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  
> > wrote:
> >>
> >> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> >>
> > [excessive quoting removed]
> >>
>  So, is there any progress on this issue? I do understand it's 
>  not a high
>  priority one, and today I've checked it on 6.0 kernel, and
>  unfortunately, it still persists...
> 
>  I'm considering writing a patch that will allow user to override
>  need_dma32/dma_bits setting with a module parameter. I'll have 
>  some time
>  after the New Year for that.
> 
>  Is it at all possible that such a patch will be merged into 
>  kernel?
> 
> >>> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov 
> >>>  wrote:
> >>> Unless someone familiar with HIMEM can figure out what is going 
> >>> wrong
> >>> we should just revert the patch.
> >>>
> >>> Alex
> >>
> >>
> >> Okay, I was suggesting that mostly because
> >>
> >> a) it works for me with dma_bits = 40 (I understand that's what it 
> >> is
> >> without the original patch applied);
> >>
> >> b) there's a hint of uncertainty on this line
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> >> saying that for AGP dma_bits = 32 is the safest option, so 
> >> apparently there are
> >> setups, unlike mine, where dma_bits = 32 is better than 40.
> >>
> >> But I'm in no position to argue, just wanted to make myself clear.
> >> I'm okay with rebuilding the kernel for my machine until the 
> >> original
> >> patch is reverted or any other fix is applied.
> >
> > What GPU do you have and is it AGP?  If it is AGP, does setting
> > radeon.agpmode=-1 also fix it?
> >
> > Alex
> 
>  That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 
>  doesn't
>  help, it just makes 3D acceleration in games such as OpenArena stop
>  working.
> >>>
> >>> Just to confirm, is the board AGP or PCIe?
> >>>
> >>> Alex
> >>
> >> It is AGP. That's an old machine.
> >
> > Can you check whether dma_addressing_limited() is actually returning the
> > expected result at the point of radeon_ttm_init()? Disabling highmem is
> > presumably just hiding whatever problem exists, by throwing away all
> > >   >32-bit RAM such that use_dma32 doesn't matter.
> 
>  The device in question only supports a 32 bit DMA mask so
>  dma_addressing_limited() should return true.  Bounce buffers are not
>  really usable on GPUs because they map so much memory.  If
>  dma_addressing_limited() returns false, that would explain it.
> >>>
> >>> Right, it appears to be the only part of the offending commit that
> >>> *could* reasonably make any difference, so I'm primarily wondering if
> >>> dma_get_required_mask() somehow gets confused.
> >>
> >> Mikhail,
> >>
> >> Can you see what dma_addressing_limited() and dma_get_required_mask()
> >> return in this case?
> >>
> >> Alex
> >>
> >>
> >>>
> >>> Thanks,
> >>> Robin.
> >
> > Unfortunately, right now I don't have enough time for kernel
> > modifications and rebuilds (I will later!), so I did a quick-and-dirty
> > research with kprobe.
> >
> > The problem is that dma_addressing_limited() seems to be inlined and
> > kprobe fails to intercept it.
> >
> > But I managed to get the result of dma_get_required_mask(). It returns
> > 0x7fffffff (!) on the vanilla (with the patch, buggy) kernel:
> >
> > $ sudo kprobe-perf 'r:dma_get_required_mask $retval'
> > Tracing kprobe dma_get_required_mask. Ctrl-C to end.
> >  modprobe-1244[000] d...   105.582816: dma_get_required_mask: (radeon_ttm_init+0x61/0x240 [radeon] <- dma_get_required_mask) arg1=0x7fffffff
> >
> > This function does not even get called in the kernel without the patch
> > that I built myself. I believe that's because ttm_bo_device_init()
> > doesn't call it without the patch.

Re: Screen corruption using radeon kernel driver

2022-12-01 Thread Robin Murphy

On 2022-11-30 19:59, Mikhail Krylov wrote:

On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:


On 2022-11-30 14:28, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:


On 2022-11-29 17:11, Mikhail Krylov wrote:

On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:

On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:


On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:

On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:


On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:


[excessive quoting removed]



So, is there any progress on this issue? I do understand it's not a high
priority one, and today I've checked it on 6.0 kernel, and
unfortunately, it still persists...

I'm considering writing a patch that will allow user to override
need_dma32/dma_bits setting with a module parameter. I'll have some time
after the New Year for that.

Is it at all possible that such a patch will be merged into kernel?


On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
Unless someone familiar with HIMEM can figure out what is going wrong
we should just revert the patch.

Alex



Okay, I was suggesting that mostly because

a) it works for me with dma_bits = 40 (I understand that's what it is
without the original patch applied);

b) there's a hint of uncertainty on this line
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
saying that for AGP dma_bits = 32 is the safest option, so apparently there are
setups, unlike mine, where dma_bits = 32 is better than 40.

But I'm in no position to argue, just wanted to make myself clear.
I'm okay with rebuilding the kernel for my machine until the original
patch is reverted or any other fix is applied.


What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.


Just to confirm, is the board AGP or PCIe?

Alex


It is AGP. That's an old machine.


Can you check whether dma_addressing_limited() is actually returning the
expected result at the point of radeon_ttm_init()? Disabling highmem is
presumably just hiding whatever problem exists, by throwing away all
   >32-bit RAM such that use_dma32 doesn't matter.


The device in question only supports a 32 bit DMA mask so
dma_addressing_limited() should return true.  Bounce buffers are not
really usable on GPUs because they map so much memory.  If
dma_addressing_limited() returns false, that would explain it.


Right, it appears to be the only part of the offending commit that
*could* reasonably make any difference, so I'm primarily wondering if
dma_get_required_mask() somehow gets confused.


Mikhail,

Can you see what dma_addressing_limited() and dma_get_required_mask()
return in this case?

Alex




Thanks,
Robin.


Unfortunately, right now I don't have enough time for kernel
modifications and rebuilds (I will later!), so I did a quick-and-dirty
research with kprobe.

The problem is that dma_addressing_limited() seems to be inlined and
kprobe fails to intercept it.

But I managed to get the result of dma_get_required_mask(). It returns
0x7fffffff (!) on the vanilla (with the patch, buggy) kernel:

$ sudo kprobe-perf 'r:dma_get_required_mask $retval'
Tracing kprobe dma_get_required_mask. Ctrl-C to end.
 modprobe-1244[000] d...   105.582816: dma_get_required_mask: (radeon_ttm_init+0x61/0x240 [radeon] <- dma_get_required_mask) arg1=0x7fffffff

This function does not even get called in the kernel without the patch
that I built myself. I believe that's because ttm_bo_device_init()
doesn't call it without the patch.

Hope that helps at least a bit. If not, I'll be able to do more thorough
research in a couple of weeks, probably.


Hmm, just to clarify, what's your actual RAM layout? I've been assuming
that the issue must be caused by unexpected DMA address truncation, but
double-checking the older threads it seems that might not be the case.
I just did a quick sanity-check of both HIGHMEM4G and HIGHMEM64G configs
in a VM with either 2GB or 4GB of RAM assigned, and the
dma_direct_get_required_mask() calculation seemed to return the
appropriate result for all combinations.
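
For reference, the calculation being sanity-checked can be modelled in
userspace C. This is a sketch under assumptions: the helper names are
invented for illustration, and the function takes the highest RAM address as
an input rather than reading it from the system.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based position of the highest set bit; 0 when x is 0. */
static int model_fls64(uint64_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Smallest all-ones mask that covers max_addr, the highest address
 * that may need to be reached by DMA (hypothetical helper).
 */
static uint64_t model_required_mask(uint64_t max_addr)
{
    assert(max_addr != 0);
    /* Shift in two steps so that bit 63 wraps to an all-ones mask. */
    return ((((uint64_t)1) << (model_fls64(max_addr) - 1)) << 1) - 1;
}
```

If RAM is assumed to be contiguous from zero, a highest address of
0x5FFFFFFF (1.5 GiB) gives 0x7FFFFFFF, and 0xFFFFFFFF (4 GiB) gives
0xFFFFFFFF. A layout that places any RAM above the 4 GiB boundary would
yield a mask wider than 32 bits.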

Otherwise, the only significant difference of use_dma32 seems to be to
switch TTM's allocation flags from GFP_HIGHUSER to GFP_DMA32. Could it
just be that the highmem support somewhere between TTM and radeon has
bitrotted, and it hasn't been noticed until this change because everyone
still using a 32-bit system with highmem also happens not to be using a
newer 40-bit-capable GPU? Or perhaps it never worked for AGP at all, in
which case an explicit special case might be clearer?

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c

Re: Screen corruption using radeon kernel driver

2022-12-01 Thread Mikhail Krylov
On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:
> >
> > On 2022-11-30 14:28, Alex Deucher wrote:
> > > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:
> > >>
> > >> On 2022-11-29 17:11, Mikhail Krylov wrote:
> > >>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> >  On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  
> >  wrote:
> > >
> > > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > >> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  
> > >> wrote:
> > >>>
> > >>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > >>>
> > >> [excessive quoting removed]
> > >>>
> > > So, is there any progress on this issue? I do understand it's not 
> > > a high
> > > priority one, and today I've checked it on 6.0 kernel, and
> > > unfortunately, it still persists...
> > >
> > > I'm considering writing a patch that will allow user to override
> > > need_dma32/dma_bits setting with a module parameter. I'll have 
> > > some time
> > > after the New Year for that.
> > >
> > > Is it at all possible that such a patch will be merged into 
> > > kernel?
> > >
> >  On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> >  wrote:
> >  Unless someone familiar with HIMEM can figure out what is going 
> >  wrong
> >  we should just revert the patch.
> > 
> >  Alex
> > >>>
> > >>>
> > >>> Okay, I was suggesting that mostly because
> > >>>
> > >>> a) it works for me with dma_bits = 40 (I understand that's what it 
> > >>> is
> > >>> without the original patch applied);
> > >>>
> > >>> b) there's a hint of uncertainty on this line
> > >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > >>> saying that for AGP dma_bits = 32 is the safest option, so 
> > >>> apparently there are
> > >>> setups, unlike mine, where dma_bits = 32 is better than 40.
> > >>>
> > >>> But I'm in no position to argue, just wanted to make myself clear.
> > >>> I'm okay with rebuilding the kernel for my machine until the 
> > >>> original
> > >>> patch is reverted or any other fix is applied.
> > >>
> > >> What GPU do you have and is it AGP?  If it is AGP, does setting
> > >> radeon.agpmode=-1 also fix it?
> > >>
> > >> Alex
> > >
> > > That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 
> > > doesn't
> > > help, it just makes 3D acceleration in games such as OpenArena stop
> > > working.
> > 
> >  Just to confirm, is the board AGP or PCIe?
> > 
> >  Alex
> > >>>
> > >>> It is AGP. That's an old machine.
> > >>
> > >> Can you check whether dma_addressing_limited() is actually returning the
> > >> expected result at the point of radeon_ttm_init()? Disabling highmem is
> > >> presumably just hiding whatever problem exists, by throwing away all
> > >>   >32-bit RAM such that use_dma32 doesn't matter.
> > >
> > > The device in question only supports a 32 bit DMA mask so
> > > dma_addressing_limited() should return true.  Bounce buffers are not
> > > really usable on GPUs because they map so much memory.  If
> > > dma_addressing_limited() returns false, that would explain it.
> >
> > Right, it appears to be the only part of the offending commit that
> > *could* reasonably make any difference, so I'm primarily wondering if
> > dma_get_required_mask() somehow gets confused.
> 
> Mikhail,
> 
> Can you see what dma_addressing_limited() and dma_get_required_mask()
> return in this case?
> 
> Alex
> 
> 
> >
> > Thanks,
> > Robin.

Unfortunately, right now I don't have enough time for kernel
modifications and rebuilds (I will later!), so I did a quick-and-dirty
research with kprobe. 

The problem is that dma_addressing_limited() seems to be inlined and
kprobe fails to intercept it.

But I managed to get the result of dma_get_required_mask(). It returns
0x7fffffff (!) on the vanilla (with the patch, buggy) kernel:

$ sudo kprobe-perf 'r:dma_get_required_mask $retval'
Tracing kprobe dma_get_required_mask. Ctrl-C to end.
modprobe-1244[000] d...   105.582816: dma_get_required_mask: (radeon_ttm_init+0x61/0x240 [radeon] <- dma_get_required_mask) arg1=0x7fffffff

This function does not even get called in the kernel without the patch
that I built myself. I believe that's because ttm_bo_device_init()
doesn't call it without the patch.

Hope that helps at least a bit. If not, I'll be able to do more thorough
research in a couple of weeks, probably.




Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Alex Deucher
On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy  wrote:
>
> On 2022-11-30 14:28, Alex Deucher wrote:
> > On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:
> >>
> >> On 2022-11-29 17:11, Mikhail Krylov wrote:
> >>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
>  On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  
>  wrote:
> >
> > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> >> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  
> >> wrote:
> >>>
> >>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> >>>
> >> [excessive quoting removed]
> >>>
> > So, is there any progress on this issue? I do understand it's not a 
> > high
> > priority one, and today I've checked it on 6.0 kernel, and
> > unfortunately, it still persists...
> >
> > I'm considering writing a patch that will allow user to override
> > need_dma32/dma_bits setting with a module parameter. I'll have some 
> > time
> > after the New Year for that.
> >
> > Is it at all possible that such a patch will be merged into kernel?
> >
>  On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
>  wrote:
>  Unless someone familiar with HIMEM can figure out what is going wrong
>  we should just revert the patch.
> 
>  Alex
> >>>
> >>>
> >>> Okay, I was suggesting that mostly because
> >>>
> >>> a) it works for me with dma_bits = 40 (I understand that's what it is
> >>> without the original patch applied);
> >>>
> >>> b) there's a hint of uncertainty on this line
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> >>> saying that for AGP dma_bits = 32 is the safest option, so apparently 
> >>> there are
> >>> setups, unlike mine, where dma_bits = 32 is better than 40.
> >>>
> >>> But I'm in no position to argue, just wanted to make myself clear.
> >>> I'm okay with rebuilding the kernel for my machine until the original
> >>> patch is reverted or any other fix is applied.
> >>
> >> What GPU do you have and is it AGP?  If it is AGP, does setting
> >> radeon.agpmode=-1 also fix it?
> >>
> >> Alex
> >
> > That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> > help, it just makes 3D acceleration in games such as OpenArena stop
> > working.
> 
>  Just to confirm, is the board AGP or PCIe?
> 
>  Alex
> >>>
> >>> It is AGP. That's an old machine.
> >>
> >> Can you check whether dma_addressing_limited() is actually returning the
> >> expected result at the point of radeon_ttm_init()? Disabling highmem is
> >> presumably just hiding whatever problem exists, by throwing away all
> >>   >32-bit RAM such that use_dma32 doesn't matter.
> >
> > The device in question only supports a 32 bit DMA mask so
> > dma_addressing_limited() should return true.  Bounce buffers are not
> > really usable on GPUs because they map so much memory.  If
> > dma_addressing_limited() returns false, that would explain it.
>
> Right, it appears to be the only part of the offending commit that
> *could* reasonably make any difference, so I'm primarily wondering if
> dma_get_required_mask() somehow gets confused.

Mikhail,

Can you see what dma_addressing_limited() and dma_get_required_mask()
return in this case?

Alex


>
> Thanks,
> Robin.


Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Robin Murphy

On 2022-11-30 14:28, Alex Deucher wrote:

On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:


On 2022-11-29 17:11, Mikhail Krylov wrote:

On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:

On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:


On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:

On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:


On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:


[excessive quoting removed]



So, is there any progress on this issue? I do understand it's not a high
priority one, and today I've checked it on 6.0 kernel, and
unfortunately, it still persists...

I'm considering writing a patch that will allow user to override
need_dma32/dma_bits setting with a module parameter. I'll have some time
after the New Year for that.

Is it at all possible that such a patch will be merged into kernel?


On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
Unless someone familiar with HIMEM can figure out what is going wrong
we should just revert the patch.

Alex



Okay, I was suggesting that mostly because

a) it works for me with dma_bits = 40 (I understand that's what it is
without the original patch applied);

b) there's a hint of uncertainty on this line
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
saying that for AGP dma_bits = 32 is the safest option, so apparently there are
setups, unlike mine, where dma_bits = 32 is better than 40.

But I'm in no position to argue, just wanted to make myself clear.
I'm okay with rebuilding the kernel for my machine until the original
patch is reverted or any other fix is applied.


What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.


Just to confirm, is the board AGP or PCIe?

Alex


It is AGP. That's an old machine.


Can you check whether dma_addressing_limited() is actually returning the
expected result at the point of radeon_ttm_init()? Disabling highmem is
presumably just hiding whatever problem exists, by throwing away all
  >32-bit RAM such that use_dma32 doesn't matter.


The device in question only supports a 32 bit DMA mask so
dma_addressing_limited() should return true.  Bounce buffers are not
really usable on GPUs because they map so much memory.  If
dma_addressing_limited() returns false, that would explain it.


Right, it appears to be the only part of the offending commit that 
*could* reasonably make any difference, so I'm primarily wondering if 
dma_get_required_mask() somehow gets confused.


Thanks,
Robin.


Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Alex Deucher
On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy  wrote:
>
> On 2022-11-29 17:11, Mikhail Krylov wrote:
> > On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> >> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
> >>>
> >>> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
>  On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> >
> > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> >
>  [excessive quoting removed]
> >
> >>> So, is there any progress on this issue? I do understand it's not a 
> >>> high
> >>> priority one, and today I've checked it on 6.0 kernel, and
> >>> unfortunately, it still persists...
> >>>
> >>> I'm considering writing a patch that will allow user to override
> >>> need_dma32/dma_bits setting with a module parameter. I'll have some 
> >>> time
> >>> after the New Year for that.
> >>>
> >>> Is it at all possible that such a patch will be merged into kernel?
> >>>
> >> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> >> wrote:
> >> Unless someone familiar with HIMEM can figure out what is going wrong
> >> we should just revert the patch.
> >>
> >> Alex
> >
> >
> > Okay, I was suggesting that mostly because
> >
> > a) it works for me with dma_bits = 40 (I understand that's what it is
> > without the original patch applied);
> >
> > > b) there's a hint of uncertainty on this line
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > saying that for AGP dma_bits = 32 is the safest option, so apparently 
> > there are
> > setups, unlike mine, where dma_bits = 32 is better than 40.
> >
> > But I'm in no position to argue, just wanted to make myself clear.
> > I'm okay with rebuilding the kernel for my machine until the original
> > patch is reverted or any other fix is applied.
> 
>  What GPU do you have and is it AGP?  If it is AGP, does setting
>  radeon.agpmode=-1 also fix it?
> 
>  Alex
> >>>
> >>> That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> >>> help, it just makes 3D acceleration in games such as OpenArena stop
> >>> working.
> >>
> >> Just to confirm, is the board AGP or PCIe?
> >>
> >> Alex
> >
> > It is AGP. That's an old machine.
>
> Can you check whether dma_addressing_limited() is actually returning the
> expected result at the point of radeon_ttm_init()? Disabling highmem is
> presumably just hiding whatever problem exists, by throwing away all
>  >32-bit RAM such that use_dma32 doesn't matter.

The device in question only supports a 32 bit DMA mask so
dma_addressing_limited() should return true.  Bounce buffers are not
really usable on GPUs because they map so much memory.  If
dma_addressing_limited() returns false, that would explain it.

Alex


Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Robin Murphy

On 2022-11-29 17:11, Mikhail Krylov wrote:
> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
>> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
>>>
>>> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
>>>> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
>>>>>
>>>>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
>>>>>
>>>>>>>> [excessive quoting removed]
>>>>>
>>>>>>> So, is there any progress on this issue? I do understand it's not a high
>>>>>>> priority one, and today I've checked it on 6.0 kernel, and
>>>>>>> unfortunately, it still persists...
>>>>>>>
>>>>>>> I'm considering writing a patch that will allow user to override
>>>>>>> need_dma32/dma_bits setting with a module parameter. I'll have some time
>>>>>>> after the New Year for that.
>>>>>>>
>>>>>>> Is it at all possible that such a patch will be merged into kernel?
>>>>>>>
>>>>>> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
>>>>>> Unless someone familiar with HIMEM can figure out what is going wrong
>>>>>> we should just revert the patch.
>>>>>>
>>>>>> Alex
>>>>>
>>>>>
>>>>> Okay, I was suggesting that mostly because
>>>>>
>>>>> a) it works for me with dma_bits = 40 (I understand that's what it is
>>>>> without the original patch applied);
>>>>>
>>>>> b) there's a hint of uncertainty on this line
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
>>>>> saying that for AGP dma_bits = 32 is the safest option, so apparently there are
>>>>> setups, unlike mine, where dma_bits = 32 is better than 40.
>>>>>
>>>>> But I'm in no position to argue, just wanted to make myself clear.
>>>>> I'm okay with rebuilding the kernel for my machine until the original
>>>>> patch is reverted or any other fix is applied.
>>>>
>>>> What GPU do you have and is it AGP?  If it is AGP, does setting
>>>> radeon.agpmode=-1 also fix it?
>>>>
>>>> Alex
>>>
>>> That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
>>> help, it just makes 3D acceleration in games such as OpenArena stop
>>> working.
>>
>> Just to confirm, is the board AGP or PCIe?
>>
>> Alex
>
> It is AGP. That's an old machine.

Can you check whether dma_addressing_limited() is actually returning the
expected result at the point of radeon_ttm_init()? Disabling highmem is
presumably just hiding whatever problem exists, by throwing away all >32-bit
RAM such that use_dma32 doesn't matter.


Robin.


Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> >
> > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> >
> > >>> [excessive quoting removed]
> >
> > >> So, is there any progress on this issue? I do understand it's not a high
> > >> priority one, and today I've checked it on 6.0 kernel, and
> > >> unfortunately, it still persists...
> > >>
> > >> I'm considering writing a patch that will allow user to override
> > >> need_dma32/dma_bits setting with a module parameter. I'll have some time
> > >> after the New Year for that.
> > >>
> > >> Is it at all possible that such a patch will be merged into kernel?
> > >>
> > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
> > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > we should just revert the patch.
> > >
> > > Alex
> >
> >
> > Okay, I was suggesting that mostly because
> >
> > a) it works for me with dma_bits = 40 (I understand that's what it is
> > without the original patch applied);
> >
> > b) there's a hint of uncertainty on this line
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > saying that for AGP dma_bits = 32 is the safest option, so apparently there 
> > are
> > setups, unlike mine, where dma_bits = 32 is better than 40.
> >
> > But I'm in no position to argue, just wanted to make myself clear.
> > I'm okay with rebuilding the kernel for my machine until the original
> > patch is reverted or any other fix is applied.
> 
> What GPU do you have and is it AGP?  If it is AGP, does setting
> radeon.agpmode=-1 also fix it?
> 
> Alex

That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
help, it just makes 3D acceleration in games such as OpenArena stop
working.




Re: Screen corruption using radeon kernel driver

2022-11-30 Thread Mikhail Krylov
On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
> >
> > On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> > > >
> > > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > > >
> > > > >>> [excessive quoting removed]
> > > >
> > > > >> So, is there any progress on this issue? I do understand it's not a 
> > > > >> high
> > > > >> priority one, and today I've checked it on 6.0 kernel, and
> > > > >> unfortunately, it still persists...
> > > > >>
> > > > >> I'm considering writing a patch that will allow user to override
> > > > >> need_dma32/dma_bits setting with a module parameter. I'll have some 
> > > > >> time
> > > > >> after the New Year for that.
> > > > >>
> > > > >> Is it at all possible that such a patch will be merged into kernel?
> > > > >>
> > > > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> > > > > wrote:
> > > > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > > > we should just revert the patch.
> > > > >
> > > > > Alex
> > > >
> > > >
> > > > Okay, I was suggesting that mostly because
> > > >
> > > > a) it works for me with dma_bits = 40 (I understand that's what it is
> > > > without the original patch applied);
> > > >
> > > > b) there's a hint of uncertainty on this line
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > > > saying that for AGP dma_bits = 32 is the safest option, so apparently 
> > > > there are
> > > > setups, unlike mine, where dma_bits = 32 is better than 40.
> > > >
> > > > But I'm in no position to argue, just wanted to make myself clear.
> > > > I'm okay with rebuilding the kernel for my machine until the original
> > > > patch is reverted or any other fix is applied.
> > >
> > > What GPU do you have and is it AGP?  If it is AGP, does setting
> > > radeon.agpmode=-1 also fix it?
> > >
> > > Alex
> >
> > That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> > help, it just makes 3D acceleration in games such as OpenArena stop
> > working.
> 
> Just to confirm, is the board AGP or PCIe?
> 
> Alex

It is AGP. That's an old machine.




Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Alex Deucher
On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov  wrote:
>
> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
> > On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
> > >
> > > On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
> > >
> > > >>> [excessive quoting removed]
> > >
> > > >> So, is there any progress on this issue? I do understand it's not a 
> > > >> high
> > > >> priority one, and today I've checked it on 6.0 kernel, and
> > > >> unfortunately, it still persists...
> > > >>
> > > >> I'm considering writing a patch that will allow user to override
> > > >> need_dma32/dma_bits setting with a module parameter. I'll have some 
> > > >> time
> > > >> after the New Year for that.
> > > >>
> > > >> Is it at all possible that such a patch will be merged into kernel?
> > > >>
> > > > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  
> > > > wrote:
> > > > Unless someone familiar with HIMEM can figure out what is going wrong
> > > > we should just revert the patch.
> > > >
> > > > Alex
> > >
> > >
> > > Okay, I was suggesting that mostly because
> > >
> > > a) it works for me with dma_bits = 40 (I understand that's what it is
> > > without the original patch applied);
> > >
> > > b) there's a hint of uncertainty on this line
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> > > saying that for AGP dma_bits = 32 is the safest option, so apparently 
> > > there are
> > > setups, unlike mine, where dma_bits = 32 is better than 40.
> > >
> > > But I'm in no position to argue, just wanted to make myself clear.
> > > I'm okay with rebuilding the kernel for my machine until the original
> > > patch is reverted or any other fix is applied.
> >
> > What GPU do you have and is it AGP?  If it is AGP, does setting
> > radeon.agpmode=-1 also fix it?
> >
> > Alex
>
> That is ATI Radeon X1950, and, unfortunately, radeon.agpmode=-1 doesn't
> help, it just makes 3D acceleration in games such as OpenArena stop
> working.

Just to confirm, is the board AGP or PCIe?

Alex


Re: Screen corruption using radeon kernel driver

2022-11-29 Thread Alex Deucher
On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov  wrote:
>
> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
>
> >>> [excessive quoting removed]
>
> >> So, is there any progress on this issue? I do understand it's not a high
> >> priority one, and today I've checked it on 6.0 kernel, and
> >> unfortunately, it still persists...
> >>
> >> I'm considering writing a patch that will allow user to override
> >> need_dma32/dma_bits setting with a module parameter. I'll have some time
> >> after the New Year for that.
> >>
> >> Is it at all possible that such a patch will be merged into kernel?
> >>
> > On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
> > Unless someone familiar with HIMEM can figure out what is going wrong
> > we should just revert the patch.
> >
> > Alex
>
>
> Okay, I was suggesting that mostly because
>
> a) it works for me with dma_bits = 40 (I understand that's what it is
> without the original patch applied);
>
> b) there's a hint of uncertainty on this line
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
> saying that for AGP dma_bits = 32 is the safest option, so apparently there 
> are
> setups, unlike mine, where dma_bits = 32 is better than 40.
>
> But I'm in no position to argue, just wanted to make myself clear.
> I'm okay with rebuilding the kernel for my machine until the original
> patch is reverted or any other fix is applied.

What GPU do you have and is it AGP?  If it is AGP, does setting
radeon.agpmode=-1 also fix it?

Alex


Re: Screen corruption using radeon kernel driver

2022-11-28 Thread Alex Deucher
On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov  wrote:
>
> On Mon, Apr 25, 2022 at 01:22:04PM -0400, Alex Deucher wrote:
> > + dri-devel
> >
> > On Mon, Apr 25, 2022 at 3:33 AM Krylov Michael  wrote:
> > >
> > > Hello!
> > >
> > > After updating my Linux kernel from version 4.19 (Debian 10 version) to
> > > 5.10 (packaged with Debian 11), I've noticed that the image
> > > displayed on my older computer, 32-bit Pentium 4 using ATI Radeon X1950
> > > AGP video card is severely corrupted in the graphical (Xorg and Wayland)
> > > mode: all kinds of black and white stripes across the screen, some
> > > letters missing, etc.
> > >
> > > I've checked several options (Xorg drivers, Wayland instead of
> > > Xorg, radeon.agpmode=-1 in kernel command line and so on), but the
> > > problem persisted. I've managed to find that the problem was in the
> > > kernel, as everything worked well with 4.19 kernel with everything
> > > else being from Debian 11.
> > >
> > > I have managed to find the culprit of that corruption, that is the
> > > commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713 on the linux kernel.
> > > Reverting this commit and building the kernel with that commit reverted
> > > fixes the problem. Disabling HIMEM also gets rid of that problem. But it
> > > > also leaves the system with less than 1G of RAM, which is, of course,
> > > undesirable.
> > >
> > > Apparently this problem is somewhat known, as I can tell after googling
> > > for the commit id, see this link for example:
> > > https://lkml.org/lkml/2020/1/9/518
> > >
> > > Mageia distro, for example, reverted this commit in the kernel they are
> > > building:
> > >
> > > http://sophie.zarb.org/distrib/Mageia/7/i586/by-pkgid/b9193a4f85192bc57f4d770fb9bb399c/files/32
> > >
> > > > I've reported this bug to Debian bugtracker, checked the recent version
> > > of the kernel (5.17), bug still persists. Here's a link to the Debian
> > > bug page:
> > >
> > > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993670
> > >
> > > I'm not sure if reverting this commit is the correct way to go, so if
> > > you need to check any changes/patches that I could apply and test on
> > > the real hardware, I'll be glad to do that (but please keep in mind
> > > that testing could take some time, I don't have access to this computer
> > > 24/7, but I'll do my best to respond ASAP).
> >
> > I would be happy to revert that commit.  I attempted to revert it a
> > year or so ago, but Christoph didn't want to.  He was going to look
> > further into it.  I was not able to repro the issue.  It seemed to be
> > related to highmem support.  You might try disabling that.  Here is
> > the previous thread for reference:
> > https://lists.freedesktop.org/archives/amd-gfx/2020-September/053922.html
> >
> > Alex
>
> So, is there any progress on this issue? I do understand it's not a high
> priority one, and today I've checked it on 6.0 kernel, and
> unfortunately, it still persists...
>
> I'm considering writing a patch that will allow user to override
> need_dma32/dma_bits setting with a module parameter. I'll have some time
> after the New Year for that.
>
> Is it at all possible that such a patch will be merged into kernel?

Unless someone familiar with HIMEM can figure out what is going wrong
we should just revert the patch.

Alex


Re: Screen corruption using radeon kernel driver

2022-05-18 Thread Mikhail Krylov
On Mon, Apr 25, 2022 at 01:22:04PM -0400, Alex Deucher wrote:
> + dri-devel
> 
> On Mon, Apr 25, 2022 at 3:33 AM Krylov Michael  wrote:
> >
> > Hello!
> >
> > After updating my Linux kernel from version 4.19 (Debian 10 version) to
> > 5.10 (packaged with Debian 11), I've noticed that the image
> > displayed on my older computer, 32-bit Pentium 4 using ATI Radeon X1950
> > AGP video card is severely corrupted in the graphical (Xorg and Wayland)
> > mode: all kinds of black and white stripes across the screen, some
> > letters missing, etc.
> >
> > I've checked several options (Xorg drivers, Wayland instead of
> > Xorg, radeon.agpmode=-1 in kernel command line and so on), but the
> > problem persisted. I've managed to find that the problem was in the
> > kernel, as everything worked well with 4.19 kernel with everything
> > else being from Debian 11.
> >
> > I have managed to find the culprit of that corruption, that is the
> > commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713 on the linux kernel.
> > Reverting this commit and building the kernel with that commit reverted
> > fixes the problem. Disabling HIMEM also gets rid of that problem. But it
> > also leaves the system with less than 1G of RAM, which is, of course,
> > undesirable.
> >
> > Apparently this problem is somewhat known, as I can tell after googling
> > for the commit id, see this link for example:
> > https://lkml.org/lkml/2020/1/9/518
> >
> > Mageia distro, for example, reverted this commit in the kernel they are
> > building:
> >
> > http://sophie.zarb.org/distrib/Mageia/7/i586/by-pkgid/b9193a4f85192bc57f4d770fb9bb399c/files/32
> >
> > I've reported this bug to Debian bugtracker, checked the recent version
> > of the kernel (5.17), bug still persists. Here's a link to the Debian
> > bug page:
> >
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993670
> >
> > I'm not sure if reverting this commit is the correct way to go, so if
> > you need to check any changes/patches that I could apply and test on
> > the real hardware, I'll be glad to do that (but please keep in mind
> > that testing could take some time, I don't have access to this computer
> > 24/7, but I'll do my best to respond ASAP).
> 
> I would be happy to revert that commit.  I attempted to revert it a
> year or so ago, but Christoph didn't want to.  He was going to look
> further into it.  I was not able to repro the issue.  It seemed to be
> related to highmem support.  You might try disabling that.  Here is
> the previous thread for reference:
> https://lists.freedesktop.org/archives/amd-gfx/2020-September/053922.html
> 
> Alex

Yeah, I tried to disable HIMEM, and that indeed fixes the problem, but
it leaves me with less than 1G of available memory which is undesirable.
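
The "less than 1G" figure follows from how 32-bit x86 lays out kernel
address space. The sketch below assumes the usual 3G/1G user/kernel split
and a 128 MiB vmalloc reserve; both are common defaults rather than values
confirmed for this machine, and the function names and the 2048 MiB RAM
size in the usage note are hypothetical.

```c
#include <stdint.h>

/*
 * Approximate ceiling, in MiB, of RAM the kernel can map directly
 * (lowmem): kernel virtual space minus the vmalloc reserve. Smaller
 * reservations are ignored, so the real limit is slightly lower.
 */
uint32_t lowmem_limit_mib(uint32_t kernel_virt_mib, uint32_t vmalloc_reserve_mib)
{
    return kernel_virt_mib > vmalloc_reserve_mib
               ? kernel_virt_mib - vmalloc_reserve_mib
               : 0;
}

/* RAM, in MiB, that goes unused once highmem support is disabled. */
uint32_t ram_lost_without_highmem_mib(uint32_t installed_mib, uint32_t lowmem_mib)
{
    return installed_mib > lowmem_mib ? installed_mib - lowmem_mib : 0;
}
```

With those assumed defaults, lowmem_limit_mib(1024, 128) gives 896 MiB, so
a hypothetical machine with 2048 MiB installed would leave 1152 MiB unused
without highmem.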




Re: Screen corruption using radeon kernel driver

2022-04-25 Thread Alex Deucher
+ dri-devel

On Mon, Apr 25, 2022 at 3:33 AM Krylov Michael  wrote:
>
> Hello!
>
> After updating my Linux kernel from version 4.19 (Debian 10 version) to
> 5.10 (packaged with Debian 11), I've noticed that the image
> displayed on my older computer, 32-bit Pentium 4 using ATI Radeon X1950
> AGP video card is severely corrupted in the graphical (Xorg and Wayland)
> mode: all kinds of black and white stripes across the screen, some
> letters missing, etc.
>
> I've checked several options (Xorg drivers, Wayland instead of
> Xorg, radeon.agpmode=-1 in kernel command line and so on), but the
> problem persisted. I've managed to find that the problem was in the
> kernel, as everything worked well with 4.19 kernel with everything
> else being from Debian 11.
>
> I have managed to find the culprit of that corruption, that is the
> commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713 on the linux kernel.
> Reverting this commit and building the kernel with that commit reverted
> fixes the problem. Disabling HIMEM also gets rid of that problem. But it
> also leaves the system with less than 1G of RAM, which is, of course,
> undesirable.
>
> Apparently this problem is somewhat known, as I can tell after googling
> for the commit id, see this link for example:
> https://lkml.org/lkml/2020/1/9/518
>
> Mageia distro, for example, reverted this commit in the kernel they are
> building:
>
> http://sophie.zarb.org/distrib/Mageia/7/i586/by-pkgid/b9193a4f85192bc57f4d770fb9bb399c/files/32
>
> I've reported this bug to Debian bugtracker, checked the recent version
> of the kernel (5.17), bug still persists. Here's a link to the Debian
> bug page:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993670
>
> I'm not sure if reverting this commit is the correct way to go, so if
> you need to check any changes/patches that I could apply and test on
> the real hardware, I'll be glad to do that (but please keep in mind
> that testing could take some time, I don't have access to this computer
> 24/7, but I'll do my best to respond ASAP).

I would be happy to revert that commit.  I attempted to revert it a
year or so ago, but Christoph didn't want to.  He was going to look
further into it.  I was not able to repro the issue.  It seemed to be
related to highmem support.  You might try disabling that.  Here is
the previous thread for reference:
https://lists.freedesktop.org/archives/amd-gfx/2020-September/053922.html

Alex