Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-20 Thread Neel Chauhan

Hi Hans,

Sorry if I have to bother you again.

On 2021-06-20 01:36, Hans Petter Selasky wrote:

sg_dma_address() is zero, because the memory hasn't been loaded.


Makes sense!


You need to handle two cases there:

When r->iobase is -1 and when it is not.

I suspect you should add r->iobase to the sg_dma_address() only and
only when it is non -1.


I have addressed it in my code.

Hans, while your approach is correct (that's also done in Linux), for 
some reason it still crashes on vm_page_insert(). I must be doing "pa = 
" wrong.


Stacktrace: https://misc.neelc.org/drm-kmod/stacktrace2.txt

I have a separate function for getting the physical address: 
https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/intel_freebsd.c#L219


This crash happens when "iobase == -1", so we execute Lines 228-231:
-
} else {
struct sgt_iter sgt = __sgt_iter(sgl, 0);
pa = (sgt.pfn + (sgt.curr >> PAGE_SHIFT)) << PAGE_SHIFT;
}

sgt.curr is zero, since (I believe) it's the first iteration.

I have copied the Linux approach and ported what I felt was necessary, I 
am worried I am doing this wrong.



Also, there is a superfluous "pa = " in the beginning of the function.


(Hopefully) I have cleaned it up.


--HPS


-Neel (nc@)



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-20 Thread Hans Petter Selasky

On 6/20/21 7:32 AM, Neel Chauhan wrote:

On 2021-06-18 20:03, Neel Chauhan wrote:

Apparently, the vm_start values is for some reason coming as 0 when it
is passed into vm_fault_cpu(). That's why it's giving these errors: of
course the address at 0 is mapped, it is (probably) used by the
kernel.


An update: The vm_start 0 seems to be expected. I checked the values 
with printf()s.


I have posted this on Twitter, and am considering hiring a kernel 
consultant to help if I am unable to do this on my own.


So I am guessing this line (Line 231) is incorrect:

     pa = sg_dma_address(sgl);

Source: 
https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L231 


Hi Neel,

sg_dma_address() is zero, because the memory hasn't been loaded.

You need to handle two cases there:

When r->iobase is -1 and when it is not.

I suspect you should add r->iobase to the sg_dma_address() only and only 
when it is non -1.


Also, there is a superfluous "pa = " in the beginning of the function.

--HPS




Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-19 Thread Neel Chauhan

On 2021-06-18 20:03, Neel Chauhan wrote:

Apparently, the vm_start values is for some reason coming as 0 when it
is passed into vm_fault_cpu(). That's why it's giving these errors: of
course the address at 0 is mapped, it is (probably) used by the
kernel.


An update: The vm_start 0 seems to be expected. I checked the values 
with printf()s.


I have posted this on Twitter, and am considering hiring a kernel 
consultant to help if I am unable to do this on my own.


So I am guessing this line (Line 231) is incorrect:

pa = sg_dma_address(sgl);

Source: 
https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L231


-Neel (nc@)



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-18 Thread Neel Chauhan

On 2021-06-18 12:19, Neel Chauhan wrote:

I'm not sure.  I'll just note that the Linux code appears to be trying
to map a set of pages belonging to a scatter-gather list.  Taking the
physical address of the first page and assuming that all subsequent
pages are physically contiguous doesn't seem correct, but this is what
is happening in that loop, since each iteration simply increments pa 
by

PAGE_SIZE.


Good news! While I don't have a fix, I have figured out what the issue 
is.


Apparently, the vm_start values is for some reason coming as 0 when it 
is passed into vm_fault_cpu(). That's why it's giving these errors: of 
course the address at 0 is mapped, it is (probably) used by the kernel.


I will look more into it tomorrow, but my brother has come over to 
Seattle from Connecticut so I may not be able to dedicate as much time 
as I would like to.


-Neel (nc@)



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-18 Thread Neel Chauhan

Hi Mark,

On 2021-06-18 06:57, Mark Johnston wrote:

That seems surprising, since the vm_page_grab() call should return the
page at pidx if one exists.


I believe that's not the case. I did pringfs


Any hints on where the physical address is? Should we have an
FreeBSD-specific "pa" argument for the physical address if it's 
needed?


I'm not sure.  I'll just note that the Linux code appears to be trying
to map a set of pages belonging to a scatter-gather list.  Taking the
physical address of the first page and assuming that all subsequent
pages are physically contiguous doesn't seem correct, but this is what
is happening in that loop, since each iteration simply increments pa by
PAGE_SIZE.


Based on this email and our private one, and prior debugging it seems 
this panic comes on the first iteration. Something must be


vm_page_grab() returns NULL, and then we run:

if (!vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL))
goto retry;
if (vm_page_insert(m, vm_obj, pidx)) {
vm_page_xunbusy(m);
VM_OBJECT_WUNLOCK(vm_obj);
vm_wait(NULL);
VM_OBJECT_WLOCK(vm_obj);
goto retry;
}

Source: 
https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L245


The first if() doesn't panic, but we panic at the second one, it doesn't 
go into or jumo over the statement.


I could use for() or for_each_sg_page() and they both panic.

I am almost feeling I'd have to hire a FreeBSD kernel consultant and/or 
sell my TigerLake laptop for an AMD Ryzen-based laptop.


-Neel (nc@)



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-18 Thread Mark Johnston
On Wed, Jun 16, 2021 at 10:16:56PM -0700, Neel Chauhan wrote:
> Hi Mark,
> 
> Thank you so much for your response!
> 
> On 2021-06-16 14:05, Mark Johnston wrote:
> > 
> > The function in question appears to implement a device page fault
> > handler.  In FreeBSD, such handlers are responsible only for ensuring
> > that the requested page(s) are present in the VM object backing the
> > mapping that was faulted on.  The generic fault handler in
> > sys/vm/vm_fault.c is responsible for actually updating the faulting
> > process' page tables by calling pmap_enter().  In other words, our 
> > fault
> > handler interface is quite different from OpenBSD's and their example
> > should not be followed exactly.  Adding a vm_object_busy() call in the
> > loop will silence the assertion I guess but the handler is still wrong.
> 
> Good to hear. No wonder why I still had the "blank screen of death" with 
> the (now previous) code.
> 
> > If you look further down at vm_fault_gtt() (and in earlier versions of
> > the DRM drivers, i915_gem_fault()), the remap_io_mapping()
> > implementation in the LinuxKPI does basically what I'm describing.
> > Something similar is required for vm_fault_cpu(), though I don't quite
> > understand when vm_fault_cpu() is supposed to be used.
> 
> I have some code to implement remap_io_sg() based on remap_io_mapping(), 
> but it's not complete.
> 
> I am still trying to figure out how to get the physical address. Right 
> now, I have:
> 
> https://github.com/neelchauhan/drm-kmod/commit/92a3d9bb585acb55c1dab9c5ed11190f7db73ecf
> 
> My get_pfn() function (Lines 221-231) attempts to get the physical 
> address, and it is called at Line 248.
>
> Note: This code is probably incorrect, since it gives me an "page 
> already inserted" panic.

That seems surprising, since the vm_page_grab() call should return the
page at pidx if one exists.

> Any hints on where the physical address is? Should we have an 
> FreeBSD-specific "pa" argument for the physical address if it's needed?

I'm not sure.  I'll just note that the Linux code appears to be trying
to map a set of pages belonging to a scatter-gather list.  Taking the
physical address of the first page and assuming that all subsequent
pages are physically contiguous doesn't seem correct, but this is what
is happening in that loop, since each iteration simply increments pa by
PAGE_SIZE.



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-16 Thread Neel Chauhan

Hi Mark,

Thank you so much for your response!

On 2021-06-16 14:05, Mark Johnston wrote:


The function in question appears to implement a device page fault
handler.  In FreeBSD, such handlers are responsible only for ensuring
that the requested page(s) are present in the VM object backing the
mapping that was faulted on.  The generic fault handler in
sys/vm/vm_fault.c is responsible for actually updating the faulting
process' page tables by calling pmap_enter().  In other words, our 
fault

handler interface is quite different from OpenBSD's and their example
should not be followed exactly.  Adding a vm_object_busy() call in the
loop will silence the assertion I guess but the handler is still wrong.


Good to hear. No wonder why I still had the "blank screen of death" with 
the (now previous) code.



If you look further down at vm_fault_gtt() (and in earlier versions of
the DRM drivers, i915_gem_fault()), the remap_io_mapping()
implementation in the LinuxKPI does basically what I'm describing.
Something similar is required for vm_fault_cpu(), though I don't quite
understand when vm_fault_cpu() is supposed to be used.


I have some code to implement remap_io_sg() based on remap_io_mapping(), 
but it's not complete.


I am still trying to figure out how to get the physical address. Right 
now, I have:


https://github.com/neelchauhan/drm-kmod/commit/92a3d9bb585acb55c1dab9c5ed11190f7db73ecf

My get_pfn() function (Lines 221-231) attempts to get the physical 
address, and it is called at Line 248.


Note: This code is probably incorrect, since it gives me an "page 
already inserted" panic.


Any hints on where the physical address is? Should we have an 
FreeBSD-specific "pa" argument for the physical address if it's needed?


Sorry for the kernel newbie questions.

-Neel (nc@)



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-16 Thread Mark Johnston
On Tue, Jun 15, 2021 at 08:36:07PM -0700, Neel Chauhan wrote:
> Hi current@,
> 
> First off, sorry if I spammed developers@ and other mailing lists with 
> my previous message, and to bz@/hselasky@/manu@ sent so many duplicate 
> emails.
> 
> Right now, I am attempting to update the drm-kmod driver to the Linux 
> 5.7 code, and am having an issue with the pmap lock. I am new-ish to the 
> kernel, meaning not a whole lot of "experience", but do have patches in 
> src.
> 
> But like it not we need kernel newbies, they're the next generation of 
> experts. If we don't, we'd be the next Minix with **zero** development 
> since Tanenbaum retired.
> 
> Going back, the code in question is here: 
> https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346
> 
> The lines important are 346-356, but lines of interest are also the 
> non-"#ifdef __linux__" sections of vm_fault_cpu().
> 
> The code gives this error: panic: Assertion 
> vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
> 
> I have attached the core dump log.
> 
> To those who aren't graphics driver experts, it happens when I load Xorg 
> when Xorg attempts to map the I/O to userspace. But I feel this is more 
> of me not using page locks correctly (which is needed for the pmap), or 
> maybe a linuxkpi issue, rather than a graphics-specific issue.
> 
> I spent days on this (all my non-$DAYJOB hours at one point + all my 
> weekends) and haven't figured out the locks completely. Does anyone have 
> suggestions to what I'm doing wrong in my code and locks?
> 
> If it is important, OpenBSD's version of this code is here: 
> https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459
>  
> (lines 459-523, but some calls are unsurprisingly different).
> 

The function in question appears to implement a device page fault
handler.  In FreeBSD, such handlers are responsible only for ensuring
that the requested page(s) are present in the VM object backing the
mapping that was faulted on.  The generic fault handler in
sys/vm/vm_fault.c is responsible for actually updating the faulting
process' page tables by calling pmap_enter().  In other words, our fault
handler interface is quite different from OpenBSD's and their example
should not be followed exactly.  Adding a vm_object_busy() call in the
loop will silence the assertion I guess but the handler is still wrong.

If you look further down at vm_fault_gtt() (and in earlier versions of
the DRM drivers, i915_gem_fault()), the remap_io_mapping()
implementation in the LinuxKPI does basically what I'm describing.
Something similar is required for vm_fault_cpu(), though I don't quite
understand when vm_fault_cpu() is supposed to be used.



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-16 Thread Hans Petter Selasky

Hi Neel,

On 6/16/21 5:28 PM, Neel Chauhan wrote:

Hi,

On 2021-06-16 00:35, Hans Petter Selasky wrote:

Do you have the full backtrace?


Yes.

I have attached a stack trace in the previous email, but if you didn't 
get it, I have uploaded it to GitHub: 
https://gist.github.com/neelchauhan/437bd10239f84c563aafb37ab440029a



Doesn't this code work in the current DRM - kmod? What changed? Did
you perhaps miss a patch?


I think there is new code with Linux 5.6 which changes how this is done.

I have been attempting to make a FreeBSD equivalent, but it panics. It 
is **not** from missing Linux commits since I believe I added them all.


The code in my GH repo: 
https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 



I think the following changes are needed. CC'ing Jeff.



for_each_sg_page(pages->sgl, _iter, pages->nents, 0) {
pmap_t pmap = vm_map_pmap(map);
struct vm_page *pa = sg_page_iter_page(_iter);
VM_OBJECT_RLOCK(pa->object);


Try adding this:
vm_object_busy(pa->object);


if (pmap_enter(pmap, va, pa, 0, flags, 0)) {


Try adding this:
vm_object_unbusy(pa->object);
VM_OBJECT_RUNLOCK(pa->object);


err = -ENOMEM;
break;
}

Try adding this:
vm_object_unbusy(pa->object);


VM_OBJECT_RUNLOCK(pa->object);
va += PAGE_SIZE;


--HPS



Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-16 Thread Neel Chauhan

Hi,

On 2021-06-16 00:35, Hans Petter Selasky wrote:

Do you have the full backtrace?


Yes.

I have attached a stack trace in the previous email, but if you didn't 
get it, I have uploaded it to GitHub: 
https://gist.github.com/neelchauhan/437bd10239f84c563aafb37ab440029a



Doesn't this code work in the current DRM - kmod? What changed? Did
you perhaps miss a patch?


I think there is new code with Linux 5.6 which changes how this is done.

I have been attempting to make a FreeBSD equivalent, but it panics. It 
is **not** from missing Linux commits since I believe I added them all.


The code in my GH repo: 
https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346



--HPS


-Neel (nc@)




Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455

2021-06-16 Thread Hans Petter Selasky

On 6/16/21 5:36 AM, Neel Chauhan wrote:

Hi current@,

First off, sorry if I spammed developers@ and other mailing lists with 
my previous message, and to bz@/hselasky@/manu@ sent so many duplicate 
emails.


Right now, I am attempting to update the drm-kmod driver to the Linux 
5.7 code, and am having an issue with the pmap lock. I am new-ish to the 
kernel, meaning not a whole lot of "experience", but do have patches in 
src.


But like it not we need kernel newbies, they're the next generation of 
experts. If we don't, we'd be the next Minix with **zero** development 
since Tanenbaum retired.


Going back, the code in question is here: 
https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 



The lines important are 346-356, but lines of interest are also the 
non-"#ifdef __linux__" sections of vm_fault_cpu().


The code gives this error: panic: Assertion 
vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455


I have attached the core dump log.

To those who aren't graphics driver experts, it happens when I load Xorg 
when Xorg attempts to map the I/O to userspace. But I feel this is more 
of me not using page locks correctly (which is needed for the pmap), or 
maybe a linuxkpi issue, rather than a graphics-specific issue.


I spent days on this (all my non-$DAYJOB hours at one point + all my 
weekends) and haven't figured out the locks completely. Does anyone have 
suggestions to what I'm doing wrong in my code and locks?


If it is important, OpenBSD's version of this code is here: 
https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459 
(lines 459-523, but some calls are unsurprisingly different).


Hope you all can help.


Hi,

Do you have the full backtrace?

Doesn't this code work in the current DRM - kmod? What changed? Did you 
perhaps miss a patch?


--HPS