Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
Hi Hans, Sorry if I have to bother you again. On 2021-06-20 01:36, Hans Petter Selasky wrote: sg_dma_address() is zero, because the memory hasn't been loaded. Makes sense! You need to handle two cases there: When r->iobase is -1 and when it is not. I suspect you should add r->iobase to the sg_dma_address() only and only when it is non -1. I have addressed it in my code. Hans, while your approach is correct (that's also done in Linux), for some reason it still crashes on vm_page_insert(). I must be doing "pa = " wrong. Stacktrace: https://misc.neelc.org/drm-kmod/stacktrace2.txt I have a separate function for getting the physical address: https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/intel_freebsd.c#L219 This crash happens when "iobase == -1", so we execute Lines 228-231: - } else { struct sgt_iter sgt = __sgt_iter(sgl, 0); pa = (sgt.pfn + (sgt.curr >> PAGE_SHIFT)) << PAGE_SHIFT; } sgt.curr is zero, since (I believe) it's the first iteration. I have copied the Linux approach and ported what I felt was necessary, I am worried I am doing this wrong. Also, there is a superfluous "pa = " in the beginning of the function. (Hopefully) I have cleaned it up. --HPS -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On 6/20/21 7:32 AM, Neel Chauhan wrote: On 2021-06-18 20:03, Neel Chauhan wrote: Apparently, the vm_start values is for some reason coming as 0 when it is passed into vm_fault_cpu(). That's why it's giving these errors: of course the address at 0 is mapped, it is (probably) used by the kernel. An update: The vm_start 0 seems to be expected. I checked the values with printf()s. I have posted this on Twitter, and am considering hiring a kernel consultant to help if I am unable to do this on my own. So I am guessing this line (Line 231) is incorrect: pa = sg_dma_address(sgl); Source: https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L231 Hi Neel, sg_dma_address() is zero, because the memory hasn't been loaded. You need to handle two cases there: When r->iobase is -1 and when it is not. I suspect you should add r->iobase to the sg_dma_address() only and only when it is non -1. Also, there is a superfluous "pa = " in the beginning of the function. --HPS
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On 2021-06-18 20:03, Neel Chauhan wrote: Apparently, the vm_start values is for some reason coming as 0 when it is passed into vm_fault_cpu(). That's why it's giving these errors: of course the address at 0 is mapped, it is (probably) used by the kernel. An update: The vm_start 0 seems to be expected. I checked the values with printf()s. I have posted this on Twitter, and am considering hiring a kernel consultant to help if I am unable to do this on my own. So I am guessing this line (Line 231) is incorrect: pa = sg_dma_address(sgl); Source: https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L231 -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On 2021-06-18 12:19, Neel Chauhan wrote: I'm not sure. I'll just note that the Linux code appears to be trying to map a set of pages belonging to a scatter-gather list. Taking the physical address of the first page and assuming that all subsequent pages are physically contiguous doesn't seem correct, but this is what is happening in that loop, since each iteration simply increments pa by PAGE_SIZE. Good news! While I don't have a fix, I have figured out what the issue is. Apparently, the vm_start values is for some reason coming as 0 when it is passed into vm_fault_cpu(). That's why it's giving these errors: of course the address at 0 is mapped, it is (probably) used by the kernel. I will look more into it tomorrow, but my brother has come over to Seattle from Connecticut so I may not be able to dedicate as much time as I would like to. -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
Hi Mark, On 2021-06-18 06:57, Mark Johnston wrote: That seems surprising, since the vm_page_grab() call should return the page at pidx if one exists. I believe that's not the case. I did pringfs Any hints on where the physical address is? Should we have an FreeBSD-specific "pa" argument for the physical address if it's needed? I'm not sure. I'll just note that the Linux code appears to be trying to map a set of pages belonging to a scatter-gather list. Taking the physical address of the first page and assuming that all subsequent pages are physically contiguous doesn't seem correct, but this is what is happening in that loop, since each iteration simply increments pa by PAGE_SIZE. Based on this email and our private one, and prior debugging it seems this panic comes on the first iteration. Something must be vm_page_grab() returns NULL, and then we run: if (!vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL)) goto retry; if (vm_page_insert(m, vm_obj, pidx)) { vm_page_xunbusy(m); VM_OBJECT_WUNLOCK(vm_obj); vm_wait(NULL); VM_OBJECT_WLOCK(vm_obj); goto retry; } Source: https://github.com/neelchauhan/drm-kmod/blob/d0eee96973ee0772e977b813678f92c5becf0507/drivers/gpu/drm/i915/intel_freebsd.c#L245 The first if() doesn't panic, but we panic at the second one, it doesn't go into or jumo over the statement. I could use for() or for_each_sg_page() and they both panic. I am almost feeling I'd have to hire a FreeBSD kernel consultant and/or sell my TigerLake laptop for an AMD Ryzen-based laptop. -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On Wed, Jun 16, 2021 at 10:16:56PM -0700, Neel Chauhan wrote: > Hi Mark, > > Thank you so much for your response! > > On 2021-06-16 14:05, Mark Johnston wrote: > > > > The function in question appears to implement a device page fault > > handler. In FreeBSD, such handlers are responsible only for ensuring > > that the requested page(s) are present in the VM object backing the > > mapping that was faulted on. The generic fault handler in > > sys/vm/vm_fault.c is responsible for actually updating the faulting > > process' page tables by calling pmap_enter(). In other words, our > > fault > > handler interface is quite different from OpenBSD's and their example > > should not be followed exactly. Adding a vm_object_busy() call in the > > loop will silence the assertion I guess but the handler is still wrong. > > Good to hear. No wonder why I still had the "blank screen of death" with > the (now previous) code. > > > If you look further down at vm_fault_gtt() (and in earlier versions of > > the DRM drivers, i915_gem_fault()), the remap_io_mapping() > > implementation in the LinuxKPI does basically what I'm describing. > > Something similar is required for vm_fault_cpu(), though I don't quite > > understand when vm_fault_cpu() is supposed to be used. > > I have some code to implement remap_io_sg() based on remap_io_mapping(), > but it's not complete. > > I am still trying to figure out how to get the physical address. Right > now, I have: > > https://github.com/neelchauhan/drm-kmod/commit/92a3d9bb585acb55c1dab9c5ed11190f7db73ecf > > My get_pfn() function (Lines 221-231) attempts to get the physical > address, and it is called at Line 248. > > Note: This code is probably incorrect, since it gives me an "page > already inserted" panic. That seems surprising, since the vm_page_grab() call should return the page at pidx if one exists. > Any hints on where the physical address is? Should we have an > FreeBSD-specific "pa" argument for the physical address if it's needed? I'm not sure. I'll just note that the Linux code appears to be trying to map a set of pages belonging to a scatter-gather list. Taking the physical address of the first page and assuming that all subsequent pages are physically contiguous doesn't seem correct, but this is what is happening in that loop, since each iteration simply increments pa by PAGE_SIZE.
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
Hi Mark, Thank you so much for your response! On 2021-06-16 14:05, Mark Johnston wrote: The function in question appears to implement a device page fault handler. In FreeBSD, such handlers are responsible only for ensuring that the requested page(s) are present in the VM object backing the mapping that was faulted on. The generic fault handler in sys/vm/vm_fault.c is responsible for actually updating the faulting process' page tables by calling pmap_enter(). In other words, our fault handler interface is quite different from OpenBSD's and their example should not be followed exactly. Adding a vm_object_busy() call in the loop will silence the assertion I guess but the handler is still wrong. Good to hear. No wonder why I still had the "blank screen of death" with the (now previous) code. If you look further down at vm_fault_gtt() (and in earlier versions of the DRM drivers, i915_gem_fault()), the remap_io_mapping() implementation in the LinuxKPI does basically what I'm describing. Something similar is required for vm_fault_cpu(), though I don't quite understand when vm_fault_cpu() is supposed to be used. I have some code to implement remap_io_sg() based on remap_io_mapping(), but it's not complete. I am still trying to figure out how to get the physical address. Right now, I have: https://github.com/neelchauhan/drm-kmod/commit/92a3d9bb585acb55c1dab9c5ed11190f7db73ecf My get_pfn() function (Lines 221-231) attempts to get the physical address, and it is called at Line 248. Note: This code is probably incorrect, since it gives me an "page already inserted" panic. Any hints on where the physical address is? Should we have an FreeBSD-specific "pa" argument for the physical address if it's needed? Sorry for the kernel newbie questions. -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On Tue, Jun 15, 2021 at 08:36:07PM -0700, Neel Chauhan wrote: > Hi current@, > > First off, sorry if I spammed developers@ and other mailing lists with > my previous message, and to bz@/hselasky@/manu@ sent so many duplicate > emails. > > Right now, I am attempting to update the drm-kmod driver to the Linux > 5.7 code, and am having an issue with the pmap lock. I am new-ish to the > kernel, meaning not a whole lot of "experience", but do have patches in > src. > > But like it not we need kernel newbies, they're the next generation of > experts. If we don't, we'd be the next Minix with **zero** development > since Tanenbaum retired. > > Going back, the code in question is here: > https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 > > The lines important are 346-356, but lines of interest are also the > non-"#ifdef __linux__" sections of vm_fault_cpu(). > > The code gives this error: panic: Assertion > vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455 > > I have attached the core dump log. > > To those who aren't graphics driver experts, it happens when I load Xorg > when Xorg attempts to map the I/O to userspace. But I feel this is more > of me not using page locks correctly (which is needed for the pmap), or > maybe a linuxkpi issue, rather than a graphics-specific issue. > > I spent days on this (all my non-$DAYJOB hours at one point + all my > weekends) and haven't figured out the locks completely. Does anyone have > suggestions to what I'm doing wrong in my code and locks? > > If it is important, OpenBSD's version of this code is here: > https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459 > > (lines 459-523, but some calls are unsurprisingly different). > The function in question appears to implement a device page fault handler. In FreeBSD, such handlers are responsible only for ensuring that the requested page(s) are present in the VM object backing the mapping that was faulted on. The generic fault handler in sys/vm/vm_fault.c is responsible for actually updating the faulting process' page tables by calling pmap_enter(). In other words, our fault handler interface is quite different from OpenBSD's and their example should not be followed exactly. Adding a vm_object_busy() call in the loop will silence the assertion I guess but the handler is still wrong. If you look further down at vm_fault_gtt() (and in earlier versions of the DRM drivers, i915_gem_fault()), the remap_io_mapping() implementation in the LinuxKPI does basically what I'm describing. Something similar is required for vm_fault_cpu(), though I don't quite understand when vm_fault_cpu() is supposed to be used.
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
Hi Neel, On 6/16/21 5:28 PM, Neel Chauhan wrote: Hi, On 2021-06-16 00:35, Hans Petter Selasky wrote: Do you have the full backtrace? Yes. I have attached a stack trace in the previous email, but if you didn't get it, I have uploaded it to GitHub: https://gist.github.com/neelchauhan/437bd10239f84c563aafb37ab440029a Doesn't this code work in the current DRM - kmod? What changed? Did you perhaps miss a patch? I think there is new code with Linux 5.6 which changes how this is done. I have been attempting to make a FreeBSD equivalent, but it panics. It is **not** from missing Linux commits since I believe I added them all. The code in my GH repo: https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 I think the following changes are needed. CC'ing Jeff. for_each_sg_page(pages->sgl, _iter, pages->nents, 0) { pmap_t pmap = vm_map_pmap(map); struct vm_page *pa = sg_page_iter_page(_iter); VM_OBJECT_RLOCK(pa->object); Try adding this: vm_object_busy(pa->object); if (pmap_enter(pmap, va, pa, 0, flags, 0)) { Try adding this: vm_object_unbusy(pa->object); VM_OBJECT_RUNLOCK(pa->object); err = -ENOMEM; break; } Try adding this: vm_object_unbusy(pa->object); VM_OBJECT_RUNLOCK(pa->object); va += PAGE_SIZE; --HPS
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
Hi, On 2021-06-16 00:35, Hans Petter Selasky wrote: Do you have the full backtrace? Yes. I have attached a stack trace in the previous email, but if you didn't get it, I have uploaded it to GitHub: https://gist.github.com/neelchauhan/437bd10239f84c563aafb37ab440029a Doesn't this code work in the current DRM - kmod? What changed? Did you perhaps miss a patch? I think there is new code with Linux 5.6 which changes how this is done. I have been attempting to make a FreeBSD equivalent, but it panics. It is **not** from missing Linux commits since I believe I added them all. The code in my GH repo: https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 --HPS -Neel (nc@)
Re: Kernel/driver hacking: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455
On 6/16/21 5:36 AM, Neel Chauhan wrote: Hi current@, First off, sorry if I spammed developers@ and other mailing lists with my previous message, and to bz@/hselasky@/manu@ sent so many duplicate emails. Right now, I am attempting to update the drm-kmod driver to the Linux 5.7 code, and am having an issue with the pmap lock. I am new-ish to the kernel, meaning not a whole lot of "experience", but do have patches in src. But like it not we need kernel newbies, they're the next generation of experts. If we don't, we'd be the next Minix with **zero** development since Tanenbaum retired. Going back, the code in question is here: https://github.com/neelchauhan/drm-kmod/blob/5.7-wip/drivers/gpu/drm/i915/gem/i915_gem_mman.c#L346 The lines important are 346-356, but lines of interest are also the non-"#ifdef __linux__" sections of vm_fault_cpu(). The code gives this error: panic: Assertion vm_object_busied((m->object)) failed at /usr/src/sys/vm/vm_page.c:5455 I have attached the core dump log. To those who aren't graphics driver experts, it happens when I load Xorg when Xorg attempts to map the I/O to userspace. But I feel this is more of me not using page locks correctly (which is needed for the pmap), or maybe a linuxkpi issue, rather than a graphics-specific issue. I spent days on this (all my non-$DAYJOB hours at one point + all my weekends) and haven't figured out the locks completely. Does anyone have suggestions to what I'm doing wrong in my code and locks? If it is important, OpenBSD's version of this code is here: https://github.com/openbsd/src/blob/2207c4325726fdc5c4bcd0011af0fdf7d3dab137/sys/dev/pci/drm/i915/gem/i915_gem_mman.c#L459 (lines 459-523, but some calls are unsurprisingly different). Hope you all can help. Hi, Do you have the full backtrace? Doesn't this code work in the current DRM - kmod? What changed? Did you perhaps miss a patch? --HPS