Re: [PATCH v5 05/15] mm/frame-vector: Use FOLL_LONGTERM
On Fri, 2020-11-06 at 08:55 -0400, Jason Gunthorpe wrote:
> On Fri, Nov 06, 2020 at 11:27:59AM +0100, Daniel Vetter wrote:
> > On Fri, Nov 6, 2020 at 11:01 AM Daniel Vetter wrote:
> > > On Fri, Nov 6, 2020 at 5:08 AM John Hubbard wrote:
> > > > On 11/5/20 4:49 AM, Jason Gunthorpe wrote:
> > > > > On Thu, Nov 05, 2020 at 10:25:24AM +0100, Daniel Vetter wrote:
> > > > > > > /*
> > > > > > >  * If we can't determine whether or not a pte is special,
> > > > > > >  * then fail immediately for ptes. Note, we can still pin
> > > > > > >  * HugeTLB and THP as these are guaranteed not to be
> > > > > > >  * special.
> > > > > > >  *
> > > > > > >  * For a futex to be placed on a THP tail page,
> > > > > > >  * get_futex_key requires a get_user_pages_fast_only
> > > > > > >  * implementation that can pin pages. Thus it's still
> > > > > > >  * useful to have gup_huge_pmd even if we can't operate
> > > > > > >  * on ptes.
> > > > > > >  */
> > > > > >
> > > > > > We support hugepage faults in gpu drivers since recently, and
> > > > > > I'm not seeing a pud_mkhugespecial anywhere. So not sure this
> > > > > > works, but probably just me missing something again.
> > > > >
> > > > > It means ioremap can't create an IO page PUD, it has to be
> > > > > broken up.
> > > > >
> > > > > Does ioremap even create anything larger than PTEs?
> > >
> > > gpu drivers also tend to use vmf_insert_pfn* directly, so we can do
> > > on-demand paging and move buffers around. From what I glanced, for
> > > the lowest level we do the pte_mkspecial correctly (I think I
> > > convinced myself that vm_insert_pfn does that), but for pud/pmd
> > > levels it seems just yolo.
> >
> > So I dug around a bit more, and ttm sets PFN_DEV | PFN_MAP to get
> > past the various pfn_t_devmap checks (see e.g.
> > vmf_insert_pfn_pmd_prot()).
> > x86-64 has ARCH_HAS_PTE_DEVMAP, and gup.c seems to handle these
> > specially, but frankly I got totally lost in what this does.
>
> The fact vmf_insert_pfn_pmd_prot() has all those BUG_ON's to prevent
> putting VM_PFNMAP pages into the page tables seems like a big red
> flag.
>
> The comment seems to confirm what we are talking about here:
>
> 	/*
> 	 * If we had pmd_special, we could avoid all these restrictions,
> 	 * but we need to be consistent with PTEs and architectures that
> 	 * can't support a 'special' bit.
> 	 */
>
> ie without the ability to mark special we can't block fast gup, and
> anyone who does O_DIRECT on these ranges will crash the kernel when it
> tries to convert an IO page into a struct page.
>
> Should be easy enough to directly test?
>
> Putting non-struct page PTEs into a VMA without setting VM_PFNMAP just
> seems horribly wrong to me.

Although core mm special huge-page support is currently quite limited,
some time ago I extended the pre-existing vma_is_dax() to
vma_is_special_huge():

/**
 * vma_is_special_huge - Are transhuge page-table entries considered special?
 * @vma: Pointer to the struct vm_area_struct to consider
 *
 * Whether transhuge page-table entries are considered "special" following
 * the definition in vm_normal_page().
 *
 * Return: true if transhuge page-table entries should be considered special,
 * false otherwise.
 */
static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
{
	return vma_is_dax(vma) || (vma->vm_file &&
				   (vma->vm_flags &
				    (VM_PFNMAP | VM_MIXEDMAP)));
}

meaning that currently all transhuge page-table entries in a PFNMAP or
MIXEDMAP vma are considered "special". The number of calls to this
function (mainly in the page-splitting code) is quite limited, so
replacing it with a more elaborate per-page-table-entry scheme would, I
guess, definitely be possible, although all functions using it would
need a fallback path for architectures not supporting it.

/Thomas

> Jason
Re: [PATCH 00/17] Convert TTM to the new fence interface.
On 2014-07-09 14:29, Maarten Lankhorst wrote:
> This series applies on top of the driver-core-next branch of
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
>
> Before converting ttm to the new fence interface I had to fix some
> drivers to require a reservation before poking with fence_obj.
> After flipping the switch RCU becomes available instead, and the
> extra reservations can be dropped again. :-)
>
> I've done at least basic testing on all the drivers I've converted
> at some point, but more testing is definitely welcomed!

I'm currently on vacation for the next couple of weeks, so I can't
test or review, but otherwise

Acked-by: Thomas Hellstrom

> ---
>
> Maarten Lankhorst (17):
>       drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
>       drm/ttm: kill off some members to ttm_validate_buffer
>       drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
>       drm/nouveau: require reservations for nouveau_fence_sync and
>         nouveau_bo_fence
>       drm/ttm: call ttm_bo_wait while inside a reservation
>       drm/ttm: kill fence_lock
>       drm/nouveau: rework to new fence interface
>       drm/radeon: add timeout argument to radeon_fence_wait_seq
>       drm/radeon: use common fence implementation for fences
>       drm/qxl: rework to new fence interface
>       drm/vmwgfx: get rid of different types of fence_flags entirely
>       drm/vmwgfx: rework to new fence interface
>       drm/ttm: flip the switch, and convert to dma_fence
>       drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
>       drm/radeon: use rcu waits in some ioctls
>       drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
>       drm/ttm: use rcu in core ttm
>
>  drivers/gpu/drm/nouveau/core/core/event.c |    4
>  drivers/gpu/drm/nouveau/nouveau_bo.c      |   59 +---
>  drivers/gpu/drm/nouveau/nouveau_display.c |   25 +-
>  drivers/gpu/drm/nouveau/nouveau_fence.c   |  431 +++--
>  drivers/gpu/drm/nouveau/nouveau_fence.h   |   22 +
>  drivers/gpu/drm/nouveau/nouveau_gem.c     |   55 +---
>  drivers/gpu/drm/nouveau/nv04_fence.c      |    4
>  drivers/gpu/drm/nouveau/nv10_fence.c      |    4
>  drivers/gpu/drm/nouveau/nv17_fence.c      |    2
>  drivers/gpu/drm/nouveau/nv50_fence.c      |    2
>  drivers/gpu/drm/nouveau/nv84_fence.c      |   11 -
>  drivers/gpu/drm/qxl/Makefile              |    2
>  drivers/gpu/drm/qxl/qxl_cmd.c             |    7
>  drivers/gpu/drm/qxl/qxl_debugfs.c         |   16 +
>  drivers/gpu/drm/qxl/qxl_drv.h             |   20 -
>  drivers/gpu/drm/qxl/qxl_fence.c           |   91 --
>  drivers/gpu/drm/qxl/qxl_kms.c             |    1
>  drivers/gpu/drm/qxl/qxl_object.c          |    2
>  drivers/gpu/drm/qxl/qxl_object.h          |    6
>  drivers/gpu/drm/qxl/qxl_release.c         |  172 ++--
>  drivers/gpu/drm/qxl/qxl_ttm.c             |   93 --
>  drivers/gpu/drm/radeon/radeon.h           |   15 -
>  drivers/gpu/drm/radeon/radeon_cs.c        |   10 +
>  drivers/gpu/drm/radeon/radeon_device.c    |   60
>  drivers/gpu/drm/radeon/radeon_display.c   |   21 +
>  drivers/gpu/drm/radeon/radeon_fence.c     |  283 +++
>  drivers/gpu/drm/radeon/radeon_gem.c       |   19 +
>  drivers/gpu/drm/radeon/radeon_object.c    |    8 -
>  drivers/gpu/drm/radeon/radeon_ttm.c       |   34 --
>  drivers/gpu/drm/radeon/radeon_uvd.c       |   10 -
>  drivers/gpu/drm/radeon/radeon_vm.c        |   16 +
>  drivers/gpu/drm/ttm/ttm_bo.c              |  187 ++---
>  drivers/gpu/drm/ttm/ttm_bo_util.c         |   28 --
>  drivers/gpu/drm/ttm/ttm_bo_vm.c           |    3
>  drivers/gpu/drm/ttm/ttm_execbuf_util.c    |  146 +++---
>  drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c    |   47 ---
>  drivers/gpu/drm/vmwgfx/vmwgfx_drv.h       |    1
>  drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |   24 --
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.c     |  329 --
>  drivers/gpu/drm/vmwgfx/vmwgfx_fence.h     |   35 +-
>  drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |   43 +--
>  include/drm/ttm/ttm_bo_api.h              |    7
>  include/drm/ttm/ttm_bo_driver.h           |   29 --
>  include/drm/ttm/ttm_execbuf_util.h        |   22 +
>  44 files changed, 1256 insertions(+), 1150 deletions(-)
>  delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 4/5] reservation: cross-device reservation support
On 9/28/12 2:43 PM, Maarten Lankhorst wrote:
> This adds support for a generic reservations framework that can be
> hooked up to ttm and dma-buf and allows easy sharing of reservations
> across devices.
>
> The idea is that a dma-buf and ttm object both will get a pointer to a
> struct reservation_object, which has to be reserved before anything is
> done with the buffer.

"Anything is done with the buffer" should probably be rephrased, as
different members of the buffer struct may be protected by different
locks. It may not be practical or even possible to protect all buffer
members with reservation.

> Some followup patches are needed in ttm so the lru_lock is no longer
> taken during the reservation step. This makes the lockdep annotation
> patch a lot more useful, and the assumption that the lru lock protects
> atomic removal off the lru list will fail soon, anyway.

As previously discussed, I'm unfortunately not prepared to accept
removal of the reserve-lru atomicity from the TTM code at this point.
The current code is based on this assumption, and removing it will end
up with inefficiencies, breaking the delayed delete code and probably a
locking nightmare when trying to write new TTM code.
> Signed-off-by: Maarten Lankhorst
> ---
>  Documentation/DocBook/device-drivers.tmpl |    2
>  drivers/base/Makefile                     |    2
>  drivers/base/reservation.c                |  285 +++++++++++++++++++++
>  include/linux/reservation.h               |  179 ++++++++++++++
>  4 files changed, 467 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/base/reservation.c
>  create mode 100644 include/linux/reservation.h
>
> diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
> index ad14396..24e6e80 100644
> --- a/Documentation/DocBook/device-drivers.tmpl
> +++ b/Documentation/DocBook/device-drivers.tmpl
> @@ -129,6 +129,8 @@ X!Edrivers/base/interface.c
>  !Edrivers/base/fence.c
>  !Iinclude/linux/fence.h
>  !Iinclude/linux/seqno-fence.h
> +!Edrivers/base/reservation.c
> +!Iinclude/linux/reservation.h
>  !Edrivers/base/dma-coherent.c
>  !Edrivers/base/dma-mapping.c
>      </sect1>
>
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 0026563..f6f731d 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o
>  obj-y += power/
>  obj-$(CONFIG_HAS_DMA) += dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o
> +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o
>  obj-$(CONFIG_ISA) += isa.o
>  obj-$(CONFIG_FW_LOADER) += firmware_class.o
>  obj-$(CONFIG_NUMA) += node.o
>
> diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c
> new file mode 100644
> index 000..93e2d9f
> --- /dev/null
> +++ b/drivers/base/reservation.c
> @@ -0,0 +1,285 @@
> +/*
> + * Copyright (C) 2012 Canonical Ltd
> + *
> + * Based on bo.c which bears the following copyright notice,
> + * but is dual licensed:
> + *
> + * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA
> + * All Rights Reserved.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> + * "Software"), to deal in the Software without restriction, including
> + * without limitation the rights to use, copy, modify, merge, publish,
> + * distribute, sub license, and/or sell copies of the Software, and to
> + * permit persons to whom the Software is furnished to do so, subject to
> + * the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> + * next paragraph) shall be included in all copies or substantial portions
> + * of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
> + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> + * USE OR OTHER DEALINGS IN THE SOFTWARE.
> + *
> + **/
> +/*
> + * Authors: Thomas Hellstrom <thellstrom-at-vmware-dot-com>
> + */
> +
> +#include <linux/fence.h>
> +#include <linux/reservation.h>
> +#include <linux/export.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +
> +atomic64_t reservation_counter = ATOMIC64_INIT(1);
> +EXPORT_SYMBOL(reservation_counter);
> +
> +int
> +object_reserve(struct reservation_object *obj, bool intr, bool no_wait,
> +	       reservation_ticket_t *ticket)
> +{
> +	int ret;
> +	u64 sequence = ticket ? ticket->seqno : 1;
> +	u64 oldseq;
> +
> +	while (unlikely(oldseq = atomic64_cmpxchg(&obj->reserved, 0, sequence))) {
> +
> +		/**
> +		 * Deadlock avoidance for multi-obj reserving.
> +		 */
> +		if (sequence > 1 && oldseq > 1) {
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Jerome Glisse wrote:
On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Jerome Glisse wrote:
> On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:
>> Keith Packard wrote:
>> > On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>> >
>> >> It might be possible to find schemes that work around this. One way
>> >> could possibly be to have a buffer mapping -and validate order for
>> >> shared buffers.
>> >
>> > If mapping never blocks on anything other than the fence, then there
>> > isn't any dead lock possibility. What this says is that ordering of
>> > rendering between clients is *not DRMs problem*. I think that's a
>> > good solution though; I want to let multiple apps work on DRM-able
>> > memory with their own CPU without contention.
>> >
>> > I don't recall if Eric layed out the proposed rules, but:
>> >
>> > 1) Map never blocks on map. Clients interested in dealing with this
>> >    are on their own.
>> >
>> > 2) Submit blocks on map. You must unmap all buffers before
>> >    submitting them. Doing the relocations in the kernel makes this
>> >    all possible.
>> >
>> > 3) Map blocks on the fence from submit. We can play with pending the
>> >    flush until the app asks for the buffer back, or we can play with
>> >    figuring out when flushes are useful automatically. Doesn't
>> >    matter if the policy is in the kernel.
>> >
>> > I'm interested in making deadlock avoidence trivial and eliminating
>> > any map-map contention.
>>
>> It's rare to have two clients access the same buffer at the same time.
>> In what situation will this occur?
>>
>> If we think of map / unmap and validation / fence as taking a buffer
>> mutex either for the CPU or for the GPU, that's the way implementation
>> is done today. The CPU side of the mutex should IIRC be per-client
>> recursive. OTOH, the TTM implementation won't stop the CPU from
>> accessing the buffer when it is unmapped, but then you're on your own.
>>
>> "Mutexes" need to be taken in the correct order, otherwise a deadlock
>> will occur, and GL will, as outlined in Eric's illustration, more or
>> less encourage us to take buffers in the "incorrect" order.
>>
>> In essence what you propose is to eliminate the deadlock problem by
>> just avoiding taking the buffer mutex unless we know the GPU has it.
>> I see two problems with this:
>>
>>   * It will encourage different DRI clients to simultaneously access
>>     the same buffer.
>>   * Inter-client and GPU data coherence can be guaranteed if we issue
>>     a mb() / write-combining flush with the unmap operation (which,
>>     BTW, I'm not sure is done today). Otherwise it is up to the
>>     clients, and very easy to forget.
>>
>> I'm a bit afraid we might in the future regret taking the easy way
>> out?
>>
>> OTOH, letting DRM resolve the deadlock by unmapping and remapping
>> shared buffers in the correct order might not be the best one either.
>> It will certainly mean some CPU overhead and what if we have to do the
>> same with buffer validation? (Yes for some operations with thousands
>> and thousands of relocations, the user space validation might need to
>> stay).
>>
>> Personally, I'm slightly biased towards having DRM resolve the
>> deadlock, but I think any solution will do as long as the implications
>> and why we choose a certain solution are totally clear.
>>
>> For item 3) above the kernel must have a way to issue a flush when
>> needed for buffer eviction. The current implementation also requires
>> the buffer to be completely flushed before mapping. Other than that
>> the flushing policy is currently completely up to the DRM drivers.
>>
>> /Thomas
>
> I might say stupid things as i don't think i fully understand all
> the input to this problem. Anyway here is my thought on all this:
>
> 1) First client map never block (as in Keith layout) except on
>    fence from drm side (point 3 in Keith layout)
>
> But is there really a need for this except to avoid the
> above-mentioned deadlock? As I'm not too up to date with all the
> possibilities the servers and GL clients may be using shared buffers,
> I need some enlightenment :). Could we have an example, please?

I think the current main consumer would be compiz or any other
compositor which use TextureFromPixmap, i really think the we might
see furthe
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Jerome Glisse wrote:
On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Keith Packard wrote:
> On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>
>> It might be possible to find schemes that work around this. One way
>> could possibly be to have a buffer mapping -and validate order for
>> shared buffers.
>
> If mapping never blocks on anything other than the fence, then there
> isn't any dead lock possibility. What this says is that ordering of
> rendering between clients is *not DRMs problem*. I think that's a good
> solution though; I want to let multiple apps work on DRM-able memory
> with their own CPU without contention.
>
> I don't recall if Eric layed out the proposed rules, but:
>
> 1) Map never blocks on map. Clients interested in dealing with this
>    are on their own.
>
> 2) Submit blocks on map. You must unmap all buffers before submitting
>    them. Doing the relocations in the kernel makes this all possible.
>
> 3) Map blocks on the fence from submit. We can play with pending the
>    flush until the app asks for the buffer back, or we can play with
>    figuring out when flushes are useful automatically. Doesn't matter
>    if the policy is in the kernel.
>
> I'm interested in making deadlock avoidence trivial and eliminating
> any map-map contention.

It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence as taking a buffer
mutex either for the CPU or for the GPU, that's the way implementation
is done today. The CPU side of the mutex should IIRC be per-client
recursive. OTOH, the TTM implementation won't stop the CPU from
accessing the buffer when it is unmapped, but then you're on your own.

"Mutexes" need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the "incorrect" order.

In essence what you propose is to eliminate the deadlock problem by
just avoiding taking the buffer mutex unless we know the GPU has it.
I see two problems with this:

  * It will encourage different DRI clients to simultaneously access
    the same buffer.
  * Inter-client and GPU data coherence can be guaranteed if we issue
    a mb() / write-combining flush with the unmap operation (which,
    BTW, I'm not sure is done today). Otherwise it is up to the
    clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping
shared buffers in the correct order might not be the best one either.
It will certainly mean some CPU overhead and what if we have to do the
same with buffer validation? (Yes for some operations with thousands
and thousands of relocations, the user space validation might need to
stay).

Personally, I'm slightly biased towards having DRM resolve the
deadlock, but I think any solution will do as long as the implications
and why we choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction. The current implementation also requires
the buffer to be completely flushed before mapping. Other than that the
flushing policy is currently completely up to the DRM drivers.

/Thomas

I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
   fence from drm side (point 3 in Keith layout)

But is there really a need for this except to avoid the above-mentioned
deadlock? As I'm not too up to date with all the possibilities the
servers and GL clients may be using shared buffers, I need some
enlightenment :). Could we have an example, please?

4) We got 2 gpu queue:
   - one with pending apps ask in which we do all stuff necessary
     to be done before submitting (locking buffer, validation, ...)
     for instance we might wait here for each buffer that are still
     mapped by some other apps in user space
   - one run queue in which we add each apps ask that are now ready
     to be submited to the gpu

This is getting closer and closer to a GPU scheduler, an interesting
topic indeed. Perhaps we should have a separate discussion on the needs
and requirements for such a thing?

Regards,
/Thomas
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Keith Packard wrote:
> On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>
>> It might be possible to find schemes that work around this. One way
>> could possibly be to have a buffer mapping -and validate order for
>> shared buffers.
>
> If mapping never blocks on anything other than the fence, then there
> isn't any dead lock possibility. What this says is that ordering of
> rendering between clients is *not DRMs problem*. I think that's a good
> solution though; I want to let multiple apps work on DRM-able memory
> with their own CPU without contention.
>
> I don't recall if Eric layed out the proposed rules, but:
>
> 1) Map never blocks on map. Clients interested in dealing with this
>    are on their own.
>
> 2) Submit blocks on map. You must unmap all buffers before submitting
>    them. Doing the relocations in the kernel makes this all possible.
>
> 3) Map blocks on the fence from submit. We can play with pending the
>    flush until the app asks for the buffer back, or we can play with
>    figuring out when flushes are useful automatically. Doesn't matter
>    if the policy is in the kernel.
>
> I'm interested in making deadlock avoidence trivial and eliminating
> any map-map contention.

It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence as taking a buffer
mutex either for the CPU or for the GPU, that's the way the
implementation is done today. The CPU side of the mutex should IIRC be
per-client recursive. OTOH, the TTM implementation won't stop the CPU
from accessing the buffer when it is unmapped, but then you're on your
own.

"Mutexes" need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the "incorrect" order.

In essence what you propose is to eliminate the deadlock problem by
just avoiding taking the buffer mutex unless we know the GPU has it.
I see two problems with this:

  * It will encourage different DRI clients to simultaneously access
    the same buffer.
  * Inter-client and GPU data coherence can be guaranteed if we issue
    a mb() / write-combining flush with the unmap operation (which,
    BTW, I'm not sure is done today). Otherwise it is up to the
    clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping
shared buffers in the correct order might not be the best one either.
It will certainly mean some CPU overhead, and what if we have to do the
same with buffer validation? (Yes, for some operations with thousands
and thousands of relocations, the user-space validation might need to
stay.)

Personally, I'm slightly biased towards having DRM resolve the
deadlock, but I think any solution will do as long as the implications
and why we choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction. The current implementation also requires
the buffer to be completely flushed before mapping. Other than that the
flushing policy is currently completely up to the DRM drivers.

/Thomas
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Jerome Glisse wrote:
On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

[quoted text trimmed]

I might say stupid things as I don't think I fully understand all the input to this problem. Anyway, here are my thoughts on all this:

1) First, client map never blocks (as in Keith's layout) except on a fence from the DRM side (point 3 in Keith's layout).

But is there really a need for this except to avoid the above-mentioned deadlock? As I'm not too up to date with all the possibilities the servers and GL clients may be using shared buffers, I need some enlightenment :). Could we have an example, please?
4) We've got two GPU queues:
- one pending queue, in which we do all the work that must be done before submitting (locking buffers, validation, ...); for instance, we might wait here for each buffer that is still mapped by some other app in user space;
- one run queue, to which we add each request that is now ready to be submitted to the GPU.

This is getting closer and closer to a GPU scheduler, an interesting topic indeed. Perhaps we should have a separate discussion on the needs and requirements for such a thing?

Regards,
/Thomas
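The two-queue idea above can be modeled as below. This is a toy sketch with invented names (struct scheduler, sched_kick); the point is only the promotion rule: a submission waits on the pending queue until none of its buffers are mapped, then moves to the run queue:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_JOBS 16

/* One queued submission; mapped_buffers counts buffers a user-space client
 * still has mapped.  Purely illustrative. */
struct job {
	int mapped_buffers;
};

struct scheduler {
	struct job *pending[MAX_JOBS];	/* waiting for buffers to be unmapped */
	struct job *run[MAX_JOBS];	/* ready to be handed to the GPU */
	size_t npending, nrun;
};

static void sched_submit(struct scheduler *s, struct job *j)
{
	s->pending[s->npending++] = j;
}

/* Promote every job whose buffers are all unmapped to the run queue,
 * using swap-remove so the pending queue stays compact. */
static void sched_kick(struct scheduler *s)
{
	size_t i = 0;

	while (i < s->npending) {
		if (s->pending[i]->mapped_buffers == 0) {
			s->run[s->nrun++] = s->pending[i];
			s->pending[i] = s->pending[--s->npending];
		} else {
			i++;
		}
	}
}
```

A real scheduler would of course block or reschedule instead of polling, and would have to worry about fairness between clients; this only shows the queue split itself.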
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Jerome Glisse wrote:
On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

[quoted text trimmed]

But is there really a need for this except to avoid the above-mentioned deadlock? As I'm not too up to date with all the possibilities the servers and GL clients may be using shared buffers, I need some enlightenment :). Could we have an example, please?
I think the current main consumer would be compiz or any other compositor which uses TextureFromPixmap, and I really think we might see further use of sharing graphical data among applications; I've got examples of such use cases here at my work, even though they don't use GL at all but another in-house protocol. Another possible case where such buffer sharing might occur is inside the same application with two or more GL contexts (I am ready to bet that we already have an example of such an application somewhere).

I was actually referring to an example where two clients need to have a buffer mapped and access it at exactly the same time. If there is such a situation, we have no other choice than to drop the buffer locking on map. If there isn't, we can at least consider other alternatives that resolve the deadlock issue but that also will help clients synchronize and keep data coherent.

/Thomas
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Eric Anholt wrote:
On Thu, 2007-04-26 at 16:55 +1000, Dave Airlie wrote:

Hi,
The patch is too big to fit on the list and I've no idea how we could break it down further, it just happens to be a lot of new code..

http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt

The patch header and diffstat are below. This isn't for integration yet, but we'd like an initial review by anyone with the spare time and inclination. There is a lot of stuff relying on getting this code beaten into shape and into the kernel, but any cleanups people can suggest now, especially to the user interfaces, would be appreciated, as once we set that stuff in stone it'll be a pain to change... Also it doesn't have any driver-side code; this is just the generic pieces. I'll post the Intel 915 side code later, but there isn't that much to it.. It applies on top of my drm-2.6 git tree drm-mm branch.

This patch brings in the TTM (Translation Table Maps) memory management system from Thomas Hellstrom at Tungsten Graphics. This patch only covers the core functionality and changes to the drm core. The TTM memory manager enables dynamic mapping of memory objects in and out of the graphics-card-accessible memory (e.g. AGP); this implements the AGP backend for TTM to be used by the i915 driver.

I've been slow responding, but we've been talking a lot on IRC and at Intel about the TTM interface recently, and trying to come up with a consensus between us as to what we'd like to see.

1) Multiplexed ioctls hurt

The first issue I have with this version is the userland interface. You've got two ioctls for buffer management and one for fence management, yet these 3 ioctls are actually just attempting to be generic interfaces for around 25 actual functions you want to call (except for the unimplemented ones, drm_bo_fence and drm_bo_ref_fence).
So there are quasi-generic arguments to these ioctls, where most of the members are ignored by any given function, but it's not obvious to the caller which ones. There are no comments or anything as to what the arguments to these functions are or what exactly they do. We've got 100 generic ioctl numbers allocated and unused still, so I don't think we should be shy about having separate ioctls for separate functions, if this is the interface we expect to use going forward.

Right. This interface was in its infancy when there were only (without looking too deeply) three generic IOCTLs left. This is definitely a good point and I agree completely.

2) Early micro-optimizations

There's also apparently an unused way to chain these function calls in a single ioctl call for the buffer object ioctl. This is one of a couple of micro-optimizations at the expense of code clarity which have bothered me while reviewing the TTM code, when I'm guessing no testing was done to see if it was actually a bottleneck.

Yes. The function chaining is currently only used to validate buffer lists. I still believe it is needed for that functionality, a bit depending on what we want to be able to change when a buffer is validated. But I can currently not see any other use for it in the future.

3) Fencing and flushing troubles

I'm definitely concerned by the fencing interface. Right now, the driver is flushing caches and fencing every buffer with EXE and its driver-specific FLUSHED flag in dispatching command buffers. We almost surely don't want to be flushing for every batch buffer just in case someone wants to do CPU reads from something. However, with the current mechanism, if I fence my operation with just EXE and no flush, then when somebody goes to map that buffer, they'll wait for the fence, but no flush will be emitted.
The interface we've been imagining wouldn't have driver-specific fence flags, but would instead be only a marker of when command execution has passed a certain point (which is what fencing is about, anyway). In validating buffers, you would pass whether they're for READ or WRITE as we currently do, and you'd put them on the unfenced read/write lists as appropriate. Add one buffer-object function for emitting the flush, which would then determine whether the next fence-all-unfenced call would cover just the list of unfenced reads or the list of both unfenced reads and unfenced writes. Then, in mapping, check if the buffer is on the unfenced-writes list and emit the flush and fence, and then wait for a fence on the buffer before continuing with the mapping.

Right. This functionality is actually available in the current code, except that we have only one unfenced list and the fence flags indicate what type of flushes are needed. There's even an implementation of the Intel sync-flush mechanism in the fencing code. If the batch-buffer flush is omitted, the driver-specific flush flag is not needed and the fence mechanism will do a sync flush whenever
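The flush-on-demand scheme described in this exchange can be modeled as a small state machine. All the names here (struct tbo, tbo_validate, tbo_map) are invented for illustration and the real TTM code differs; the point is only that the flush is deferred from batch dispatch to the first map of a written buffer:

```c
#include <assert.h>
#include <stdbool.h>

/* A buffer validated for WRITE sits on a notional unfenced-writes list;
 * one validated for READ sits on the unfenced-reads list. */
enum bo_state { BO_IDLE, BO_UNFENCED_READ, BO_UNFENCED_WRITE, BO_FENCED };

struct tbo {
	enum bo_state state;
	bool flushed;	/* has a cache flush been emitted for this buffer? */
};

static void tbo_validate(struct tbo *b, bool write)
{
	b->state = write ? BO_UNFENCED_WRITE : BO_UNFENCED_READ;
	b->flushed = false;
}

/* Map waits for the fence; only if the buffer was written and never flushed
 * do we emit the flush now, instead of flushing after every batch buffer
 * just in case someone later wants CPU reads. */
static void tbo_map(struct tbo *b)
{
	if (b->state == BO_UNFENCED_WRITE && !b->flushed)
		b->flushed = true;	/* model: emit flush + flush-fence here */
	b->state = BO_FENCED;		/* model: wait for the fence to signal */
}
```

Read-only buffers thus never pay for a flush, which is the saving Eric is after.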
Re: [RFC] [PATCH] DRM TTM Memory Manager patch
Dave Airlie wrote:

Most likely in doxygen, as that is what Mesa uses and the intersection of developers is higher in that area; I'll take it as a task to try and kerneldoc the drm at some stage..

- what's with the /proc interface? Don't add new proc code for non-process related things. This should all go into sysfs somewhere. And yes, I know /proc/dri/ is there today, but don't add new stuff please.

Well, we should move all that stuff to sysfs, but we have all the infrastructure for publishing this stuff under /proc/dri, and adding new files doesn't take a major amount of work. As much as I appreciate sysfs, it isn't suitable for this sort of information dump; the whole one-value-per-file approach is quite useless for providing this sort of information, which is uni-directional, for users to send to us for debugging without having to install some special tool to join all the values into one place.. and I don't think drmfs is the answer either... or maybe it is.

- struct drm_bo_arg can't use an int or unsigned, as it crosses the userspace/kernelspace boundary; use the proper types for all values in those types of structures (__u32, etc.)

int is defined; unsigned I'm not so sure about. The drm user-space interface is usually specified in non-system-specific types so the drm.h file is consistent across systems, so we would probably have to use uint32_t, which other people have objected to, but I'd rather use uint32_t than unspecified types..

- there doesn't seem to be any validity checking for the arguments passed into this new ioctl. Possibly that's just the way the rest of the dri interface is, which is scary, but with the memory stuff, you really should check things properly...
Okay, this needs fixing. We do check most ioctl args; the main things passed in are handles, and these are all looked up in the hash table, though it may not be so obvious. Also, most of the ioctls are probably going to end up root or DRM-master only. I'd like to do an ioctl fuzzer at some stage; I'd suspect a lot more than the dri would be oopsable with permissions...

Thanks,
Dave.

I agree with Dave on most if not all of the above. Typing, for example unsigned / uint32 vs u32 / __u32, is very easily fixable once we decide on a clear way to go to keep (if we want to keep) compatibility with *BSD. For the IOCTL checking, as Dave states, most invalid data will be trapped in hash lookups and checks in the buffer validation system, but probably far from all. A fuzzer would be a nice tool to trap the exceptions.

/Thomas
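The handle-lookup validation described above can be sketched as follows. This is a toy model with invented names (handle_register, toy_ioctl) using a flat array instead of the real DRM hash table; the point is only that a user-supplied handle is never trusted until a lookup vouches for it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Toy handle table; the real DRM code hashes handles into kernel objects. */
static void *handle_table[TABLE_SIZE];

static int handle_register(unsigned int handle, void *obj)
{
	if (handle >= TABLE_SIZE || handle_table[handle])
		return -EINVAL;
	handle_table[handle] = obj;
	return 0;
}

static void *handle_lookup(unsigned int handle)
{
	return handle < TABLE_SIZE ? handle_table[handle] : NULL;
}

/* The ioctl path: a garbage handle fails cleanly with -EINVAL instead of
 * being dereferenced, which is why most invalid args are trapped here. */
static int toy_ioctl(unsigned int handle)
{
	return handle_lookup(handle) ? 0 : -EINVAL;
}
```

As the thread notes, lookups catch most garbage but not everything; fields that are not handles still need explicit range checks, which is what a fuzzer would flush out.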
Re: 2.6.20-rc6-mm3 BUG in drm_ioctl with Rage 128 card
Dave Jones wrote:
On Mon, Feb 05, 2007 at 01:56:22PM +0100, Thomas Hellström wrote:
> Dave Jones wrote:
> > On Sun, Feb 04, 2007 at 02:26:59PM -0500, Eric Buddington wrote:
> > > On Sun, Feb 04, 2007 at 11:00:05AM +0100, Thomas Hellström wrote:
> > > > Eric,
> > > > Sorry for the breakage. Can you try the attached patch and see if it
> > > > fixes the problem.
> > > >
> > > > /Thomas
> > >
> > > Thomas,
> > >
> > > Thanks! That seems to fix it:
> >
> > Great. Thomas, can you resend me that patch with a description & Signed-off-by?
> > (Also for some reason that one looks like it was MIME damaged, git-am wouldn't eat it)
> >
> > Dave
>
> Dave,
> I'll post an updated complete patch. The previous one was really only
> for Eric to try.

Can you send just an incremental against what's in agpgart.git right now?

Thanks,
Dave

Ok. Coming right away,

/Thomas
Re: 2.6.20-rc6-mm3 BUG in drm_ioctl with Rage 128 card
Dave Jones wrote:
On Sun, Feb 04, 2007 at 02:26:59PM -0500, Eric Buddington wrote:
> On Sun, Feb 04, 2007 at 11:00:05AM +0100, Thomas Hellström wrote:
> > Eric,
> > Sorry for the breakage. Can you try the attached patch and see if it
> > fixes the problem.
> >
> > /Thomas
>
> Thomas,
>
> Thanks! That seems to fix it:

Great. Thomas, can you resend me that patch with a description & Signed-off-by?
(Also for some reason that one looks like it was MIME damaged, git-am wouldn't eat it)

Dave

Dave,
I'll post an updated complete patch. The previous one was really only for Eric to try.

/Thomas
Re: 2.6.20-rc6-mm3 BUG in drm_ioctl with Rage 128 card
Eric Buddington wrote:
 > On Sun, Feb 04, 2007 at 10:20:29AM +1100, Dave Airlie wrote:
 > > What AGP chipset do you have? it looks like it might be caused by the
 > > AGP changes for TTM..
 >
 > lspci:
 > 00:00.0 Host bridge: Silicon Integrated Systems [SiS] 741/741GX/M741 Host (rev 03)
 > 00:01.0 PCI bridge: Silicon Integrated Systems [SiS] SiS AGP Port (virtual PCI-to-PCI bridge)
 > 00:02.0 ISA bridge: Silicon Integrated Systems [SiS] SiS963 [MuTIOL Media IO] (rev 25)
 > 00:02.1 SMBus: Silicon Integrated Systems [SiS] SiS961/2 SMBus Controller
 > 00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE]
 > ...
 >
 > -Eric

Eric,
Sorry for the breakage. Can you try the attached patch and see if it
fixes the problem.

/Thomas

From 531c7dc4e9d26a9a2c7a64d52cf73951c7c03ee8 Mon Sep 17 00:00:00 2001
From: Thomas Hellstrom
Date: Sun, 4 Feb 2007 10:53:17 +0100
Subject: [PATCH] Add missing .agp_type_to_mask_type entries.

---
 drivers/char/agp/parisc-agp.c | 1 +
 drivers/char/agp/sis-agp.c    | 1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/char/agp/parisc-agp.c b/drivers/char/agp/parisc-agp.c
index 17c50b0..b7b4590 100644
--- a/drivers/char/agp/parisc-agp.c
+++ b/drivers/char/agp/parisc-agp.c
@@ -228,6 +228,7 @@ struct agp_bridge_driver parisc_agp_driv
 	.free_by_type		= agp_generic_free_by_type,
 	.agp_alloc_page		= agp_generic_alloc_page,
 	.agp_destroy_page	= agp_generic_destroy_page,
+	.agp_type_to_mask_type	= agp_generic_type_to_mask_type,
 	.cant_use_aperture	= 1,
 };

diff --git a/drivers/char/agp/sis-agp.c b/drivers/char/agp/sis-agp.c
index a00fd48..60342b7 100644
--- a/drivers/char/agp/sis-agp.c
+++ b/drivers/char/agp/sis-agp.c
@@ -140,6 +140,7 @@ static struct agp_bridge_driver sis_driv
 	.free_by_type		= agp_generic_free_by_type,
 	.agp_alloc_page		= agp_generic_alloc_page,
 	.agp_destroy_page	= agp_generic_destroy_page,
+	.agp_type_to_mask_type	= agp_generic_type_to_mask_type,
 };

 static struct agp_device_ids sis_agp_device_ids[] __devinitdata =
--
1.4.1
Re: Support for i386 PATs
Dave Jones wrote:
On Sat, Jan 27, 2007 at 10:20:18AM +0100, Thomas Hellström wrote:
 > Hi!
 >
 > Does anybody have a strong opinion against adding support for
 > i386 Page Attribute Tables?

It pops up about once a year, everyone agrees it'd be a good idea, and
then nothing happens. Last year, I started picking over Terrence
Ripperda's original implementation, but that needs a bit of work.
(http://lkml.org/lkml/2005/5/12/187)

Whoa, two years even. Time flies.

	Dave

Ah. This might explain why nothing happens :).

I had something much simpler in mind. Just the basic PAT setup and some
new pgprot_ types which affected drivers could use directly. Like in
(drivers/char/drm/drm_vm.c) where ia64 has a pgprot_writecombine()
function.

/Thomas
Support for i386 PATs
Hi!

Does anybody have a strong opinion against adding support for i386 Page
Attribute Tables? The main benefit would be that one can have
write-combining memory regions without setting up MTRRs. This will come
in handy for a device we're working with where the device driver needs
to allocate the display memory directly from system memory, and it may
be difficult to fit the mtrr alignment constraints.

Outline:
The PAT may be set up at boot time with fixed backwards-compatible
memory types for the different PAT entries + defines like the following:

#define _PAGE_PAT_WB  xxx
#define _PAGE_PAT_WT  xxx
#define _PAGE_PAT_UC0 xxx
#define _PAGE_PAT_UC1 xxx
#define _PAGE_PAT_WC  xxx

which can be used in parallel with the old _PAGE_PWT and _PAGE_PCD bits.
The idea is that new memory types, WC for example, will use the pat
entries 7 downto 4, whereas 0-3 are left to boot setting to maintain
backwards compatibility.

Issues:
1) The _PAGE_BIT_PAT will be the same as _PAGE_PSE, and _PAGE_PROTNONE.
As I understand it, _PAGE_PROTNONE is not used when the page is present,
so this might not be an issue. What about _PAGE_PSE?
2) The PATs need to be setup for each processor just after system boot.
Where is the best place to do this?

/Thomas Hellström
Re: i810fb fails to load
Dave Airlie wrote:
 > > Don't know. But I bet someone on the Cc does...
 >
 > Tilman,
 > Thanks for reporting. Can you try the attached patch to see if that
 > fixes the problem.

Hi Thomas,
This also fixes X starting on old i810/5 hardware, I had noticed it
broken but hadn't had time to investigate it, this patch fixes it...

Dave.

Great. The oldest hardware I have is an i915 which hampers me a bit.

/Thomas
Re: i810fb fails to load
Andrew Morton wrote:
On Mon, 15 Jan 2007 00:52:36 +0100 Tilman Schmidt <[EMAIL PROTECTED]> wrote:

 > With kernel 2.6.20-rc4-mm1 and all hotfixes, i810fb fails to load on my
 > Dell Optiplex GX110. Here's an excerpt of the diff between the boot
 > logs of 2.6.20-rc5 (working) and 2.6.20-rc4-mm1 (non-working):
 >
 > @@ -4,7 +4,7 @@
 >  No module symbols loaded - kernel modules not enabled.
 >  klogd 1.4.1, log source = ksyslog started.
 > -<5>Linux version 2.6.20-rc5-noinitrd ([EMAIL PROTECTED]) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #2 PREEMPT Sun Jan 14 23:37:12 CET 2007
 > +<5>Linux version 2.6.20-rc4-mm1-noinitrd ([EMAIL PROTECTED]) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #3 PREEMPT Sun Jan 14 21:08:56 CET 2007
 >  <6>BIOS-provided physical RAM map:
 >  <4>sanitize start
 >  <4>sanitize end
 > @@ -188,7 +192,6 @@
 >  <6>ACPI: Interpreter enabled
 >  <6>ACPI: Using PIC for interrupt routing
 >  <6>ACPI: PCI Root Bridge [PCI0] (:00)
 > -<7>PCI: Probing PCI hardware (bus 00)
 >  <6>ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
 >  <7>Boot video device is :00:01.0
 >  <4>PCI quirk: region 0800-087f claimed by ICH4 ACPI/GPIO/TCO
 > @@ -238,20 +241,15 @@
 >  <6>isapnp: No Plug & Play device found
 >  <6>Real Time Clock Driver v1.12ac
 >  <6>Intel 82802 RNG detected
 > -<6>Linux agpgart interface v0.101 (c) Dave Jones
 > +<6>Linux agpgart interface v0.102 (c) Dave Jones
 >  <6>agpgart: Detected an Intel i810 E Chipset.
 >  <6>agpgart: detected 4MB dedicated video ram.
 >  <6>agpgart: AGP aperture is 64M @ 0xf800
 >  <4>ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 9
 >  <7>PCI: setting IRQ 9 as level-triggered
 >  <6>ACPI: PCI Interrupt :00:01.0[A] -> Link [LNKA] -> GSI 9 (level, low) -> IRQ 9
 > -<4>i810-i2c: Probe DDC1 Bus
 > -<4>i810fb_init_pci: DDC probe successful
 > -<4>Console: switching to colour frame buffer device 160x64
 > -<4>I810FB: fb0 : Intel(R) 810E Framebuffer Device v0.9.0
 > -<4>I810FB: Video RAM : 4096K
 > -<4>I810FB: Monitor : H: 30-83 KHz V: 55-75 Hz
 > -<4>I810FB: Mode: [EMAIL PROTECTED]
 > +<4>i810fb_alloc_fbmem: can't bind framebuffer memory
 > +<4>i810fb: probe of :00:01.0 failed with error -16
 >  <6>Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
 >  <6>serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
 >  <6>serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
 >
 > Please let me know if you need more information.

Don't know. But I bet someone on the Cc does...

Tilman,
Thanks for reporting. Can you try the attached patch to see if that
fixes the problem.

Regards,
Thomas Hellström

diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 91c1f36..6ef0960 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -190,6 +190,7 @@ struct agp_memory *agp_create_memory(int
 		return NULL;
 	}
 	new->num_scratch_pages = scratch_pages;
+	new->type = AGP_NORMAL_MEMORY;
 	return new;
 }
 EXPORT_SYMBOL(agp_create_memory);

diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index b8896c8..5a0713c 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -260,6 +260,7 @@ static int intel_i810_insert_entries(str
 		readl(intel_i810_private.registers+I810_PTE_BASE+((i-1)*4));
 		break;
 	case AGP_PHYS_MEMORY:
+	case AGP_NORMAL_MEMORY:
 		if (!mem->is_flushed)
 			global_cache_flush();
 		for (i = 0, j = pg_start; i < mem->page_count; i++, j++) {
Re: Replace nopage() / nopfn() with fault()
Nick Piggin wrote:
On Tue, Jan 09, 2007 at 04:02:08PM +0100, Thomas Hellström wrote:
 > Nick,
 > We're working to slowly get the new DRM memory manager into the
 > mainstream kernel. This means we have a need for the page fault
 > handler patch you wrote some time ago. I guess we could take the
 > no_pfn() route, but that would need a check for racing
 > unmap_mapping_range(), and other problems may arise.
 >
 > What is the current status and plans for inclusion of the fault() code?

Hi Thomas,
fault should have gone in already, but the ordering of my patchset was a
little bit unfortunate :P I will submit them to Andrew later today.

Nick

Thanks, Nick.

/Thomas.
Replace nopage() / nopfn() with fault()
Nick,

We're working to slowly get the new DRM memory manager into the
mainstream kernel. This means we have a need for the page fault handler
patch you wrote some time ago. I guess we could take the no_pfn() route,
but that would need a check for racing unmap_mapping_range(), and other
problems may arise.

What is the current status and plans for inclusion of the fault() code?

Thanks,
Thomas Hellstrom
Tungsten Graphics.
Re: agpgart: drm-populated memory types
Arjan van de Ven wrote:
 > > A short recap why I believe the kmalloc / vmalloc construct is
 > > necessary:
 > >
 > > 0) The current code uses vmalloc only.
 > > 1) The allocated area ranges from 4 bytes possibly up to 512 kB,
 > >    depending on the size of the AGP buffer allocated.
 > > 2) Large buffers are very few. Small buffers tend to be quite many.
 > >
 > > If we continue to use vmalloc only or another page-based scheme we
 > > will waste approx one page per buffer, together with the added
 > > slowness of vmalloc. This will severely hurt applications with a lot
 > > of small texture buffers.
 > >
 > > Please let me know if you still consider this unacceptable.
 >
 > explicit use of either kmalloc/vmalloc is fine with me; I would suggest
 > a 2*PAGE_SIZE cutoff for this decision

In that case I suggest sticking with vmalloc for now.

 > > Also please let me know if there are other parts of the patch that
 > > should be reworked. The patch that follows is against Dave's agpgart
 > > repo.
 >
 > you forgot the patch

Hmm. Still struggling with git-send-email. Now it should have arrived.

Thanks,
Thomas
Re: [patch 1/2] agpgart - allow user-populated memory types.
Arjan van de Ven wrote:
On Tue, 2006-12-19 at 13:47 +0100, Thomas Hellström wrote:
 > Arjan van de Ven wrote:
 > > > A short background: The current code uses vmalloc only. The
 > > > potential use of kmalloc was introduced to save memory and
 > > > cpu-speed. All agp drivers expect to see a single memory chunk, so
 > > > I'm not sure we want to have an array of pages. That may require
 > > > rewriting a lot of code.
 > >
 > > but if it's clearly the right thing. How hard can it be? there are
 > > what.. 5 or 6 AGP drivers in the kernel?
 >
 > Hmm, but we would still waste a lot of memory compared to kmalloc,

surely it's at most 4Kb for the entire system?

Nope. These structures get allocated once per display memory buffer, and
a display memory buffer may be as large as the AGP aperture size
(usually up to 512MB) or as small as one page. The latter could be a
user allocating a texture buffer for each character in a font, and they
can be quite numerous, so we would waste almost 4Kb per buffer, which is
not acceptable.

 > (if agp allows the non-root user to pin a lot more than that in kernel
 > memory there is a different problem of rlimits ;)

The drm memory manager sets aside and keeps track of a preset amount of
memory that can be pinned in the kernel for video use, which is shared
by all users running direct rendering clients. Currently this is a hard
limit, but the idea is to unlock memory and make it swappable if
resources become scarce. The memory we're discussing above is included
in the bookkeeping.

/Thomas
Re: [patch 1/2] agpgart - allow user-populated memory types.
Arjan van de Ven wrote:
 > > A short background: The current code uses vmalloc only. The potential
 > > use of kmalloc was introduced to save memory and cpu-speed. All agp
 > > drivers expect to see a single memory chunk, so I'm not sure we want
 > > to have an array of pages. That may require rewriting a lot of code.
 >
 > but if it's clearly the right thing. How hard can it be? there are
 > what.. 5 or 6 AGP drivers in the kernel?

Hmm, but we would still waste a lot of memory compared to kmalloc, when
the amount of memory needed is much less than one page, which tends to
be a very common case. Unless we allow the first entry in the array to
be the virtual address of an arbitrary-sized (max one page) kmalloc()
area, and the rest of the entries pointers to pages allocated with
__get_free_page(). This would almost introduce the same level of
confusion as the original proposal, and effectively we'd be doing
virtual address translation in software for each access.

 > > If it's acceptable I'd like to go for the vmalloc / kmalloc flag, or
 > > at worst keep the current vmalloc only but that's such a _huge_
 > > memory waste for small buffers. The flag was the original idea, but
 > > unfortunately the agp_memory struct is part of the drm interface, and
 > > I wasn't sure we could add a variable to it.
 >
 > I doubt this is part of the userspace interface so for sure we can
 > change it to be right.

/Thomas
Re: [patch 1/2] agpgart - allow user-populated memory types.
Arjan van de Ven wrote:
On Sat, 2006-12-09 at 00:05 +0100, Thomas Hellström wrote:
 > > On Fri, 2006-12-08 at 19:24 +0100, Thomas Hellström wrote:
 > > >
 > > > +	}
 > > > +
 > > > +	if (alloc_size <= PAGE_SIZE) {
 > > > +		new->memory = kmalloc(alloc_size, GFP_KERNEL);
 > > > +	}
 > > > +	if (new->memory == NULL) {
 > > > +		new->memory = vmalloc(alloc_size);
 > >
 > > this bit is more or less evil as well...
 > >
 > > 1) vmalloc is expensive all the way, higher tlb use etc etc
 > > 2) mixing allocation types is just a recipe for disaster
 > > 3) if this isn't a frequent operation, kmalloc is fine upto at least
 > >    2 pages; I doubt you'll ever want more
 >
 > I understand your feelings about this, and as you probably understand,
 > the kfree / vfree thingy is a result of the above allocation scheme.

the kfree/vfree thing at MINIMUM should be changed though. Even if you
need both kfree and vfree, you should key it off of a flag that you
store, not off the address of the memory, that's just unportable and
highly fragile. You *know* which allocator you used, so store it and use
THAT info.

 > The allocated memory holds an array of struct page pointers. The
 > number of struct page pointers will range from 1 to about 8192, so the
 > alloc size will range from 4 bytes to 64K, but could go higher
 > depending on architecture.

hmm 64Kb is a bit much indeed. You can't do an array of upto 16 entries
with one page in each array entry?

Arjan,
Thanks for taking time to review this.

A short background: The current code uses vmalloc only. The potential
use of kmalloc was introduced to save memory and cpu-speed. All agp
drivers expect to see a single memory chunk, so I'm not sure we want to
have an array of pages. That may require rewriting a lot of code. If
it's acceptable I'd like to go for the vmalloc / kmalloc flag, or at
worst keep the current vmalloc only but that's such a _huge_ memory
waste for small buffers. The flag was the original idea, but
unfortunately the agp_memory struct is part of the drm interface, and I
wasn't sure we could add a variable to it.

DaveJ, is it possible to extend struct agp_memory with a flags field?

Regards,
Thomas
Re: [patch 1/2] agpgart - allow user-populated memory types.
Arjan van de Ven wrote: On Tue, 2006-12-19 at 13:47 +0100, Thomas Hellström wrote: Arjan van de Ven wrote: A short background: The current code uses vmalloc only. The potential use of kmalloc was introduced to save memory and cpu-speed. All agp drivers expect to see a single memory chunk, so I'm not sure we want to have an array of pages. That may require rewriting a lot of code. but if it's clearly the right thing. How hard can it be? there are what.. 5 or 6 AGP drivers in the kernel? Hmm, but we would still waste a lot of memory compared to kmalloc, surely it's at most 4Kb for the entire system? Nope. These structures get allocated once per display memory buffer, and a display memory buffer may be as large as the AGP aperture size, (usually up to 512MB) or as small as one page. The latter could be a user allocating a texture buffer for each character in a font, and they can be quite numerous, so we would waste almost 4Kb per buffer, which is not acceptable. (if agp allows the non-root user to pin a lot more than that in kernel memory there is a different problem of rlimits ;) The drm memory manager sets aside and keeps track of a preset amount of memory that can be pinned in the kernel for video use, which is shared by all users running direct rendering clients. Currently this is a hard limit, but the idea is to unlock memory and make it swappable if resources become scarce. The memory we're discussing above is included in the bookkeeping. /Thomas
Re: [patch 1/2] agpgart - allow user-populated memory types.
> On Fri, 2006-12-08 at 19:24 +0100, Thomas Hellström wrote: >> >> + } >> + >> + if (alloc_size <= PAGE_SIZE) { >> + new->memory = kmalloc(alloc_size, GFP_KERNEL); >> + } >> + if (new->memory == NULL) { >> + new->memory = vmalloc(alloc_size); > > this bit is more or less evil as well... > > 1) vmalloc is expensive all the way, higher tlb use etc etc > 2) mixing allocation types is just a recipe for disaster > 3) if this isn't a frequent operation, kmalloc is fine up to at least 2 > pages; I doubt you'll ever want more I understand your feelings about this, and as you probably understand, the kfree / vfree thingy is a result of the above allocation scheme. The allocated memory holds an array of struct page pointers. The number of struct page pointers will range from 1 to about 8192, so the alloc size will range from 4 bytes to 64K, but could go higher depending on architecture. I figured that kmalloc could fail, and, according to "Linux Device Drivers" (IIRC), vmalloc is faster when the allocation size exceeds 1 page. Using only vmalloc wastes far too much memory if the number of small buffers is high. So we can't use only vmalloc, and we can't fail, so we can't use only kmalloc. I'm open to suggestions on how to improve this. Regards, Thomas > > > > -- > if you want to mail me at work (you don't), use arjan (at) linux.intel.com > Test the interaction between Linux and your BIOS via > http://www.linuxfirmwarekit.org > >
[patch 2/2] agpgart - Remove unnecessary flushes.
This patch is to speed up flipping of pages in and out of the AGP aperture as needed by the new drm memory manager. A number of global cache flushes are removed as well as some PCI posting flushes. The following guidelines have been used: 1) Memory that is only mapped uncached and that has been subject to a global cache flush after the mapping was changed to uncached does not need any more cache flushes. Neither before binding to the aperture nor after unbinding. 2) Only do one PCI posting flush after a sequence of writes modifying page entries in the GATT. Patch against davej's agpgart.git /Thomas Hellström diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c index 7a6107f..75557de 100644 --- a/drivers/char/agp/generic.c +++ b/drivers/char/agp/generic.c @@ -1076,8 +1076,8 @@ int agp_generic_insert_memory(struct agp for (i = 0, j = pg_start; i < mem->page_count; i++, j++) { writel(bridge->driver->mask_memory(bridge, mem->memory[i], mask_type), bridge->gatt_table+j); - readl(bridge->gatt_table+j); /* PCI Posting. */ } + readl(bridge->gatt_table+j-1); /* PCI Posting. */ bridge->driver->tlb_flush(mem); return 0; @@ -,10 +,9 @@ int agp_generic_remove_memory(struct agp /* AK: bogus, should encode addresses > 4GB */ for (i = pg_start; i < (mem->page_count + pg_start); i++) { writel(bridge->scratch_page, bridge->gatt_table+i); - readl(bridge->gatt_table+i); /* PCI Posting. */ } + readl(bridge->gatt_table+i-1); /* PCI Posting. 
*/ - global_cache_flush(); bridge->driver->tlb_flush(mem); return 0; } diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c index 677581a..aaf9b30 100644 --- a/drivers/char/agp/intel-agp.c +++ b/drivers/char/agp/intel-agp.c @@ -256,9 +256,8 @@ static int intel_i810_insert_entries(str for (i = pg_start; i < (pg_start + mem->page_count); i++) { writel((i*4096)|I810_PTE_LOCAL|I810_PTE_VALID, intel_i810_private.registers+I810_PTE_BASE+(i*4)); - readl(intel_i810_private.registers+I810_PTE_BASE+(i*4)); - } + readl(intel_i810_private.registers+I810_PTE_BASE+((i-1)*4)); break; case AGP_PHYS_MEMORY: if (!mem->is_flushed) @@ -268,13 +267,13 @@ static int intel_i810_insert_entries(str mem->memory[i], mask_type), intel_i810_private.registers+I810_PTE_BASE+(j*4)); - readl(intel_i810_private.registers+I810_PTE_BASE+(j*4)); } + readl(intel_i810_private.registers+I810_PTE_BASE+((j-1)*4)); break; default: goto out_err; } - global_cache_flush(); + agp_bridge->driver->tlb_flush(mem); out: ret = 0; @@ -293,10 +292,9 @@ static int intel_i810_remove_entries(str for (i = pg_start; i < (mem->page_count + pg_start); i++) { writel(agp_bridge->scratch_page, intel_i810_private.registers+I810_PTE_BASE+(i*4)); - readl(intel_i810_private.registers+I810_PTE_BASE+(i*4)) ; } - - global_cache_flush(); + readl(intel_i810_private.registers+I810_PTE_BASE+((i-1)*4)); + agp_bridge->driver->tlb_flush(mem); return 0; } @@ -646,7 +644,7 @@ static int intel_i830_insert_entries(str if (mask_type != 0 && mask_type != AGP_PHYS_MEMORY && mask_type != INTEL_AGP_CACHED_MEMORY) goto out_err; - + if (!mem->is_flushed) global_cache_flush(); @@ -654,9 +652,8 @@ static int intel_i830_insert_entries(str writel(agp_bridge->driver->mask_memory(agp_bridge, mem->memory[i], mask_type), intel_i830_private.registers+I810_PTE_BASE+(j*4)); - readl(intel_i830_private.registers+I810_PTE_BASE+(j*4)); } - global_cache_flush(); + readl(intel_i830_private.registers+I810_PTE_BASE+((j-1)*4)); 
agp_bridge->driver->tlb_flush(mem); out: @@ -671,8 +668,6 @@ static int intel_i830_remove_entries(str { int i; - global_cache_flush(); - if (pg_start < intel_i830_private.gtt_entries) { printk (KERN_INFO PFX "Trying to disable local/stolen memory\n"); return -EINVAL; @@ -680,10 +675,9 @@ static int intel_i830_remove_entries(str for (i = pg_start; i < (mem->page_count + pg_start); i++) { writel(agp_bridge->scratch_page, intel_i830_private.registers+I810_PTE_BASE+(i*4)); - readl(intel_i830_private.registers+I810_PTE_BASE+(i*4)); } - - global_cache_flush(); + readl(intel_i830_private.registers+I810_PTE_BASE+((i-1)*4)); + agp_bridge->driver->tlb_flush(mem); return 0; } @@ -777,10 +771,9 @@ static int intel_i915_insert_entries(str for (i = 0, j = pg_start; i < mem->page_count; i++, j++) { writel(agp_bridge->driver->mask_memory(agp_bridge, mem->memory[i], mask_type), intel_i830_private.gtt+j); - readl(intel_i830_private.gtt+j); } - global_cache_flush(); + readl(intel_i830_private.gtt+j-1); agp_bridge->driver->tlb_flush(mem); out: @@ -795,19 +788,16 @@ static int intel_i915_remove_entries(str { int i; - global_cache_flush(); - if (pg_start < intel_i830_private.gtt_entries) { printk (KERN_INFO PFX "Trying to disable local/stolen memory\n"); return -EINVAL; } - + for (i = pg_start; i < (mem->page_count
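The batching rule in guideline 2) above can be modeled outside the kernel like this. The mapped GATT and the MMIO accessors writel()/readl() are replaced by hypothetical stand-ins; the point is simply that a single posting read after the write loop flushes the whole batch, instead of one read per entry as in the pre-patch code:

```c
#include <assert.h>
#include <stdint.h>

#define GATT_ENTRIES 16

/* Stand-ins for the mapped GATT and for MMIO; real code uses
 * writel()/readl() on the ioremapped table. */
static uint32_t gatt[GATT_ENTRIES];
static int posting_reads;   /* counts simulated PCI posting flushes */

static void fake_writel(uint32_t val, uint32_t *addr) { *addr = val; }
static uint32_t fake_readl(uint32_t *addr) { posting_reads++; return *addr; }

/* Patched pattern: write all entries, then issue one read to flush
 * PCI write posting, then (in the real driver) tlb_flush().
 * Assumes count > 0, as in the driver paths. */
static void insert_entries(const uint32_t *entries, int pg_start, int count)
{
	int i, j;

	for (i = 0, j = pg_start; i < count; i++, j++)
		fake_writel(entries[i], gatt + j);
	fake_readl(gatt + j - 1);   /* one posting flush for the whole batch */
}
```

Reading back the last entry written is enough because PCI write posting preserves ordering: once that read completes, every preceding write to the table has reached the device.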
[patch 1/2] agpgart - allow user-populated memory types.
This patch allows drm to populate an agpgart structure with pages of its own. It's needed for the new drm memory manager which dynamically flips pages in and out of AGP. The patch modifies the generic functions as well as the intel agp driver. The intel drm driver is currently the only one supporting the new memory manager. Other agp drivers may need some minor fixing up once they have a corresponding memory manager enabled drm driver. AGP memory types >= AGP_USER_TYPES are not populated by the agpgart driver, but the drm is expected to do that, as well as taking care of cache- and tlb flushing when needed. It's not possible to request these types from user space using agpgart ioctls. The intel driver also gets a new memory type for pages that can be bound cached to the intel GTT. Patch against davej's agpgart.git /Thomas Hellström diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h index 8b3317f..8abbbda 100644 --- a/drivers/char/agp/agp.h +++ b/drivers/char/agp/agp.h @@ -114,6 +114,7 @@ struct agp_bridge_driver { void (*free_by_type)(struct agp_memory *); void *(*agp_alloc_page)(struct agp_bridge_data *); void (*agp_destroy_page)(void *); +int (*agp_type_to_mask_type) (struct agp_bridge_data *, int); }; struct agp_bridge_data { @@ -218,6 +219,7 @@ #define I810_PTE_BASE 0x1 #define I810_PTE_MAIN_UNCACHED 0x #define I810_PTE_LOCAL 0x0002 #define I810_PTE_VALID 0x0001 +#define I830_PTE_SYSTEM_CACHED 0x0006 #define I810_SMRAM_MISCC 0x70 #define I810_GFX_MEM_WIN_SIZE 0x0001 #define I810_GFX_MEM_WIN_32M 0x0001 @@ -266,8 +268,14 @@ void global_cache_flush(void); void get_agp_version(struct agp_bridge_data *bridge); unsigned long agp_generic_mask_memory(struct agp_bridge_data *bridge, unsigned long addr, int type); +int agp_generic_type_to_mask_type(struct agp_bridge_data *bridge, + int type); struct agp_bridge_data *agp_generic_find_bridge(struct pci_dev *pdev); +/* generic functions for user-populated AGP memory types */ +struct agp_memory 
*agp_generic_alloc_user(size_t page_count, int type); +void agp_generic_free_user(struct agp_memory *curr); + /* generic routines for agp>=3 */ int agp3_generic_fetch_size(void); void agp3_generic_tlbflush(struct agp_memory *mem); diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c index 5a31ec7..98177a9 100644 --- a/drivers/char/agp/ali-agp.c +++ b/drivers/char/agp/ali-agp.c @@ -214,6 +214,7 @@ static struct agp_bridge_driver ali_gene .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, .agp_destroy_page = ali_destroy_page, + .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; static struct agp_bridge_driver ali_m1541_bridge = { @@ -237,6 +238,7 @@ static struct agp_bridge_driver ali_m154 .free_by_type = agp_generic_free_by_type, .agp_alloc_page = m1541_alloc_page, .agp_destroy_page = m1541_destroy_page, + .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c index b4e00a3..b0acf41 100644 --- a/drivers/char/agp/alpha-agp.c +++ b/drivers/char/agp/alpha-agp.c @@ -91,6 +91,9 @@ static int alpha_core_agp_insert_memory( int num_entries, status; void *temp; + if (type >= AGP_USER_TYPES || mem->type >= AGP_USER_TYPES) + return -EINVAL; + temp = agp_bridge->current_size; num_entries = A_SIZE_FIX(temp)->num_entries; if ((pg_start + mem->page_count) > num_entries) @@ -142,6 +145,7 @@ struct agp_bridge_driver alpha_core_agp_ .free_by_type = agp_generic_free_by_type, .agp_alloc_page = agp_generic_alloc_page, .agp_destroy_page = agp_generic_destroy_page, + .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; struct agp_bridge_data *alpha_bridge; diff --git a/drivers/char/agp/amd-k7-agp.c b/drivers/char/agp/amd-k7-agp.c index 51d0d56..9c129bd 100644 --- a/drivers/char/agp/amd-k7-agp.c +++ b/drivers/char/agp/amd-k7-agp.c @@ -376,6 +376,7 @@ static struct agp_bridge_driver amd_iron .free_by_type = agp_generic_free_by_type, .agp_alloc_page = 
agp_generic_alloc_page, .agp_destroy_page = agp_generic_destroy_page, + .agp_type_to_mask_type = agp_generic_type_to_mask_type, }; static struct agp_device_ids amd_agp_device_ids[] __devinitdata = diff --git a/drivers/char/agp/amd64-agp.c b/drivers/char/agp/amd64-agp.c index 00b17ae..ac24340 100644 --- a/drivers/char/agp/amd64-agp.c +++ b/drivers/char/agp/amd64-agp.c @@ -62,12 +62,18 @@ static int amd64_insert_memory(struct ag { int i, j, num_entries; long long tmp; + int mask_type; + struct agp_bridge_data *bridge = mem->bridge; u32 pte; num_entries = agp_num_entries(); - if (type != 0 || mem->type != 0) + if (type != mem->type) return -EINVAL; + mask_type = bridge->driver->agp_type_to_mask_type(bridge, type); + if (mask_type != 0) + return -EINVAL; + /* Make sure we can fit the ran
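The new agp_type_to_mask_type hook separates the externally visible memory type from the mask type a driver actually programs into the GATT. A minimal userspace sketch of a generic translation is below; the constants and the exact mapping are illustrative assumptions, not the real values from agp.h/generic.c:

```c
#include <assert.h>

/* Illustrative constants; the real definitions live in agp.h. */
#define AGP_NORMAL_MEMORY  0
#define AGP_NUM_TYPES      2      /* built-in types the generic code knows */
#define AGP_USER_TYPES     64     /* types >= this are drm-populated */

/* Sketch of a generic type -> mask-type translation: built-in types
 * map to themselves; user-populated types fall back to the default
 * mask type, since the drm side takes care of cache/tlb flushing;
 * anything else is rejected. */
static int type_to_mask_type(int type)
{
	if (type >= AGP_USER_TYPES)
		return AGP_NORMAL_MEMORY;
	if (type >= 0 && type < AGP_NUM_TYPES)
		return type;
	return -1;                    /* unknown type: caller returns -EINVAL */
}
```

A per-driver hook like this is what lets the intel driver add I830_PTE_SYSTEM_CACHED as an extra mask type without the generic code having to know about it.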