Re: 2.6.35-rc4-git3: Reported regressions from 2.6.34
* Linus Torvalds torva...@linux-foundation.org wrote: Bug-Entry ? ? ? : http://bugzilla.kernel.org/show_bug.cgi?id=16346 Subject ? ? ? ? : 2.6.35-rc3-git8 - include/linux/fdtable.h:88 invoked rcu_dereference_check() without protection! Submitter ? ? ? : Miles Lane miles.l...@gmail.com Date ? ? ? ? ? ?: 2010-07-04 22:04 (5 days old) Message-ID ? ? ?: aanlktinof0k28rk4rmr66aubxcrl2rfa5zearj1lq...@mail.gmail.com References ? ? ?: http://marc.info/?l=linux-kernelm=127828107815930w=2 I'm not entirely sure if these RCU proving things should count as regressions. Generally not - and we've delayed at least one more complex (cgroups) fix to v2.6.36 because the patch itself was riskier than the warning. Still most of the warning fixes turned out to be simple, so we merged the very-low-risk ones and right now we seem to be on top of them. But in general the default rule is that we delay these fixes to v2.6.36. Sure, the option to enable RCU proving is new, but the things it reports about generally are not new - and they are usually not even bugs in the sense that they necessarily cause any real problems. That particular one is in the single-thread optimizated case for fget_light, ie if (likely((atomic_read(files-count) == 1))) { file = fcheck_files(files, fd); where I think it should be entirely safe in all ways without any locking. So I think it's a false positive too. Yeah, it's a bit like with lockdep (and it's a bit like with compiler warning fixes): we had to punch through a large stack of false positives that accumulated in the past 10 years. ( Because real bugs eventually get fixed, while false positives always just accumulate. So almost by definition we always start with a very assymetric collection of warnings and a large stack of false positives. ) Having said that, it appears we got most of the false positives and are beginning to be in a more representative equilibrium now. If v2.6.35 isnt going to be warning-free then v2.6.36 certainly will be and new warnings will have a much higher likelyhood of being real (and new) bugs (not just accumulated false-positives). Ingo -- This SF.net email is sponsored by Sprint What will you do first with EVO, the first 4G phone? Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
* Jens Axboe jax...@fusionio.com wrote: On 2010-06-11 10:32, Ingo Molnar wrote: * Jens Axboe jax...@fusionio.com wrote: On 2010-06-09 03:53, Linus Torvalds wrote: Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=16129 Subject : BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2 Submitter: Jan Kreuzer kontrolla...@gmx.de Date : 2010-06-05 06:15 (4 days old) This seems to have been introduced by commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5 Author: Ingo Molnar mi...@elte.hu Date: Sat Nov 8 17:05:38 2008 +0100 sched: optimize sched_clock() a bit sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling variant of __cycles_2_ns(). Most of the time sched_clock() is called with irqs disabled already. The few places that call it with irqs enabled need to be updated. Signed-off-by: Ingo Molnar mi...@elte.hu and this seems to be one of those calling cases that need to be updated.. Ingo? The call trace is: BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2-8/337 caller is native_sched_clock+0x3c/0x68 Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4 Call Trace: [812362c5] debug_smp_processor_id+0xc9/0xe4 [8101059d] native_sched_clock+0x3c/0x68 [8101043d] sched_clock+0x9/0xd [81212d7a] blk_rq_init+0x97/0xa3 [81214d71] get_request+0x1c4/0x2d0 [81214ea6] get_request_wait+0x29/0x1a6 [81215537] __make_request+0x338/0x45b [812147c2] generic_make_request+0x2bb/0x330 [81214909] submit_bio+0xd2/0xef [811413cb] submit_bh+0xf4/0x116 [81144853] block_write_full_page_endio+0x89/0x96 [81144875] block_write_full_page+0x15/0x17 [8119b00a] ext4_writepage+0x356/0x36b [810e1f91] __writepage+0x1a/0x39 [810e32a6] write_cache_pages+0x20d/0x346 [810e3406] generic_writepages+0x27/0x29 [811ca279] journal_submit_data_buffers+0x110/0x17d [811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d [811d0cba] kjournald2+0x147/0x37a (from the bugzilla thing) This should be fixed by commit 28f4197e which was merged on friday. Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With some configs i get bad spinlock warnings during bootup: [ 28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 usecs [ 28.972003] calling b44_init+0x0/0x55 @ 1 [ 28.976009] bus: 'pci': add driver b44 [ 28.976374] sda: [ 28.978157] BUG: spinlock bad magic on CPU#1, async/0/117 [ 28.98] lock: 7e1c5bbc, .magic: , .owner: none/-1, .owner_cpu: 0 [ 28.98] Pid: 117, comm: async/0 Not tainted 2.6.35-rc2-tip-01092-g010e7ef-dirty #8183 [ 28.98] Call Trace: [ 28.98] [41ba6d55] ? printk+0x20/0x24 [ 28.98] [4134b7b7] spin_bug+0x7c/0x87 [ 28.98] [4134b853] do_raw_spin_lock+0x1e/0x123 [ 28.98] [41ba92ca] ? _raw_spin_lock_irqsave+0x12/0x20 [ 28.98] [41ba92d2] _raw_spin_lock_irqsave+0x1a/0x20 [ 28.98] [4133476f] blkiocg_update_io_add_stats+0x25/0xfb [ 28.98] [41335dae] ? cfq_prio_tree_add+0xb1/0xc1 [ 28.98] [41337bc7] cfq_insert_request+0x8c/0x425 [ 28.98] [41ba9271] ? _raw_spin_unlock_irqrestore+0x17/0x23 [ 28.98] [41ba9271] ? _raw_spin_unlock_irqrestore+0x17/0x23 [ 28.98] [41329225] elv_insert+0x107/0x1a0 [ 28.98] [41329354] __elv_add_request+0x96/0x9d [ 28.98] [4132bb8c] ? drive_stat_acct+0x9d/0xc6 [ 28.98] [4132dd64] __make_request+0x335/0x376 [ 28.98] [4132c726] generic_make_request+0x336/0x39d [ 28.98] [410ad422] ? kmem_cache_alloc+0xa1/0x105 [ 28.98] [41089285] ? mempool_alloc_slab+0xe/0x10 [ 28.98] [41089285] ? mempool_alloc_slab+0xe/0x10 [ 28.98] [41089285] ? mempool_alloc_slab+0xe/0x10 [ 28.98] [41089347] ? mempool_alloc+0x57/0xe2 [ 28.98] [4132c804] submit_bio+0x77/0x8f [ 28.98] [410d2cbc] ? bio_alloc_bioset+0x37/0x94 [ 28.98] [410ceb90] submit_bh+0xc3/0xe2 [ 28.98] [410d1474] block_read_full_page+0x249/0x259 [ 28.98] [410d31fb] ? blkdev_get_block+0x0/0xc6 [ 28.98] [41087bfa] ? add_to_page_cache_locked+0x94/0xb5 [ 28.98] [410d3d92] blkdev_readpage+0xf/0x11 [ 28.98] [41088823] do_read_cache_page+0x7d/0x11a [ 28.98] [410d3d83] ? blkdev_readpage+0x0/0x11 [ 28.98] [410888f4] read_cache_page_async+0x16/0x1b [ 28.98] [41088904] read_cache_page+0xb/0x12 [ 28.98] [410e80e1] read_dev_sector+0x2a/0x63 [ 28.98] [410e92e8] adfspart_check_ICS+0x2e/0x166 [ 28.98] [41ba6d55] ? printk+0x20/0x24 [ 28.98
Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
* Jens Axboe jax...@fusionio.com wrote: On 2010-06-09 03:53, Linus Torvalds wrote: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129 Subject: BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2 Submitter : Jan Kreuzer kontrolla...@gmx.de Date : 2010-06-05 06:15 (4 days old) This seems to have been introduced by commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5 Author: Ingo Molnar mi...@elte.hu Date: Sat Nov 8 17:05:38 2008 +0100 sched: optimize sched_clock() a bit sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling variant of __cycles_2_ns(). Most of the time sched_clock() is called with irqs disabled already. The few places that call it with irqs enabled need to be updated. Signed-off-by: Ingo Molnar mi...@elte.hu and this seems to be one of those calling cases that need to be updated. Ingo? The call trace is: BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2-8/337 caller is native_sched_clock+0x3c/0x68 Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4 Call Trace: [812362c5] debug_smp_processor_id+0xc9/0xe4 [8101059d] native_sched_clock+0x3c/0x68 [8101043d] sched_clock+0x9/0xd [81212d7a] blk_rq_init+0x97/0xa3 [81214d71] get_request+0x1c4/0x2d0 [81214ea6] get_request_wait+0x29/0x1a6 [81215537] __make_request+0x338/0x45b [812147c2] generic_make_request+0x2bb/0x330 [81214909] submit_bio+0xd2/0xef [811413cb] submit_bh+0xf4/0x116 [81144853] block_write_full_page_endio+0x89/0x96 [81144875] block_write_full_page+0x15/0x17 [8119b00a] ext4_writepage+0x356/0x36b [810e1f91] __writepage+0x1a/0x39 [810e32a6] write_cache_pages+0x20d/0x346 [810e3406] generic_writepages+0x27/0x29 [811ca279] journal_submit_data_buffers+0x110/0x17d [811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d [811d0cba] kjournald2+0x147/0x37a (from the bugzilla thing) This should be fixed by commit 28f4197e which was merged on friday. The scheduler commit adding local_clock() (for .36) is: c676329: sched_clock: Add local_clock() API and improve documentation So once that is upstream the block IO statistics code can use that. Thanks, Ingo -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34
* Linus Torvalds torva...@linux-foundation.org wrote: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=16129 Subject : BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2 Submitter : Jan Kreuzer kontrolla...@gmx.de Date: 2010-06-05 06:15 (4 days old) This seems to have been introduced by commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5 Author: Ingo Molnar mi...@elte.hu Date: Sat Nov 8 17:05:38 2008 +0100 sched: optimize sched_clock() a bit sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling variant of __cycles_2_ns(). Most of the time sched_clock() is called with irqs disabled already. The few places that call it with irqs enabled need to be updated. Signed-off-by: Ingo Molnar mi...@elte.hu and this seems to be one of those calling cases that need to be updated. That's a commit from 2008. Ingo? The call trace is: BUG: using smp_processor_id() in preemptible [] code: jbd2/sda2-8/337 caller is native_sched_clock+0x3c/0x68 Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4 Call Trace: [812362c5] debug_smp_processor_id+0xc9/0xe4 [8101059d] native_sched_clock+0x3c/0x68 [8101043d] sched_clock+0x9/0xd [81212d7a] blk_rq_init+0x97/0xa3 [81214d71] get_request+0x1c4/0x2d0 [81214ea6] get_request_wait+0x29/0x1a6 [81215537] __make_request+0x338/0x45b [812147c2] generic_make_request+0x2bb/0x330 [81214909] submit_bio+0xd2/0xef [811413cb] submit_bh+0xf4/0x116 [81144853] block_write_full_page_endio+0x89/0x96 [81144875] block_write_full_page+0x15/0x17 [8119b00a] ext4_writepage+0x356/0x36b [810e1f91] __writepage+0x1a/0x39 [810e32a6] write_cache_pages+0x20d/0x346 [810e3406] generic_writepages+0x27/0x29 [811ca279] journal_submit_data_buffers+0x110/0x17d [811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d [811d0cba] kjournald2+0x147/0x37a (from the bugzilla thing) The warning was introduced by this fresh commit (and a followup commit) merged in the .35 merge window: | commit 9195291e5f05e01d67f9a09c756b8aca8f009089 | Author: Divyesh Shah dps...@google.com | Date: Thu Apr 1 15:01:41 2010 -0700 | | blkio: Increment the blkio cgroup stats for real now IIRC Jens posted a fix for the regression. Jens, what's the status of that? As the code there started using a raw sched_clock() call for block statistics purposes, which was a poorly thought out (and buggy) approach: - it takes timestamps on different cpus and then compares then, but doesnt consider that sched_clock() is not comparable between CPUs without extra care - it doesnt consider the possibility for the sched_clock() result going backwards on certain platforms (such as x86) - it doesnt consider preemptability (There's work ongoing to add a clock variant that can be used for such purposes, but that's .36 fodder.) Thanks, Ingo -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm request 3
* Mike Galbraith efa...@gmx.de wrote: On the bright side, all this hubbub sends a very positive message to the noveau development crew. Folks, your work is important. I'd be proud as a peacock :) Heh, most definitely so! Noveau really is a game-changer i think, it's a big break-through for Xorg IMO. For the first time in Linux history we support 90%+ of graphics hardware in a more or less acceptable way. That is a big deal really and the xorg/drm folks should be proud of that accomplishment. It solves the nvidia.ko mess to a large degree and moves all these things into an actionable technical domain. I'm so much happier to argue with OSS folks who write this code than having to look at nvidia.ko tainted kernel crash logs. Ingo -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm request 3
* Ben Skeggs skeg...@gmail.com wrote: But here's the thing: if you expect people to do development, they _need_ to be able to mix things. A kernel developer needs to be able to update their kernel. And a kernel _tester_ needs to be able to test that kernel. Here's the thing. *You* pushed for nouveau to go into staging before any of the developers were ready for it. Two of the big reasons (from my POV) for not requesting inclusion were the context programs (which have since been dealt with) and that yes, we have no intention of keeping crusty APIs around when they aren't what we require. The idea of staging was to allow for exactly the second problem, so why are you surprised? The fact Fedora ships nouveau is irrelevant, we also expect that for the most part people will be using our packages, which deal with the ABI issues. Here is my experience with the development of various ABIs - and i've been on both sides of the fence, i've done 'flag day' ABI changes during development myself, and i've done gradual ABI development as well. One experience i can tell you with 100% certainty: no matter whether a project is small or large, simple or complex, whether the old ABI is the ugliest wart on this planet or just hit by an unfortunate limitation that needs to be eliminated. The conclusion is crystal clear, breaking an ABI via a flag day cleanup/feature/etc is: - wrong - harmful - limits the developer base - limits the tester base - wastes time and effort. (fewer developers/testers means that while _this_ feature was easier to add, all your _future_ features will be a bit harder to do. It compounds up.) - so it hurts even the very developer who is most convinced that this was the right thing to do It's a bad technical decision throughout. It's masochistic and often suicidal to just about any project in essence. I've seen projects that did it once and died just due to that single act of stupidity. I've seen projects that have done it a few times and took the usage hit, limped along with the wounds and never grew to the size they could have achieved. I've seen projects that did it once, took the hit, learned from it and never did it again. How many times does DRM want to take that bullet head on? I have _never_ seen a situation where in hindsight breaking the ABI of a widely deployed project could be considered 'good', for just about any sane definition of 'good'. It's really that simple IMO. There's very few unconditional rules in OSS, but this is one of them. Ingo -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm request 3
* Pekka Enberg penb...@cs.helsinki.fi wrote: On Fri, Mar 5, 2010 at 8:49 AM, Ingo Molnar mi...@elte.hu wrote: The conclusion is crystal clear, breaking an ABI via a flag day cleanup/feature/etc is: ?- wrong ?- harmful ?- limits the developer base ?- limits the tester base ?- wastes time and effort. (fewer developers/testers means that while _this_ ? feature was easier to add, all your _future_ features will be a bit harder ? to do. It compounds up.) ?- so it hurts even the very developer who is most convinced that this was the ? right thing to do It's a bad technical decision throughout. It's masochistic and often suicidal to just about any project in essence. I've seen projects that did it once and died just due to that single act of stupidity. I've seen projects that have done it a few times and took the usage hit, limped along with the wounds and never grew to the size they could have achieved. I've seen projects that did it once, took the hit, learned from it and never did it again. Agreed. What bothers me in this discussion is that people keep bringing up the fact that nouveau is mostly developed by volunteers and thus it doesn't make sense to make sure it's backwards (or forwards) compatible. But the way I see it, it's the complete opposite. It's _more_ important to support ABIs for community-driven efforts because you're relying on people who by definition don't have time to waste. While the nouveau people might have good intentions, I'm afraid they might be severely limiting their developer and tester base because they're not focused on real world problems (like the ones Linus is seeing). Yeah. I've seen a few other bad arguments as well: 'exploding test matrix' This is often the result of _another_ bad technical decision: over-modularization. Xorg, mesa/libdrm and the kernel DRM drivers pretty share this signature: - it's developed by the same tightly knit developer base who often cross between these packages. Features often need changes in each component. - a developer to be able to do real work has to have the latest sources of all these components. - a user just uses whatever horizontal version cut the distro did and never truly 'mixes' these components as a conscious decision. - distros just try to get the latest and most capable but still stable version. Desperately so. Often they will create a version mix that was never tested by developers in that form. They'll expose users to ABI combinations that were never really intended. They have trouble bootstrapping and stabilizing those essentially random combinations and then have trouble applying stability and security fixes. The thing is, if development has such characteristics then it's pretty clearly not 3-4 separate projects but _one_ abstract project. [*] So the 'exploding test matrix' is simply the result of: creating ABIs between 3-4 _artificial components of the same project_ and then going through developer hell living with that mistake. [**] It's a bit as if we split up the kernel into 'microkernel' components, did a VFS ABI, MM ABI, drivers ABI, scheduler ABI, networking ABI and arch ABIs, and then tried to develop them as separate components. If we did then then Linux kernel development would slow down massively while in reality everyone would _still_ have to have the latest and greatest source checked out to do some real development work and to be able to implement features that affect the whole kernel ... Linux would become an epic fail of historic proportions if we ever did that. Ingo [*] I realize that it's possibly hard for Xorg to merge with mesa and the kernel for license reasons, but my technical observation still stands. [**] I realize that modularization and many small packages were a clear advantage when we were still all downloading bits via 14.4k modems, but in this day and age of megabit connections that has become a non-issue. Rampant over-modularization and the resulting loss of focus on the end result (and the resulting explosion of a test matrix) is hurting us _far more_ than the disadvantages of any monolithic could ever hurt. We are seeing the proof of that all day with a 10+ MLOC kernel. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm request 3
* C. Bergstr?m cbergst...@pathscale.com wrote: Pekka Enberg wrote: On Fri, Mar 5, 2010 at 8:49 AM, Ingo Molnar mi...@elte.hu wrote: The conclusion is crystal clear, breaking an ABI via a flag day cleanup/feature/etc is: - wrong - harmful - limits the developer base - limits the tester base - wastes time and effort. (fewer developers/testers means that while _this_ feature was easier to add, all your _future_ features will be a bit harder to do. It compounds up.) - so it hurts even the very developer who is most convinced that this was the right thing to do It's a bad technical decision throughout. It's masochistic and often suicidal to just about any project in essence. I've seen projects that did it once and died just due to that single act of stupidity. I've seen projects that have done it a few times and took the usage hit, limped along with the wounds and never grew to the size they could have achieved. I've seen projects that did it once, took the hit, learned from it and never did it again. Agreed. What bothers me in this discussion is that people keep bringing up the fact that nouveau is mostly developed by volunteers and thus it doesn't make sense to make sure it's backwards (or forwards) compatible. But the way I see it, it's the complete opposite. It's _more_ important to support ABIs for community-driven efforts because you're relying on people who by definition don't have time to waste. While the nouveau people might have good intentions, I'm afraid they might be severely limiting their developer and tester base because they're not focused on real world problems (like the ones Linus is seeing). staging != stable Nobody guaranteed a stable API for staging and in fact it was stated previously it needed to be changed. Please lets just get back to work and stop declaring the sky is falling. I dont think you understood the argument. The (very simple) argument was: no matter how a project is developed, whether it's been freshly announced (not even in staging), in staging or been upstream for years, breaking ABIs is _technically wrong_. No ifs and when. A released ABI that is in use cannot be so messy to make it worth breaking. You've got users. You've got developers. You've got yourself. You can still phase it out gradually (and even do that quickly), one or two stable releases down the road you can even print out the final ABI removal patch on paper, make a bonfire out of it and jump on its ashes in joy, but if you are interested in running a successful OSS project then the current ABI is sacrosanct. Ingo -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@linux.ie wrote: If it now does not boot up if all its sub-options are enabled, even of some of those sub-options are new, does that count as a driver regression? Sure it does to me ... But it doesn't to anyone else under any reasonable meaning of the word regression. There are reactions in this thread that contradict your 'anyone else' point. The config option states Choose this option if you want kernel modesetting enabled by default, and you have a new enough userspace to support this. Running old userspaces with this enabled will cause pain. Will cause pain sounds painful to me, I can make it seem much worse if you'd like. Except you are missing that the hang (and the first crash as well) happens on brand-new user-space just as much - not just on 'old userspaces'. The bugs i've triggered are independent of any user-space component - it happens with a fresh distro just as much. As i suggested before, at least the text should be updated to include what has been written about CONFIG_DRM_RADEON_KMS in this thread before: This is a completely new driver. It's only part of the existing drm for compatibility reasons. It requires an entirely different graphics stack above it and works very differently from the old drm stack. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: On Fri, Feb 5, 2010 at 7:00 PM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@linux.ie wrote: If it now does not boot up if all its sub-options are enabled, even of some of those sub-options are new, does that count as a driver regression? Sure it does to me ... But it doesn't to anyone else under any reasonable meaning of the word regression. There are reactions in this thread that contradict your 'anyone else' point. The config option states Choose this option if you want kernel modesetting enabled by default, ? ? ? ? ? and you have a new enough userspace to support this. Running old ? ? ? ? ? userspaces with this enabled will cause pain. Will cause pain sounds painful to me, I can make it seem much worse if you'd like. Except you are missing that the hang (and the first crash as well) happens on brand-new user-space just as much - not just on 'old userspaces'. The bugs i've triggered are independent of any user-space component - it happens with a fresh distro just as much. As i suggested before, at least the text should be updated to include what has been written about CONFIG_DRM_RADEON_KMS in this thread before: ? This is a completely new driver. ?It's only part of the existing drm for ? compatibility reasons. ?It requires an entirely different graphics stack ? above it and works very differently from the old drm stack. Okay I've attached a patch with a revised Kconfig in it. Does this sound more like reality? Looks perfect to me! Acked-by: Ingo Molnar mi...@elte.hu Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 08:17:05AM +0100, Ingo Molnar wrote: btw., i just found another bug activated via this same commit, a boot hang after DRM init: The commit in question didn't cause the hang, so reverting it isn't the appropriate fix. Well, once i applied the revert i got no more hangs or crashes today, in lots of bootups. This is fully repeatable - if i re-apply that commit with the config i sent the hang happens again. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Linus Torvalds torva...@linux-foundation.org wrote: On Thu, 4 Feb 2010, Ingo Molnar wrote: Well, once i applied the revert i got no more hangs or crashes today, in lots of bootups. This is fully repeatable - if i re-apply that commit with the config i sent the hang happens again. But that's just because when it was in staging, you'd never enable it, right? Because your config generator always turns off staging stuff. IOW, it's not the code - it's more an interaction with how your test-bed works. Correct. Staging drivers can crash and there's no guarantee of quick regression fixes (although fixes are generally quick so this isnt a complaint), so i exclude them from boot testing. Anyway, i've essentially moved it back to staging as far as my testing goes, that solves the problem too. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 07:12:18PM +0100, Ingo Molnar wrote: * Matthew Garrett mj...@srcf.ucam.org wrote: The reason the option was in staging (as has been mentioned before) was because the ABI wasn't felt to be stable enough. Upstream is now willing to commit to that stability, so now seems as good a time to move it as any. There's no code change and there's no default configuration change, so I really can't see any way that it can be classed as a regression. But that argument in essence renders the regression policy meaningless for such code: just about any new driver feature under the sun could be shaped as a Kconfig option, introduced via a drivers/staging Kconfig entry, and then activated via a twoliner commit in a later -rc. Before this patch, CONFIG_DRM_RADEON_KMS=y would crash your system on boot. [...] Hm, in what way does that observation address the concerns i've outlined? Before this patch i could enable CONFIG_DRM_RADEON_KMS=y only if i enabled CONFIG_STAGING, which i dont, because doing so would taint my kernel with TAINT_CRAP, and the kernel log would contain: %s: module is from the staging directory, the quality is unknown, you have been warned., [...] After this patch, CONFIG_DRM_RADEON_KMS=y still crashes your system. [...] After this patch i suddenly get a new body of code with a default-off option that would only show up before if i had CONFIG_STAGING=y enabled before. Do you see my argument why any user who is hit by this would categorize this as a kernel regression in an existing driver? Moving driver functionality from drivers/staging/ to drivers/ might be justified, it might be pragmatic, but you dont try to justify it and you dont try to outline the pragmatic reasons - from all i can see you seem to argue that this is all perfectly fine in late -rc's, which has me worried somewhat. [ And if that is really fine i'd like to hear Linus's amen on that as well, because i'm sure others would like to use that mechanism too to enable new functionality in late -rc's. ] Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 07:56:03PM +0100, Ingo Molnar wrote: Do you see my argument why any user who is hit by this would categorize this as a kernel regression in an existing driver? No. If a user changes configuration and gets a hang, that's a bug but not a regression. Only if it's some brand-new driver or a brand-new kernel feature for which no-one can have any prior expectations of stability. Especially if it's added in the merge window when many new drivers are added. But isnt it a regression to a user if it's shipped in -rc7 appearing as a new sub-option of an existing driver? I'd wager that most main-street Linux users would consider that a regression. As i see it is that you are trying to have it both ways: claim it's a new driver when it comes to handling regressions, but also try to have the benefits (and adoption flux) of an old driver when it comes to facing it to users. Some info about that in the Kconfig would be helpful IMO - so that people are less surprised if it happens to break while the radeon driver worked fine for them before. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Alex Deucher alexdeuc...@gmail.com wrote: On Thu, Feb 4, 2010 at 2:06 PM, Ingo Molnar mi...@elte.hu wrote: * Alex Deucher alexdeuc...@gmail.com wrote: On Thu, Feb 4, 2010 at 1:12 PM, Ingo Molnar mi...@elte.hu wrote: * Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 06:54:45PM +0100, Ingo Molnar wrote: But you could claim that it's not a regression because 1) technically the code got introduced in drivers/staging/, and staging drivers are not on the regression list 2) the Kconfig value is default-off so it can only harm those who got lured by a new Kconfig value popping up in -rc7 in a well working driver they already have enabled. So the moving of driver functionality from drivers/staging/ to drivers/ is a grey area it appears. Wouldnt it have been better to do this in the next merge window, as all other drivers do? It's not new hardware enablement either, it's feature enablement for an existing driver. The reason the option was in staging (as has been mentioned before) was because the ABI wasn't felt to be stable enough. Upstream is now willing to commit to that stability, so now seems as good a time to move it as any. There's no code change and there's no default configuration change, so I really can't see any way that it can be classed as a regression. But that argument in essence renders the regression policy meaningless for such code: just about any new driver feature under the sun could be shaped as a Kconfig option, introduced via a drivers/staging Kconfig entry, and then activated via a twoliner commit in a later -rc. IMHO the point of tracking regressions is to reduce the bugginess of the kernel and thus to help users, not to give ground for legalistic arguments. There _are_ common-sense exceptions from the regression rules, such as the introduction of a new piece of hardware that was previously unsupported (hence there's no expectation of stability) - but the tweaking of an existing, widely used driver (even if the new opion is default-off) hardly seems to qualify for that. This is a completely new driver. ?It's only part of the existing drm for compatibility reasons. ?It requires an entirely different graphics stack above it and works very differently from the old drm stack. Will the user know? IMHO what matters in the end is user expectation. Lets walk through what a current kernel tester of the drm/radeon driver sees when he types 'make oldconfig' after installing the (to-be-released) .33-rc7 kernel. Firstly, the user with a brand-new distro already has this enabled: ?CONFIG_DRM_RADEON=y and knows the driver, and it performs adequately. Then in -rc7 he gets a new option: ?ATI Radeon (DRM_RADEON) [Y/n/?] y ? ?Enable modesetting on radeon by default (DRM_RADEON_KMS) [N/y/?] (NEW) The user might easily go: Hey this is a driver i already have, and there's a new sub-option for this well-working driver. Sure, enable it, these kernel folks know what they are doing and i rarely see any crashes past -rc2 kernels. Does this new option tell him what you just told me, that: ? This is a completely new driver. ?It's only part of the existing drm for ? compatibility reasons. ?It requires an entirely different graphics stack ? above it and works very differently from the old drm stack. ? it doesnt. Even if he types '?', it tells: ?CONFIG_DRM_RADEON_KMS: ?Choose this option if you want kernel modesetting enabled by default, ?and you have a new enough userspace to support this. Running old ?userspaces with this enabled will cause pain. The user will likely go cool I have a fresh distro with recent Xorg, lets try it. And if it crashes, he'll report a bug and we'll fix it. Nobody has reacted to my related boot hang bugreport yet - and it's detailed and fully reproducible (so i can test any proposed fixes as well in short order). I.e. my limited testing has triggered two separate bugs in the same driver - and this will show up in -rc7. It might be all OK and no-one else will see trouble. Or past patterns might repeat themselves and i might simply be an early bird for trouble to come. My (oft repeated) point is that adding new sub-features to existing drivers is not what we do in late -rc's: there's simply not enough time to shake out bugs/regressions in them. We introduce new functionality to existing drivers in the merge window - in the two weeks following a stable kernel's release. In late -rc's we only try to fix regressions. Sometimes we make exceptions for pragmatic reasons, but then we are straightforward about those reasons and try to warn users about our zeal to help them with cool, new, not-to-be-missed GPU functionality ;-) Thanks, Ingo
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Jesse Barnes jbar...@virtuousgeek.org wrote: On Thu, 4 Feb 2010 20:32:32 +0100 Ingo Molnar mi...@elte.hu wrote: Nobody has reacted to my related boot hang bugreport yet - and it's detailed and fully reproducible (so i can test any proposed fixes as well in short order). I.e. my limited testing has triggered two separate bugs in the same driver - and this will show up in -rc7. It might be all OK and no-one else will see trouble. Or past patterns might repeat themselves and i might simply be an early bird for trouble to come. My (oft repeated) point is that adding new sub-features to existing drivers is not what we do in late -rc's: there's simply not enough time to shake out bugs/regressions in them. We introduce new functionality to existing drivers in the merge window - in the two weeks following a stable kernel's release. This is the .config issue right? It doesn't sound like the bug is new, you're just seeing now it because of the way you run tests. It shouldn't affect any more or fewer users than it did before, and reverting the move radeon KMS out of staging won't fix the bug at all or prevent anyone from seeing it. People using KMS will still use KMS and people without it won't, [...] I think you are missing my point. My point is very simple: existing non-KMS users of CONFIG_DRM_RADON=y (a pre-existing driver) might turn on the new sub-feature (CONFIG_DRM_RADEON_KMS=y), in the expectation that this is a safe addition to his currently well-working driver. ( I have to confess i do that all the time for drivers that work well for me, and if it pops up in a late -rc i sure expect it to be safe to enable. I dont even read the help text most of the time - if the single-line summary sounds useful i enable it. Especially if the Kconfig help entry says it's safe with a new distro, it's not CONFIG_EXPERIMENTAL, it's not marked CONFIG_BROKEN, it's not in CONFIG_STAGING, etc. ) That action might hang or crash his kernel, and if that user then reports: Hey, -rc7 just hung on me after enabling this new .config option it offered for the radeon driver i am using, please add this to the list of regressions. is this really the right kind of reply: Since we moved it from drivers/staging/ to drivers/ this hang you are seeing is technically not a regression, we might or might not fix it. ? I doubt the user would be overly enthusiastic about that kind of reply ;-) Guys, you should really _think_ about it a minute and realize what the purpose of a regression policy is. It's not to be a PITA to subsystem maintainers, it's not an annoyance just to keep you from doing cool stuff. It's not something which you should try to lawyer your way out of via an as narrow interpretation as you can. A regression policy is something that generally helps the quality of Linux, so it's worth interpreting broadly and generously in spirit not just in letter. If there's a single most prominent complaint i hear about the upstream kernel is that it breaks too often. (right after 'it doesnt support my graphics hardware' - so i sure can relate to the pragmatic reasons of pushing KMS strongly!) If i run into a crash and a hang, you can bet that others will as well. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Jerome Glisse gli...@freedesktop.org wrote: On Thu, Feb 04, 2010 at 08:19:35PM +0100, Ingo Molnar wrote: * Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 07:56:03PM +0100, Ingo Molnar wrote: Do you see my argument why any user who is hit by this would categorize this as a kernel regression in an existing driver? No. If a user changes configuration and gets a hang, that's a bug but not a regression. Only if it's some brand-new driver or a brand-new kernel feature for which no-one can have any prior expectations of stability. Especially if it's added in the merge window when many new drivers are added. But isnt it a regression to a user if it's shipped in -rc7 appearing as a new sub-option of an existing driver? I'd wager that most main-street Linux users would consider that a regression. As i see it is that you are trying to have it both ways: claim it's a new driver when it comes to handling regressions, but also try to have the benefits (and adoption flux) of an old driver when it comes to facing it to users. We have been treating KMS regression as regression, [...] Great! [...] i fixed numerous regressions since it was first merged as an staging driver, and i keep doing so, i try to be as much reactive as i can. I am sorry you have a bad experience about it. I just wanted to add that we planed to move KMS out of staging in 2.6.33 long time ago and yes maybe we should have done it earlier, but no matter when we do the change you will still face this bug until we fix it. I dont think you'll ever see my complain about a bug. I dont, and i introduce far too many of them to have any moral basis for complaint in any case ;-) I only questioned the validity of this initial reaction by Dave Arlie: | Its not enabled by default so reverting this doesn't make much sense. | | We can just treat this as a normal driver bugreport. So on fixing the issue front, one question do you also enable radeonfb ? if so then its likely the root issue of this bug, i think kconfig should forbid having both radeon kms + radeonfb but i am not sure how allyesconfig behave in respect of such constraint. Please see the bugreprt i made in this thread, under the following subject: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging. It has all that info and more. (i've bounced it to you privately as well) Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: On Wed, Feb 3, 2010 at 1:44 AM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@linux.ie wrote: It's the moving of radeom KMS out of staging after -rc6 that causes it, because it brought it into the scope of my testing: ?f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging. So at least on this box it's clearly not ready for mainline enablement yet. I've attached the revert patch further below. Its not enabled by default so reverting this doesn't make much sense. I boot allyesconfig kernels regularly, which testing method works fine with another 2000+ upstream drivers. (including the dozens of drivers which match to active hardware components on that box) Okay this was something I wondered about, since these are *not* allyesconfig .configs, I've generated some and CONFIG_FB_RADEON is always on here, and you seem to not have that enabled (not that enabling it is a good idea it is in fact a really bad idea). These were random configs - the size doesnt match an allyesconfig, those are way bigger. My above comment related to the first crash, and to my argument that all other drivers are fine during bootup - and there's a lot of them. So do you have something you are running after allyesconfig to fix things? or have you just got a config that is close enough to allyesconfig. I'm building kernels with your .config now and boot testing them on the full range of hardware I have/ Thanks. Is there something i can enable to get a better log for you to find out where (and why) it's hanging? It's still early during bootup so the box is not particularly debuggable - so i'm not sure i can get a task list dump, etc., unfortunately. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Jesse Barnes jbar...@virtuousgeek.org wrote: [...] That action might hang or crash his kernel, and if that user then reports: Hey, -rc7 just hung on me after enabling this new .config option it offered for the radeon driver i am using, please add this to the list of regressions. is this really the right kind of reply: Since we moved it from drivers/staging/ to drivers/ this hang you are seeing is technically not a regression, we might or might not fix it. ? I doubt the user would be overly enthusiastic about that kind of reply ;-) Whether or not it's a regression is mostly irrelevant, it's a real bug and the radeon guys are working on fixing it. [...] Fortunately it's being worked on. I beg to differ with your argument about it not mattering whether a bug is categorized as a regression: Rafael's regression list is far more prominent and the bugs listed there get fixed with a high likelhood. Note that there's clear evidence of that in this very thread: the hang bug was ignored as a plain DRM non-regression bugreport, _despite_ the prior scrutiny in the thread, up to the moment Linus pointed it out and turned it into a de-facto regression ... There's also another purpose of categorizing bugs as regressions: tester timeliness. We tend to treat bugs as 'plain bugs' when they are reported too late after a few kernel releases of the bug having been in the wild. We do this to encourage testers to test earlier -rc's as well, as there's a real tangible benefit of the 'we dont do regressions' policy: bugs get fixed and testers feel involved, and it's also the stage were we _can_ fix bugs with a lower cost to all parties involved. But what 'timeliness of testing' can there be if new features are added in a late -rc and bugs are explicitly categorized as 'not a regression'? Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 09:22:54PM +0100, Ingo Molnar wrote: Hey, -rc7 just hung on me after enabling this new .config option it offered for the radeon driver i am using, please add this to the list of regressions. If the same configuration options hang on both an old kernel and a new kernel, how is that in any plausible way a regression? What's regressed? Regressions are not limited to 'same config' kernels, last i checked. If that has changed (or if i'm misunderstanding it) then it would be nice to hear a clarification about that from Linus. The way i understand it is that there are narrow exceptions from the regression rules, such as completely new drivers for which there can be no prior expectation of stability by users. (but for even them we are generally on the safer side to list bugs in them as regressions as well - especially if we expect many users to enable it.) AFAIK there's no exception for new sub-features of existing facilities or drivers, even if it's default-disabled. This issue materially affects quite a few bugs i'm handling as a maintainer. Many of them are under default-off config options - most new aspects to existing code are introduced in such a way. It would remove quite a bit of urgent-workload from my workflow if i could strike them from Rafael's list and could deprioritize them as plain bugs, to be fixed as time permits. IMHO it would be rather counter-productive to kernel quality if we did that kind of regression-lawyering though. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: These were random configs - the size doesnt match an allyesconfig, those are way bigger. My above comment related to the first crash, and to my argument that all other drivers are fine during bootup - and there's a lot of them. So do you have something you are running after allyesconfig to fix things? or have you just got a config that is close enough to allyesconfig. I'm building kernels with your .config now and boot testing them on the full range of hardware I have/ Thanks. Is there something i can enable to get a better log for you to find out where (and why) it's hanging? It's still early during bootup so the box is not particularly debuggable - so i'm not sure i can get a task list dump, etc., unfortunately. Do you have NMI watchdog enabled? (does it work that early) a backtrace of where it hangs would be nice, Also a dmesg from booting with drm.debug=15 might help narrow it down also. Okay I've booted this on the rv370 + rv380 machines I have, I've no old Athlon's though so I'm trying to get RHTS to give me access to one or two internally, Another question, that came to mind, what is there any monitors plugged in? or a KVM or something? if yes can you try without and if no can you try with? Also can you add CONFIG_FRAMEBUFFER_CONSOLE as well since I don't think we test often without it. ok - i'll try your suggestions and send an update - will probably have to wait until next week. Feel free to deprioritize the bug until then (i have my revert as a short-term band-aid), unless others report similar problems too. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Matthew Garrett mj...@srcf.ucam.org wrote: On Thu, Feb 04, 2010 at 10:05:59PM +0100, Ingo Molnar wrote: Regressions are not limited to 'same config' kernels, last i checked. If that has changed (or if i'm misunderstanding it) then it would be nice to hear a clarification about that from Linus. If an option has *never* worked on a given configuration, then it's not a regression. [...] The *main driver* (CONFIG_DRM_RADEON=y, as far as the user is concerned) is many years old and it certainly worked just fine for tens of thousands of test iterations on this box, up until that commit. That box alone has done in excess of half a million boot iterations in the past 2+ years. About 28% of the tests had CONFIG_DRM_RADEON=y, so the number of successful bootups with CONFIG_DRM_RADEON=y is in excess of one hundred thousand. There was not a single failed bootup in those two years due to the CONFIG_DRM_RADEON driver that i can remember. If it now does not boot up if all its sub-options are enabled, even of some of those sub-options are new, does that count as a driver regression? Sure it does to me ... IMHO you are trying to put a narrow technical distinction into it which does not exist for users. You argue that it's a new default-off sub-option of an existing driver, hence it cannot be a regression. The option shows up as a sub-option to an existing driver in make oldconfig, with a fairly innocious sounding help text, so to a user it certainly looks as if it's one unit and that it is the radeon driver that regressed. *If* we make a driver feature available in a popular driver and make it Kconfig visible without obvious warnings (i.e. lure the user to enable it with the notion that it's just one coherent unit with a trusted, existing driver), we should also hold up the other side of the deal and treat the *bugs* as a coherent unit as well. I.e. treat the driver as a coherent whole not only when it's convenient to us, but also when it's somewhat inconvenient to do. We cannot have it both ways. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: On Wed, Feb 3, 2010 at 1:46 AM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@gmail.com wrote: On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@linux.ie wrote: Hi Linus, Please pull the 'drm-linus' branch from ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-linus I've also added an oops fix I seem to lose off my radar to this tree. commit 17aafccab4352b422aa01fa6ebf82daff693a5b3 Author: Michel D??nzer daen...@vmware.com Date: ? Fri Jan 22 09:20:00 2010 +0100 ? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure. Wierd this suggests something else is wrong on that machine can you get me the whole dmesg? I'm guessing some iommu or swiotlb issue. This box has no known hardware or software problems, just this week it booted in excess of 1000 kernels so i'd exclude that angle for now. I have bisected the crash back to the DRM tree and the crash went away with the Kconfig revert i applied - and it got fixed by Jerome's patch. I posted my config and i posted the relevant boot log as well. Find below the full bootlog as well with vanilla -git (ab65832) and the config. (i dont think it matters) I've asked Jerome to fix the oops, but really anyone with an old .config won't get hit by this, and we've booted this on quite a lot of machines at this point. I dont see the commit in yesterday's linux-next. It has very fresh timestamps: ?commit f71d0187987e691516cd10c2702f002c0e2f0edc ?Author: ? ? Dave Airlie airl...@redhat.com ?AuthorDate: Mon Feb 1 11:35:47 2010 +1000 ?Commit: ? ? Dave Airlie airl...@redhat.com ?CommitDate: Mon Feb 1 11:35:47 2010 +1000 What kind of widespread testing could this commit have gotten in the less than 24 hours before it hit mainline? Its shipping in a major distro by default, its planned to be shipped in an even more major distro. Its been boot tested on 1000s of machines by 1000s of ppl. Well but that's not the precise tree you sent to Linus, is it? Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: On Wed, Feb 3, 2010 at 1:46 AM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@gmail.com wrote: On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@linux.ie wrote: Hi Linus, Please pull the 'drm-linus' branch from ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-linus I've also added an oops fix I seem to lose off my radar to this tree. commit 17aafccab4352b422aa01fa6ebf82daff693a5b3 Author: Michel D??nzer daen...@vmware.com Date: ? Fri Jan 22 09:20:00 2010 +0100 ? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure. Wierd this suggests something else is wrong on that machine can you get me the whole dmesg? I'm guessing some iommu or swiotlb issue. This box has no known hardware or software problems, just this week it booted in excess of 1000 kernels so i'd exclude that angle for now. I have bisected the crash back to the DRM tree and the crash went away with the Kconfig revert i applied - and it got fixed by Jerome's patch. I posted my config and i posted the relevant boot log as well. Find below the full bootlog as well with vanilla -git (ab65832) and the config. (i dont think it matters) I've asked Jerome to fix the oops, but really anyone with an old .config won't get hit by this, and we've booted this on quite a lot of machines at this point. I dont see the commit in yesterday's linux-next. It has very fresh timestamps: ?commit f71d0187987e691516cd10c2702f002c0e2f0edc ?Author: ? ? Dave Airlie airl...@redhat.com ?AuthorDate: Mon Feb 1 11:35:47 2010 +1000 ?Commit: ? ? Dave Airlie airl...@redhat.com ?CommitDate: Mon Feb 1 11:35:47 2010 +1000 What kind of widespread testing could this commit have gotten in the less than 24 hours before it hit mainline? Its shipping in a major distro by default, its planned to be shipped in an even more major distro. Its been boot tested on 1000s of machines by 1000s of ppl. Well but that's not the precise tree you sent to Linus, is it? It pretty much is. If I could blame your crash on any of the recent patches I would but its something new and unfun. [...] You dont seem to realize the plain and simple fact that the bug (and some other bug) was obscure before because this particular KMS aspect of the radeon driver was in drivers/staging/, and it became more prominent via this post-rc6 commit: | From f71d0187987e691516cd10c2702f002c0e2f0edc Mon Sep 17 00:00:00 2001 | From: Dave Airlie airl...@redhat.com | Date: Mon, 1 Feb 2010 11:35:47 +1000 | Subject: [PATCH] drm/radeon/kms: move radeon KMS on/off switch out of staging. | | We are happy enough that the KMS driver is stable enough for enough people | for the kms enable/disable to leave staging. Distros can now contemplate | turning this on. | | Signed-off-by: Dave Airlie airl...@redhat.com | --- | drivers/gpu/drm/Kconfig |2 ++ | drivers/staging/Kconfig |2 -- | 2 files changed, 2 insertions(+), 2 deletions(-) I never claimed (and still dont claim) that the bug is 'new' per se, so why do you keep beating down on that straw man argument? I said it in my very first mail that this bug got brought upon us by the Kconfig commit above: It's the moving of radeom KMS out of staging after -rc6 that causes it, because it brought it into the scope of my testing: f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging. So at least on this box it's clearly not ready for mainline enablement yet. I dont mind reporting bugs and testing patches (as i did), all i said is that from a QA angle it's somewhat late to do that in -rc7. (It's not even a completely new driver either, which people would know to stay away from - it's a new config option of an existing driver, so i'd expect many people to turn it on when they see it in the oldconfig - even though it's default-off.) You made the bug more prominent by moving it into the driver proper, after -rc6, and while i dont mind reporting and working on bugs, your constant denial is somewhat counter-productive, as (beyond the waste of time on these emails) it suggests that we might see repeat incidents of this kind in the future. Anyway, with two bugs in a row this commit is clearly too problematic for me so i have reverted f71d018 from -tip. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@linux.ie wrote: It's the moving of radeom KMS out of staging after -rc6 that causes it, because it brought it into the scope of my testing: f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging. So at least on this box it's clearly not ready for mainline enablement yet. I've attached the revert patch further below. Its not enabled by default so reverting this doesn't make much sense. I boot allyesconfig kernels regularly, which testing method works fine with another 2000+ upstream drivers. (including the dozens of drivers which match to active hardware components on that box) We can just treat this as a normal driver bugreport. Commit f71d0187 is post-rc6, so i'm not sure it fits your this is business as usual, lets move on characterisation. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Jerome Glisse gli...@freedesktop.org wrote: On Tue, Feb 02, 2010 at 09:17:27AM +0100, Ingo Molnar wrote: * Dave Airlie airl...@linux.ie wrote: Hi Linus, Please pull the 'drm-linus' branch from ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-linus I've also added an oops fix I seem to lose off my radar to this tree. commit 17aafccab4352b422aa01fa6ebf82daff693a5b3 Author: Michel D??nzer daen...@vmware.com Date: Fri Jan 22 09:20:00 2010 +0100 drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure. FYI, this drm pull into mainline has triggered quick boot crashes in -tip testing (even with the above fix applied), on an Athlon64 whitebox PC with: 01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon X300 (PCIE)] 01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE] the crash is: [7.111003] radeon :01:00.0: Disabling GPU acceleration [7.273547] Failed to wait GUI idle while programming pipes. Bad things might happen. [7.436296] [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, shutting down CP. [7.598755] Failed to wait GUI idle while programming pipes. Bad things might happen. [7.599306] BUG: unable to handle kernel paging request at f838 [7.59] IP: [c149f0de] rv370_pcie_gart_set_page+0x2d/0x3c [7.59] *pde = 36d44067 *pte = [7.59] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC [7.59] last sysfs file: i have bisected it back to: | 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0 is the first bad commit | commit 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0 | Author: Dave Airlie airl...@redhat.com | Date: Fri Jan 29 15:31:47 2010 +1000 | | drm/radeon/kms: fix incorrect logic in DP vs eDP connector checking. | | This makes displayport work again here. Unfortunately even with that patch reverted it still crashes. Config and bootlog attached. It's the moving of radeom KMS out of staging after -rc6 that causes it, because it brought it into the scope of my testing: f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging. So at least on this box it's clearly not ready for mainline enablement yet. I've attached the revert patch further below. Ingo Attached is a patch which will fix the oops, still it's strange that CP fails to init on your config. [...] Thanks, that fixes the crash here! Tested-by: Ingo Molnar mi...@elte.hu [...] Do you have IOMMU enabled ? I haven't played with iommu stuff thus i wonder if we are missing somethings in this area. No IOMMU here - this is a 5 years old box. (beyond GART that is) Your patch fixes a bona-fide illegal-access bug in the DRM code, that's more than enough to crash the box ;-) Btw., there's a new warning in the DRM code drivers/gpu/drm/ati_pcigart.c: In function ???drm_ati_pcigart_init???: drivers/gpu/drm/ati_pcigart.c:115: warning: format ???%Lx??? expects type ???long long unsigned int???, but argument 3 has type ???dma_addr_t??? Please fix that too, the kernel build is noisy enough as-is. Thanks, Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com-- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.
* Dave Airlie airl...@gmail.com wrote: On Tue, Feb 2, 2010 at 6:35 PM, Dave Airlie airl...@gmail.com wrote: On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote: * Dave Airlie airl...@linux.ie wrote: Hi Linus, Please pull the 'drm-linus' branch from ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git drm-linus I've also added an oops fix I seem to lose off my radar to this tree. commit 17aafccab4352b422aa01fa6ebf82daff693a5b3 Author: Michel D??nzer daen...@vmware.com Date: ? Fri Jan 22 09:20:00 2010 +0100 ? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure. Wierd this suggests something else is wrong on that machine can you get me the whole dmesg? I'm guessing some iommu or swiotlb issue. I've asked Jerome to fix the oops, but really anyone with an old .config won't get hit by this, and we've booted this on quite a lot of machines at this point. Ideas keep flooding in here, did you softboot from the old UMS driver? or coldboot? does coldbooting help? Btw., i have cold booted it after full poweroff, and it crashed in the same way. Ingo -- The Planet: dedicated and managed hosting, cloud storage, colocation Stay online with enterprise data centers and the best network in the business Choose flexible plans and management services without long-term contracts Personal 24x7 support from experience hosting pros just a phone call away. http://p.sf.net/sfu/theplanet-com -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm
* Alan Cox a...@lxorguk.ukuu.org.uk wrote: Last time they were asked that, they wanted to be free of changing their kernel/userspace interface before upstreaming. So put it in staging with a note to that effect. That'll also get it more exposure and review. Well, arguably that particular idea should have come from the people maintaining that area of code. There's this one missing driver which covers (more or less) like 40%-50% of the PC market, that's a glaringly significant issue, isnt it? It doesnt get any more significant, does it? Ingo -- Return on Information: Google Enterprise Search pays you back Get the facts. http://p.sf.net/sfu/google-dev2dev -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
[patch] Fix: 'return -ENOMEM' instead of 'return ENOMEM'
* Andrew Morton a...@linux-foundation.org wrote: @@ -3730,7 +3730,7 @@ tracing_stats_read(struct file *filp, char __user *ubuf, s = kmalloc(sizeof(*s), GFP_KERNEL); if (!s) - return ENOMEM; + return -ENOMEM; trace_seq_init(s); lol, there we go again. Andy, can we have a checkpatch rule please? Note, that will upset creative uses of error codes i guess, such as fs/xfs/. But yeah, +1 from me too. Ob'post'mortem - looked for similar patterns in the kernel and there's quite a few bugs there: include/net/inet_hashtables.h: return ENOMEM; # bug drivers/scsi/aic7xxx/aic7xxx_osm.c: return ENOMEM; # works but weird drivers/scsi/cxgb3i/cxgb3i_offload.c: return ENOMEM; # works but weird fs/ocfs2/dlm/dlmrecovery.c:return EAGAIN; # bug drivers/block/cciss_scsi.c: return ENXIO; # works but weird drivers/gpu/drm/radeon/radeon_irq.c:return EINVAL; # bug drivers/gpu/drm/radeon/radeon_irq.c:return EINVAL; # bug drivers/isdn/hardware/mISDN/hfcmulti.c: return EINVAL; # bug 5 out of 8 places look buggy - i.e. more than 60% - a checkpatch warning would avoid real bugs here. (even ignoring the cleanliness effects of using proper error propagation) Cc:-ed affected maintainers. The rightmost column are my observations. Below is the patch fixing these. Ingo Signed-off-by: Ingo Molnar mi...@elte.hu --- drivers/gpu/drm/radeon/radeon_irq.c|4 ++-- drivers/isdn/hardware/mISDN/hfcmulti.c |2 +- fs/ocfs2/dlm/dlmrecovery.c |2 +- include/net/inet_hashtables.h |2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_irq.c b/drivers/gpu/drm/radeon/radeon_irq.c index b79ecc4..fbbc0a1 100644 --- a/drivers/gpu/drm/radeon/radeon_irq.c +++ b/drivers/gpu/drm/radeon/radeon_irq.c @@ -76,7 +76,7 @@ int radeon_enable_vblank(struct drm_device *dev, int crtc) default: DRM_ERROR(tried to enable vblank on non-existent crtc %d\n, crtc); - return EINVAL; + return -EINVAL; } } else { switch (crtc) { @@ -89,7 +89,7 @@ int radeon_enable_vblank(struct drm_device *dev, int crtc) default: DRM_ERROR(tried to enable vblank on non-existent crtc %d\n, crtc); - return EINVAL; + return -EINVAL; } } diff --git a/drivers/isdn/hardware/mISDN/hfcmulti.c b/drivers/isdn/hardware/mISDN/hfcmulti.c index faed794..cfb45c9 100644 --- a/drivers/isdn/hardware/mISDN/hfcmulti.c +++ b/drivers/isdn/hardware/mISDN/hfcmulti.c @@ -2846,7 +2846,7 @@ mode_hfcmulti(struct hfc_multi *hc, int ch, int protocol, int slot_tx, int conf; if (ch 0 || ch 31) - return EINVAL; + return -EINVAL; oslot_tx = hc-chan[ch].slot_tx; oslot_rx = hc-chan[ch].slot_rx; conf = hc-chan[ch].conf; diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index d9fa3d2..0a8a6a4 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -2639,7 +2639,7 @@ int dlm_begin_reco_handler(struct o2net_msg *msg, u32 len, void *data, dlm-name, br-node_idx, br-dead_node, dlm-reco.dead_node, dlm-reco.new_master); spin_unlock(dlm-spinlock); - return EAGAIN; + return -EAGAIN; } spin_unlock(dlm-spinlock); diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index d522dcf..5e31447 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -193,7 +193,7 @@ static inline int inet_ehash_locks_alloc(struct inet_hashinfo *hashinfo) hashinfo-ehash_locks = kmalloc(size * sizeof(spinlock_t), GFP_KERNEL); if (!hashinfo-ehash_locks) - return ENOMEM; + return -ENOMEM; for (i = 0; i size; i++) spin_lock_init(hashinfo-ehash_locks[i]); } -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [patch] Fix: 'return -ENOMEM' instead of 'return ENOMEM'
* Joel Becker joel.bec...@oracle.com wrote: On Thu, Nov 12, 2009 at 11:17:58AM -0800, Joel Becker wrote: On Thu, Nov 12, 2009 at 09:10:43AM +0100, Ingo Molnar wrote: 5 out of 8 places look buggy - i.e. more than 60% - a checkpatch warning would avoid real bugs here. (even ignoring the cleanliness effects of using proper error propagation) Cc:-ed affected maintainers. The rightmost column are my observations. Below is the patch fixing these. Acked-by: Joel Becker joel.bec...@oracle.com I take that back. NAK. Sorry, I read the code wrong. This function is just a handler. The caller, dlm_send_begin_reco_message(), expects the positive EAGAIN as a non-error case. Well, at minimum the error code usage is very confused. The dlm_begin_reco_handler gets registered via o2net_register_handler(). Here's where dlm_begin_reco_handler gets registered, followed by dlm_finalize_reco_handler right afterwards: status = o2net_register_handler(DLM_BEGIN_RECO_MSG, dlm-key, sizeof(struct dlm_begin_reco), dlm_begin_reco_handler, dlm, NULL, dlm-dlm_domain_handlers); and right before that dlm_reco_data_done_handler() gets registered: status = o2net_register_handler(DLM_RECO_DATA_DONE_MSG, dlm-key, sizeof(struct dlm_reco_data_done), dlm_reco_data_done_handler, dlm, NULL, dlm-dlm_domain_handlers); And look what dlm_reco_data_done_handler() starts with: int dlm_reco_data_done_handler(struct o2net_msg *msg, u32 len, void *data, void **ret_data) { struct dlm_ctxt *dlm = data; struct dlm_reco_data_done *done = (struct dlm_reco_data_done *)msg-buf; struct dlm_reco_node_data *ndata = NULL; int ret = -EINVAL; if (!dlm_grab(dlm)) return -EINVAL; A negative error code right there. The other event handlers are seem to be similar - dlm_begin_reco_handler() is the odd one out. So while you are right and my patch is wrong and DLM_BEGIN_RECO_MSG processing works right now - at minimum this is a very dangerous/fragile pattern of error code usage. For exampe if anyone uses dlm_begin_reco_handler() to implement a new state machine handler in the future, but uses one of the other event handlers to actually use it, a hard to find bug is introduced. Ingo -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Suresh Siddha suresh.b.sid...@intel.com wrote: On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote: Hmm, at this point I already was more than a week ago: http://marc.info/?l=linux-kernelm=125537770514713w=2 OK, I also modified ioremap() and set_memory_wc() but your patch is just part of what I did there... oops! Sorry I read the later emails in this thread carefully but not the conversation with Thomas Hellstrom. Anyhow, glad that you were on top of this. Thanks. Ingo, can you please queue this patch with a sign-off from Thomas Schlichter too? Please resend the latest with all the acks/signoffs added, with the subject line changed to '[PATCH] ...' and all people Cc:-ed. Thanks, Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Suresh Siddha suresh.b.sid...@intel.com wrote: On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote: What do you think about the latest version of my patch series I just sent? I think we can simplify this by just using mtrr_add_page() and avoid all the complexity that comes with mtrr_add_unaligned(). pci_mmap_range() should call one mtrr_add_page() if the base and size are nicely aligned. Almost all the cases, this is the usage sequence here anyway. yep, keeping it simple is a must. Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Thomas Schlichter thomas.schlich...@web.de wrote: Jan Beulich wrote: Functionality-wise this looks fine to me; whether the core sysfs changes are acceptable I can't judge, though. OK, I think I should have addressed your comments. Unfortunately I had to use a little hack to make pci_mmap_page_range() work for sysfs and proc. I placed a private pointer in the beginning of both per-file private structures. So this pointer can be accessed independent from the caller. I hope this is acceptable. I dropped the ioremap() and set_memory_wc() patches, I could not implement reference counting for them and it may interact too much with existing GPU drivers. Again, this series should not change the current behavior if either MTRR is disabled or PAT is enabled. But it helps in the case that MTRR is enabled and PAT is not available. What should be done now to get this series on the right track? Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still make it work on your testbox? Once that's done i'll look at putting it into the x86 tree for testing. The acks of Suresh/Venki/Jan would be nice to have. Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Jan Beulich jbeul...@novell.com wrote: So if it's OK for Jan, I'll reduce the functionality again and use mtrr_add() instead. Btw. this simply means to drop mtrr_add_unaligned(), all the other parts are still required for reference counting and a proper mtrr_del() on file close. I'm perfectly fine with just handling the aligned case, as long as the code checks that the alignment constraints are met. It could even emit a debug warning if they are not met - that way we'll see how important the unaligned case is. Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Suresh Siddha suresh.b.sid...@intel.com wrote: On Mon, 2009-10-19 at 02:16 -0700, Jan Beulich wrote: Functionality-wise this looks fine to me If we are going to make ioremap() and set_memory_wc() add mtrr's in non-pat case, then we need to delete the added mtrr(s) in the corresponding iounmap() and set_memory_wb() aswell. hmm, this is becoming too complex. The way i915 and other graphics drivers are using set_memory_wc(), it is def a bad idea to start adding mtrr's behind the back for non-pat case. Touching MTRRs beyond working around basic bugs like non-cached RAM sounds like madness. The interactions with PAT are ... countless. Can't we just force PAT option always and we probably don't care about ioremap_wc() on processors were PAT doesn't get enabled because of known errata. We can make PAT configurability dependent on EMBEDDED-y - mind sending a patch for that? Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Thomas Schlichter thomas.schlich...@web.de wrote: We can make PAT configurability dependent on EMBEDDED-y - mind sending a patch for that? I think there already is such a patch in -tip: http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874 Yes :-) Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [RFC Patch] use MTRR for write combining if PAT is not available
* Thomas Schlichter thomas.schlich...@web.de wrote: I don't think this is a good idea, Robert Hancock wrote there may be millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum Y31) : http://marc.info/?l=linux-kernelm=125537136105246w=2 or Perhaps just try to add mtrr only for the pci mmap case like the 4th patch in this series.. I'd prefer this! ;-) Hm, we could perhaps do that - but i think we should only do that on systems that have PAT disabled. Which brings up the question of how to properly QA such a narrow segment of the market. Maybe disabling CONFIG_X86_PAT should enable that logic too. Ingo -- Come build with us! The BlackBerry(R) Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9 - 12, 2009. Register now! http://p.sf.net/sfu/devconference -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.
* Dave Airlie airl...@linux.ie wrote: there's a new build failure: drivers/built-in.o: In function `drm_irq_uninstall': (.text+0xb719e): undefined reference to `vga_client_register' drivers/built-in.o: In function `drm_irq_install': (.text+0xb7309): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_fini': (.text+0xe400f): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_init': (.text+0xe455b): undefined reference to `vga_client_register' with the attached config, introduced with upstream merge 44040f1. At first sight it appears to be due to CONFIG_DRM_RADEON relying on VGA_ARB facilities but this is not expressed in the Kconfig rules. The patch below solves this - but this is just a quick patch, i have not investigated any deeper. Review of the code suggests that i915 has a similar dependency problem - i fixed that too. The way it should work is VGA ARB should be enabled on any platforms we have PCI unless EMBEDDED turns it off, since arbitration of VGA isn't reliant on a drm device, I'm not sure what Kconfig magic this would require, and where it would need to be. This patch should at least allow builds to work until I figure out any Kconfig magic. From 8a874578cbf8b07b988e666c15fa0ba767f3c1cb Mon Sep 17 00:00:00 2001 From: Dave Airlie airl...@redhat.com Date: Tue, 22 Sep 2009 13:53:00 +1000 Subject: [PATCH] vgaarb: wrap the client register API so we can disable VGA ARB. This provides an dummy register function so everything builds if VGA arb is turned off. Signed-off-by: Dave Airlie airl...@redhat.com --- include/linux/vgaarb.h | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index e81c64a..b0feb79 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -41,7 +41,7 @@ * interrupts at any time. */ extern void vga_set_legacy_decoding(struct pci_dev *pdev, - unsigned int decodes); + unsigned int decodes); /** * vga_get - acquire locks VGA resources @@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, struct pci_dev *p2) * They driver will get a callback when VGA arbitration is first used * by userspace since we some older X servers have issues. */ +#if defined(CONFIG_VGA_ARB) int vga_client_register(struct pci_dev *pdev, void *cookie, void (*irq_set_state)(void *cookie, bool state), unsigned int (*set_vga_decode)(void *cookie, bool state)); +#else +static inline int vga_client_register(struct pci_dev *pdev, void *cookie, + void (*irq_set_state)(void *cookie, bool state), + unsigned int (*set_vga_decode)(void *cookie, bool state)); +{ + return 0; +} +#endif Yeah - making APIs config invariant is a good idea in any case, regardless of Kconfig magic. Thanks, Ingo -- Come build with us! The BlackBerryreg; Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9#45;12, 2009. Register now#33; http://p.sf.net/sfu/devconf -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.
* Dave Airlie airl...@linux.ie wrote: From 8a874578cbf8b07b988e666c15fa0ba767f3c1cb Mon Sep 17 00:00:00 2001 From: Dave Airlie airl...@redhat.com Date: Tue, 22 Sep 2009 13:53:00 +1000 Subject: [PATCH] vgaarb: wrap the client register API so we can disable VGA ARB. This provides an dummy register function so everything builds if VGA arb is turned off. Signed-off-by: Dave Airlie airl...@redhat.com --- include/linux/vgaarb.h | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) btw., i hope you have not commited this yet, there's a trivial problem with this patch: diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index e81c64a..b0feb79 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -41,7 +41,7 @@ * interrupts at any time. */ extern void vga_set_legacy_decoding(struct pci_dev *pdev, - unsigned int decodes); + unsigned int decodes); /** * vga_get - acquire locks VGA resources @@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, struct pci_dev *p2) * They driver will get a callback when VGA arbitration is first used * by userspace since we some older X servers have issues. */ +#if defined(CONFIG_VGA_ARB) int vga_client_register(struct pci_dev *pdev, void *cookie, void (*irq_set_state)(void *cookie, bool state), unsigned int (*set_vga_decode)(void *cookie, bool state)); +#else +static inline int vga_client_register(struct pci_dev *pdev, void *cookie, + void (*irq_set_state)(void *cookie, bool state), + unsigned int (*set_vga_decode)(void *cookie, bool state)); +{ + return 0; +} +#endif that ';' in the inline prototype is bogus, this wont build. Ingo -- Come build with us! The BlackBerryreg; Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9#45;12, 2009. Register now#33; http://p.sf.net/sfu/devconf -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.
here's a patch that works for me. Ingo Signed-off-by: Ingo Molnar mi...@elte.hu --- include/linux/vgaarb.h | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index e81c64a..923f904 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -41,7 +41,7 @@ * interrupts at any time. */ extern void vga_set_legacy_decoding(struct pci_dev *pdev, - unsigned int decodes); + unsigned int decodes); /** * vga_get - acquire locks VGA resources @@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, struct pci_dev *p2) * They driver will get a callback when VGA arbitration is first used * by userspace since we some older X servers have issues. */ +#if defined(CONFIG_VGA_ARB) int vga_client_register(struct pci_dev *pdev, void *cookie, void (*irq_set_state)(void *cookie, bool state), unsigned int (*set_vga_decode)(void *cookie, bool state)); +#else +static inline int vga_client_register(struct pci_dev *pdev, void *cookie, + void (*irq_set_state)(void *cookie, bool state), + unsigned int (*set_vga_decode)(void *cookie, bool state)) +{ + return 0; +} +#endif #endif /* LINUX_VGA_H */ -- Come build with us! The BlackBerryreg; Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9#45;12, 2009. Register now#33; http://p.sf.net/sfu/devconf -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
[PATCH] drm: Fix build failure in radeon and i915 drivers
* Ingo Molnar mi...@elte.hu wrote: there's a new build failure: drivers/built-in.o: In function `drm_irq_uninstall': (.text+0xb719e): undefined reference to `vga_client_register' drivers/built-in.o: In function `drm_irq_install': (.text+0xb7309): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_fini': (.text+0xe400f): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_init': (.text+0xe455b): undefined reference to `vga_client_register' with the attached config, introduced with upstream merge 44040f1. At first sight it appears to be due to CONFIG_DRM_RADEON relying on VGA_ARB facilities but this is not expressed in the Kconfig rules. The patch below solves this - but this is just a quick patch, i have not investigated any deeper. Review of the code suggests that i915 has a similar dependency problem - i fixed that too. i've looked some more and drm_irq.o depends on vga-arb too so the patch below is the more complete fix IMHO. Ingo -- From 0186c202fefd70291ef3b29e34543083d24f026d Mon Sep 17 00:00:00 2001 From: Ingo Molnar mi...@elte.hu Date: Mon, 21 Sep 2009 18:12:07 +0200 Subject: [PATCH] drm: Fix build failure in radeon and i915 drivers this build failure: drivers/built-in.o: In function `drm_irq_uninstall': (.text+0xb719e): undefined reference to `vga_client_register' drivers/built-in.o: In function `drm_irq_install': (.text+0xb7309): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_fini': (.text+0xe400f): undefined reference to `vga_client_register' drivers/built-in.o: In function `radeon_device_init': (.text+0xe455b): undefined reference to `vga_client_register' got introduced with upstream merge 44040f1. At first sight it appears to be due to CONFIG_DRM_RADEON relying on VGA_ARB facilities but this is not expressed in the Kconfig rules. drm_irq.o relies on it too - so the whole DRM facilities relies on VGA_ARB. I've added a select to express this dependency. VGA_ARB is not a simple option, it depends on PCI, so in theory select is not safe - but this is a special case since DRM itself depends on PCI too. Cc: torva...@linux-foundation.org Cc: dri-de...@lists.sf.net Cc: Dave Airlie airl...@linux.ie LKML-Reference: 20090921161207.ga9...@elte.hu Signed-off-by: Ingo Molnar mi...@elte.hu --- drivers/gpu/drm/Kconfig |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index e4d971c..6c54422 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -9,6 +9,7 @@ menuconfig DRM depends on (AGP || AGP=n) PCI !EMULATED_CMPXCHG MMU select I2C select I2C_ALGOBIT + select VGA_ARB help Kernel-level support for the Direct Rendering Infrastructure (DRI) introduced in XFree86 4.0. If you say Y here, you need to select -- Come build with us! The BlackBerryreg; Developer Conference in SF, CA is the only developer event you need to attend this year. Jumpstart your developing skills, take BlackBerry mobile applications to market and stay ahead of the curve. Join us from November 9#45;12, 2009. Register now#33; http://p.sf.net/sfu/devconf -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases
* Thomas Hellstrom thellst...@vmware.com wrote: Bug #13884 The code was incorrectly reserving memtypes using the page virtual address instead of the physical address. Furthermore, the code was not ignoring highmem pages as it ought to. Signed-off-by: Thomas Hellstrom thellst...@vmware.com --- arch/x86/mm/pageattr.c | 30 +- 1 files changed, 21 insertions(+), 9 deletions(-) diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c index d4327db..96dd4f6 100644 --- a/arch/x86/mm/pageattr.c +++ b/arch/x86/mm/pageattr.c @@ -590,9 +590,12 @@ static int __change_page_attr(struct cpa_data *cpa, int primary) unsigned int level; pte_t *kpte, old_pte; - if (cpa-flags CPA_PAGES_ARRAY) - address = (unsigned long)page_address(cpa-pages[cpa-curpage]); - else if (cpa-flags CPA_ARRAY) + if (cpa-flags CPA_PAGES_ARRAY) { + struct page *page = cpa-pages[cpa-curpage]; + if (unlikely(PageHighMem(page))) + return 0; + address = (unsigned long)page_address(page); + } else if (cpa-flags CPA_ARRAY) address = cpa-vaddr[cpa-curpage]; hm, i'm missing a description about how this bug was triggered. How did you end up getting highmem pages to a cpa call? else address = *cpa-vaddr; @@ -696,9 +699,12 @@ static int cpa_process_alias(struct cpa_data *cpa) * No need to redo, when the primary call touched the direct * mapping already: */ - if (cpa-flags CPA_PAGES_ARRAY) - vaddr = (unsigned long)page_address(cpa-pages[cpa-curpage]); - else if (cpa-flags CPA_ARRAY) + if (cpa-flags CPA_PAGES_ARRAY) { + struct page *page = cpa-pages[cpa-curpage]; + if (unlikely(PageHighMem(page))) + return 0; + vaddr = (unsigned long)page_address(page); + } else if (cpa-flags CPA_ARRAY) vaddr = cpa-vaddr[cpa-curpage]; else vaddr = *cpa-vaddr; @@ -1118,7 +1124,9 @@ int set_pages_array_uc(struct page **pages, int addrinarray) int free_idx; for (i = 0; i addrinarray; i++) { - start = (unsigned long)page_address(pages[i]); + if (PageHighMem(pages[i])) + continue; + start = page_to_pfn(pages[i]) PAGE_SHIFT; end = start + PAGE_SIZE; if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL)) goto err_out; ok, that's a bug introduced in .29 but which was latent until now: drivers/char/agp/generic.c now uses it plus (indirectly) a number of AGP drivers, since: commit 07613ba2f464f59949266f4337b75b91eb610795 Author: Dave Airlie airl...@redhat.com Date: Fri Jun 12 14:11:41 2009 +1000 agp: switch AGP to use page array instead of unsigned long array I dont see how it can end up with highmem pages though. All the graphics apperture allocations happen to lowmem AFAICS. Did GEM add the possibility for user pages (highmem amongst them) ending up in that pool? Which code does that? @@ -1131,7 +1139,9 @@ int set_pages_array_uc(struct page **pages, int addrinarray) err_out: free_idx = i; for (i = 0; i free_idx; i++) { - start = (unsigned long)page_address(pages[i]); + if (PageHighMem(pages[i])) + continue; + start = page_to_pfn(pages[i]) PAGE_SHIFT; end = start + PAGE_SIZE; free_memtype(start, end); } @@ -1160,7 +1170,9 @@ int set_pages_array_wb(struct page **pages, int addrinarray) return retval; for (i = 0; i addrinarray; i++) { - start = (unsigned long)page_address(pages[i]); + if (PageHighMem(pages[i])) + continue; + start = page_to_pfn(pages[i]) PAGE_SHIFT; end = start + PAGE_SIZE; free_memtype(start, end); In any case it's a must-have fix for .31. Possibly even a backport tag is needed, in case a distro does a .30 kernel with more recent graphics bits. Ingo -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases
* Dave Airlie airl...@linux.ie wrote: hm, i'm missing a description about how this bug was triggered. How did you end up getting highmem pages to a cpa call? GEM and TTM both allocate page arrays and just pass them to cpa, we don't know what type of pages the allocator gives us back and we really shouldn't have to, so having cpa ignore highmem pages is certainly the right option. GEM just uses shmem code to alloc the pages and TTM has its own allocator. Neither of my questions was answered though: do highmem pages ever get passed in and what were the effects of the bug and how was it noticed? If no highmem pages can be passed in then the right solution is not to ignore them silently but to add a WARN_ONCE() to enforce that they are not used in the future either. If they are used today then i'd like to know where exactly and double check that all the cache attributes are consistent: as a highmem page might be visible to user-space as well or might be mapped via the vmalloc space, etc. And yes, if it happens and all the other mapping aliases are fine then ignoring them is the right solution. Ingo -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases
* Thomas Hellstrom thellst...@vmware.com wrote: Dave Airlie wrote: hm, i'm missing a description about how this bug was triggered. How did you end up getting highmem pages to a cpa call? GEM and TTM both allocate page arrays and just pass them to cpa, we don't know what type of pages the allocator gives us back and we really shouldn't have to, so having cpa ignore highmem pages is certainly the right option. GEM just uses shmem code to alloc the pages and TTM has its own allocator. Yes, Dave is right. Although I'm not 100% sure the TTM code I was using that triggered this has made it into 2.6.31. Old AGP uses (GFP_KERNEL | GFP_DMA32 | __GFP_ZERO), which (correct me if I'm wrong) never hands back highmem pages. This means that Intel's GEM is the only likely user for 2.6.31. (please dont top-post) I'm not sure you folks noticed this bit of my mail: ok, that's a bug introduced in .29 but which was latent until now: drivers/char/agp/generic.c now uses it plus (indirectly) a number of AGP drivers, since: commit 07613ba2f464f59949266f4337b75b91eb610795 Author: Dave Airlie airl...@redhat.com Date: Fri Jun 12 14:11:41 2009 +1000 agp: switch AGP to use page array instead of unsigned long array I dont see how it can end up with highmem pages though. All the graphics apperture allocations happen to lowmem AFAICS. Did GEM add the possibility for user pages (highmem amongst them) ending up in that pool? Which code does that? Ingo -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases
* Thomas Hellstrom thellst...@vmware.com wrote: Ingo Molnar wrote: * Thomas Hellstrom thellst...@vmware.com wrote: Dave Airlie wrote: hm, i'm missing a description about how this bug was triggered. How did you end up getting highmem pages to a cpa call? GEM and TTM both allocate page arrays and just pass them to cpa, we don't know what type of pages the allocator gives us back and we really shouldn't have to, so having cpa ignore highmem pages is certainly the right option. GEM just uses shmem code to alloc the pages and TTM has its own allocator. Yes, Dave is right. Although I'm not 100% sure the TTM code I was using that triggered this has made it into 2.6.31. Old AGP uses (GFP_KERNEL | GFP_DMA32 | __GFP_ZERO), which (correct me if I'm wrong) never hands back highmem pages. This means that Intel's GEM is the only likely user for 2.6.31. (please dont top-post) I'm not sure you folks noticed this bit of my mail: ok, that's a bug introduced in .29 but which was latent until now: drivers/char/agp/generic.c now uses it plus (indirectly) a number of AGP drivers, since: commit 07613ba2f464f59949266f4337b75b91eb610795 Author: Dave Airlie airl...@redhat.com Date: Fri Jun 12 14:11:41 2009 +1000 agp: switch AGP to use page array instead of unsigned long array I dont see how it can end up with highmem pages though. All the graphics apperture allocations happen to lowmem AFAICS. Did GEM add the possibility for user pages (highmem amongst them) ending up in that pool? Which code does that? Ingo Sorry, my bad. The TTM code I tested is not in yet, and after double-checking it looks like Intel's gem is not changing caching policy before binding to AGP. This means the highmem problems that I saw were triggered by a combination of the virtual-physical bugfix and code that's not in the kernel yet, and since it's an optimization of the current code it's not likely to land in 2.6.31. The highmem fixes could thus, AFAICT be stripped out of the patch, unless GFP_DMA32 on a highmem system can actually hand back highmem pages, in which case AGP will not work correctly. As for highmem use in the future, the TTM page arrays are populated using fault(), which means that there will be an overhead ordering the pages so that we can use the set_pages_array() interface instead of set_memory() that we use today. Therefore, if possible, I'd prefer if we could pass arrays containing highmem pages to the set_pages_array() interface. There are no aliased mappings since 1) Any user space mappings to these pages are killed before changing caching policy. 2) The pages are allocated and owned by the driver. 3) kmap_atomic_prot() and vmap() are used to map these pages in kernel space. Code is in ttm_tt.c ttm_bo.c and ttm_bo_util.c ok - thanks for the explanation. Since you intentionally want to use highmem pages (and your use is safe) i concur with your original patch in its entirety - even if that planned highmem use is not upstream yet. Will get it .31-wards ASAP. Ingo -- Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.6.30-rc6: Reported regressions from 2.6.29
* Oleg Nesterov o...@redhat.com wrote: On 05/17, Ingo Molnar wrote: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13107 Subject : LTP 20080131 causes defunct processes w/2.6.30-rc1 Submitter : Kumar Gala ga...@kernel.crashing.org Date : 2009-04-09 15:43 (38 days old) First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b3bfa0cba867f23365b81658b47efd906830879b References: http://marc.info/?l=linux-kernelm=123929187208953w=4 http://lkml.org/lkml/2009/4/10/193 Handled-By: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com Oleg says in that thread that it's as-designed, and followup questions were not replied to (yet). Yes, I think this is false alarm. Perhaps I missed something, and I am waiting for more info from Kumar, but it looks like ltp was already changed to skip the { PTRACE_ATTACH, 1, EPERM } test on kernels after 2.6.25 Btw., why did the patch (and the revert) make any difference to the test? Timing differences look improbable. Ingo -- Crystal Reports - New Free Runtime and 30 Day Trial Check out the new simplified licensing option that enables unlimited royalty-free distribution of the report engine for externally facing server and web deployment. http://p.sf.net/sfu/businessobjects -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.6.30-rc6: Reported regressions from 2.6.29
* Rafael J. Wysocki r...@sisk.pl wrote: Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13325 Subject : 2.6.30-rc kills my box hard - and lockdep chains Submitter : Jonathan Corbet cor...@lwn.net Date : 2009-05-14 15:49 (3 days old) References: http://marc.info/?l=linux-kernelm=124231630701394w=4 Jonathan, there's a side-issue reported there, us running out of lockdep space. Could you try this commit from -tip: d80c19d: lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS (which i'll get to Linus in the next ~24 hours.) Maybe that allows lockdep to report the reason for the deadlock. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13321 Subject : kernel crash with NULL pointer when boot Submitter : Martin Bammer mr...@gmx.at Date : 2009-05-16 12:37 (1 days old) References: http://lkml.org/lkml/2009/5/16/100 that crash is in reiserfs_for_each_xattr(), during sys_unlink()'s xattr teardown. There's been a good deal of reiserfs changes in this cycle - some touch the xattr code as well. Some of them fairly late in the cycle, in the last two weeks: earth4:~/tip gll v2.6.29..linus --since=two-weeks-ago fs/reiserfs/ 2a32ceb: Fix races around the access to -s_options 677c9b2: reiserfs: remove privroot hiding in lookup b82bb72: reiserfs: dont associate security.* with xattr files ab17c4f: reiserfs: fixup xattr_root caching edcc37a: Always lookup priv_root on reiserfs mount and keep it 5a6059c: reiserfs: Expand i_mutex to enclose lookup_one_len Martin, you could try a blind revert of say ... ab17c4f, which looks the most suspect and which is also a rather large commit. Or/and you could try a bisect - perhaps accelerated via: git bisect start fs/reiserfs/ Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13297 Subject : kernel panic - not syncing : fatel exception in interupt Submitter : rob r...@housetosell.net Date : 2009-05-12 19:34 (5 days old) References: http://marc.info/?l=linux-kernelm=124216126903309w=4 tainted crash, but probably legit. It does show some badness in an old-IDE legacy codepath: [c0371865] error_code+0x65/0x6c [c0110155] do_page_fault+0x0/0x1e0 [c027dafc] ide_complete_rq+0xf/0x3b [c02870a0] cdrom_newpc_intr+0x64d/0x6cd [c0286a53] cdrom_newpc_intr+0x0/0x6cd [c027dcc2] ide_intr+0x109/0x161 [c0132298] handle_IRQ_event+0x54/0xc7 [c013354a] handle_level_irq+0x4f/0x85 [c0103df7] handle_irq+0x17/0x20 [c0103da5] do_IRQ+0x2b/0x66 [c0102be9] common_interupt+0x29/0x30 [c048] cmd40x_init+0x2ac/0x38d [c0106db3] default_idle+0x25/0x38 [c01019be] cpu_idle+0x19/0x2d [c0468907] start_kernel+0x23f/0x242 report subject line is too unspecific, it should be changed to something like: legacy IDE cmd40x related bootup crash Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13296 Subject : Lockdep violation at cleanup_workqueue_thread during suspend Submitter : Zdenek Kabelac zdenek.kabe...@gmail.com Date : 2009-05-12 7:59 (5 days old) References: http://marc.info/?l=linux-kernelm=124211522525625w=4 looks like wireless related - the dependency that connects the locks in a wrong way appears to be: - #2 (cfg80211_mutex){+.+.+.}: [80271a64] __lock_acquire+0xc64/0x10a0 [80271f38] lock_acquire+0x98/0x140 [8054e78c] __mutex_lock_common+0x4c/0x3b0 [8054ebf6] mutex_lock_nested+0x46/0x60 [a007e66a] reg_todo+0x19a/0x590 [cfg80211] [80258f18] worker_thread+0x1e8/0x3a0 [8025dc3a] kthread+0x5a/0xa0 [8020d23a] child_rip+0xa/0x20 (havent checked deeper) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13245 Subject : possible circular locking dependency detected Submitter : Miles Lane miles.l...@gmail.com Date : 2009-05-04 16:56 (13 days old) same as #13296 above. (The one above should be merged into this one i guess) Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13126 Subject : BUG: MAX_LOCKDEP_ENTRIES too low! when mounting rootfs Submitter : Alexander Beregalov a.berega...@gmail.com Date : 2009-04-15 12:43 (32 days old) References: http://marc.info/?l=linux-kernelm=123979949820538w=4 should be resolved via the lockdep space extension fix. Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13118 Subject : iptables very slow after commit 784544739a25c30637397ace5489eeb6e15d7d49 Submitter : Jeff Chua jeff.chua.li...@gmail.com Date : 2009-04-10 16:05 (37 days old) References: http://lkml.org/lkml/2009/4/10/111 http://lkml.org/lkml/2009/4/25/83 Handled-By: Eric Dumazet da...@cosmosbay.com solved by: commit 942e4a2bd680c606af0211e64eb216be2e19bf61 Author: Stephen Hemminger shemmin...@vyatta.com Date:
Re: [TIP,regression,i915] /dev/dri/card0 is no longer present
* Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote: On Mon, 2009-03-02 at 13:34 -0800, Ingo Molnar wrote: * Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote: On Sun, 2009-03-01 at 05:48 -0800, Sitsofe Wheeler wrote: On Sun, Mar 01, 2009 at 12:39:48PM +0100, Ingo Molnar wrote: Thanks, i've reverted the commit for now. Could you please send a 'dmesg' from the bootup with the failed driver? By all likelyhood an ioremap failure causes a driver failure. See below (drm says it initalized though...): [0.00] Linux version 2.6.29-rc6-6-g17581ad (@verona) (gcc version 4.2.4) #87 Sun Mar 1 13:42:28 GMT 2009 [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] PAT WC disabled due to known CPU erratum. Looks like PAT is getting disabled on this platform. Can you send the output of /proc/cpuinfo please. Does this mean we failed an ioremap()? We still should not fail a device ioremap _ever_. We should just fall back to UC and be done with it. We may emit a warning if we consider it troublesome, but we should never ever break working drivers really. No. We don't break ioremap. We try to fallback on UC_MINUS here. But, there is something wrong happening in that path, that is resulting in this issue. I am looking at that right now. ah, ok. I have found another problem with this particular patch, on one of test systems. So, let us keep the revert until I fix what looks like two different problems here. ok. Ingo -- Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [TIP,regression,i915] /dev/dri/card0 is no longer present
* Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote: On Sun, 2009-03-01 at 05:48 -0800, Sitsofe Wheeler wrote: On Sun, Mar 01, 2009 at 12:39:48PM +0100, Ingo Molnar wrote: Thanks, i've reverted the commit for now. Could you please send a 'dmesg' from the bootup with the failed driver? By all likelyhood an ioremap failure causes a driver failure. See below (drm says it initalized though...): [0.00] Linux version 2.6.29-rc6-6-g17581ad (@verona) (gcc version 4.2.4) #87 Sun Mar 1 13:42:28 GMT 2009 [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] PAT WC disabled due to known CPU erratum. Looks like PAT is getting disabled on this platform. Can you send the output of /proc/cpuinfo please. Does this mean we failed an ioremap()? We still should not fail a device ioremap _ever_. We should just fall back to UC and be done with it. We may emit a warning if we consider it troublesome, but we should never ever break working drivers really. Ingo -- Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [TIP,regression,i915] /dev/dri/card0 is no longer present
* Sitsofe Wheeler sits...@yahoo.com wrote: Hi, With the latest -tip /dev/dri/card0 is MIA on my EeePC 900. A bisection has tracked the probem down to the following commit: commit 17581ad812a9abb0182260374ef2e52d4a808a64 Author: Venkatesh Pallipadi venkatesh.pallip...@intel.com Date: Tue Feb 24 17:35:14 2009 -0800 gpu/drm, x86, PAT: PAT support for io_mapping_* Make io_mapping_create_wc and io_mapping_free go through PAT to make sure that there are no memory type aliases. Signed-off-by: Venkatesh Pallipadi venkatesh.pallip...@intel.com Signed-off-by: Suresh Siddha suresh.b.sid...@intel.com Cc: Dave Airlie airl...@redhat.com Cc: Jesse Barnes jbar...@virtuousgeek.org Cc: Eric Anholt e...@anholt.net Cc: Keith Packard kei...@keithp.com Signed-off-by: Ingo Molnar mi...@elte.hu Reverting this made /dev/dri/card0 reappear. Here is the bisection log: Thanks, i've reverted the commit for now. Could you please send a 'dmesg' from the bootup with the failed driver? By all likelyhood an ioremap failure causes a driver failure. Ingo -- Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise -Strategies to boost innovation and cut costs with open source participation -Receive a $600 discount off the registration fee with the source code: SFAD http://p.sf.net/sfu/XcvMzF8H -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: Adding kmap_atomic_prot_pfn
* Thomas Hellström [EMAIL PROTECTED] wrote: Keith, What you actually are doing here is claiming copyright on code that other people have written, and tighten the export restrictions. kmap_atomic_prot_pfn() appeared long ago in drm git with identical code and purpose, but with different authors, and iounmap_atomic is identical to kunmap_atomic. +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn); you want to use this facility in a binary-only driver? Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: Adding kmap_atomic_prot_pfn (was: [git pull] drm patches for 2.6.27-rc1)
* Linus Torvalds [EMAIL PROTECTED] wrote: On Thu, 23 Oct 2008, Keith Packard wrote: So I'd much rather create a new linux/kmap.h or something. Or just expose this from to asm/fixmap.h or something. Let's not confuse this with highmem, even if the implementation _historically_ was due to that. Sure, we readily admit that we're abusing the highmem API. So we wrapped that abusive code in a pretty package that can be re-implemented however you desire. How (and when) would you like to see this fixed? I'm not entirely sure who wants to own up to owning that particular part of code, and is willing to make kmap_atomic_prot_pfn() also work in the absense of CONFIG_HIGHMEM. yeah, that would be the x86 maintainers. (whoever they are) I suspect it's Ingo, but I also suspect that he'd like to see a tested patch by somebody who actually _uses_ this code. yeah. I already asked Keith yesterday to send one coherent bundle of patches and i think we've got all the code sent already, you also pulled the latest DRM tree - so it all just needs sorting out and testing. [ I can possibly test the end result with a bleeding-edge Xorg which supports the new DRI APIs. (Whether X comes up fine is a regular, necessary and 'fun' component of the x86 regression testing we do anyway. We also tend to notice highmem breakages promptly.) ] So I would suspect that if you guys actually write a patch, and make sure that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send it to Ingo, good things will happen. As to the without CONFIG_HIGHMEM part: making all of this work even without HIGHMEM should literally be a matter of: - remove the '#ifdef CONFIG_HIGHMEM' in asm/fixmap_32.h, so that we have fixmap entries for the temporary kernel mappings regardless (ie FIX_KMAP_BEGIN and FIX_KMAP_END). - move the code for the atomic accesses that use those fixmap entries into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at least just kmap_atomic_prot_pfn() and kunmap_atomic(). and it probably should all work automatically. The kmap_atomic() stuff really should be almost totally independent of CONFIG_HIGHMEM, since it's really much more closely related to fixmaps than HIGHMEM. yeah. I guess there may be some debugging code that depends on HIGHMEM (eg that whole testing for whether a page is a highmem page or not), so it might be a _bit_ more than just moving code around, but I really didn't look closer. Then, there's the issue of 64-bit, and just mapping everything there, and the interface to that. I liked the trivial extension to struct resource to have a cached mapping pointer. So if we can just make it pass resources around and get a page that way (and not even need kmap() on 64-bit architections), that would be good. hm, the thing that i find the weakest aspect of that (despite having suggested this approach) is that the structure and the granularity of the resource tree is really controlled by the hardware environment. We try to map the hardware circumstances accurately at bootstrap / device discovery time, and keep it rather static from that point on. (modulo hotplug and dynamic BAR sizing) And this static property of the resource tree is _good_ IMO, because we can think about it as a hardware property - not something tweaked by the kernel. (the kernel does tweak it when need to be or when the hardware asks for it, but it's more of an exception) That means that if a driver wants to map just a portion of a BAR, (because the hardware itself compresses various different pieces of functionality into the same BAR), this abstraction becomes a limitation on usage. And it might even be _impossible_ to use the simplest form of resource-mem_cache facility with certain types of hardware: say there's a cacheable and an uncacheable window within the same BAR - and we'd map the whole thing as cacheable. The CPU is then entitled to (and will most likely) prefetch into the uncacheable region as well, causing hw lockups, etc. [In this thread it was claimed that S3 chips have such properties.] And tweaking this abstraction to cover those cases, for the ioremap to not be a full mapping of the resource looks a bit hacky/limiting as well: - we'd either have to add 'size', 'offset' properties to the window we cache in the struct resource. (and possibly an array. yuck.) - or we'd have to say dont use this facility with such quirky hardware then. - or we'd have to say split up the struct resource into two pieces artificially, when the driver starts using the resource - which violates the current rather static nature of the resource tree. Maybe i'm overcomplicating it and maybe this last option isnt all that bad after all: as the 'combined' resource in such cases _is_ artificial - and i915+ does not have such problematic BARs to begin with. (keeping separate BARs for the framebuffer is
Re: Adding kmap_atomic_prot_pfn
* Thomas Hellström [EMAIL PROTECTED] wrote: Ingo Molnar wrote: * Thomas Hellström [EMAIL PROTECTED] wrote: Keith, What you actually are doing here is claiming copyright on code that other people have written, and tighten the export restrictions. kmap_atomic_prot_pfn() appeared long ago in drm git with identical code and purpose, but with different authors, and iounmap_atomic is identical to kunmap_atomic. +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn); you want to use this facility in a binary-only driver? Ingo At this point I have no such use for it, no. The original user was a MIT style licenced driver. okay, then just to put this question to rest: i wrote the original 32-bit highmem code ~10 years ago. I wrote the first version of fixmap support - in fact i coined the term. I wrote the first version of the atomic-kmap facility as well. All of that code is licensed under the GPLv2. So if anyone wants to make any copyright claims about highmem/kmap/fixmap derivative works, consider it in that light. Regarding this new API variant that Keith wrote: it would be silly and dangerous to export it anywhere but to in-kernel drivers. The API disables preemption on 32-bit and rummages deep in the guts of the kernel as well, uses up a precious resource (fixmap slots), etc. It's internal and we eventually might want to deprecate forms of it and concentrate on the good 64-bit performance side. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Keith Packard [EMAIL PROTECTED] wrote: On Mon, 2008-10-20 at 13:58 +0200, Ingo Molnar wrote: yes but note that by caching the whole mapping on 64-bit we get everything we want: trivially lockless, works from any CPU, can be preempted at will, and there are no ugly INVLPG flushes anywhere. I was assuming that on 64-bit, the map would be created at driver init time and be left in place until the driver closed; if that's what you mean by 'caching', then yes, we should cache the map. correct. 32-bit we should handle as well but not design for it. As long as we get kmap_atomic-like performance, and we get to simplify our code, I'm up for it. okay. So ... mind sending your io_mapping patch as a generic facility? It looks all good to me in its present form, except that it should live in include/linux/io.h, not in the drivers/gpu/drm/i915/io_reserve.h file :-) also, please send at least two patches, so that we can look at (and possibly merge) the generic facility in isolation. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm patches for 2.6.27-rc1
* Keith Packard [EMAIL PROTECTED] wrote: On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote: the _real_ remapping in a graphics aperture happens on the GPU level anyway, you manage an in-RAM GPU pagetable that just works like an IOMMU, correct? Yes, a one-level linear MMU which uses BIOS-reserved memory. So, at least for a prototype, on 64-bit we can just use ioremap_wc and hold the mapping while the driver is open? Is there any huge benefit to using the kernel mapping? correct. There's no benefit to using the kernel mapping - and we can add 2MB pages support to ioremap just fine and then you'll benefit from it automatically. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Ingo Molnar [EMAIL PROTECTED] wrote: very nice! I think we need a somewhat different abstraction though. Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move to generic code. Secondly, wouldnt the right abstraction be to attach this functionality to 'struct resource' ? [or at least create a second struct that embedds struct resource] this abstraction is definitely not a PCI thing and not a detached-from-everything thing, it's an IO resource thing. We could make it a property of struct resource: struct resource { resource_size_t start; resource_size_t end; const char *name; unsigned long flags; struct resource *parent, *sibling, *child; + void *mapping; }; The APIs would be: int io_resource_init_mapping(struct resource *res); void io_resource_free_mapping(struct resource *res); void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset); void io_resource_unmap(struct resource *res, void *kaddr); Note how simple and consistent it all gets: IO resources already know their physical location and their size limits. Being able to cache an ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a relatively simple and natural extension to the concept. i think that would be quite acceptable - and the APIs could just transparently work on it. This would also allow the PCI code to automatically unmap any cached mappings from resources, when the driver deinitializes. Linus, Jesse, what do you think? the downsize would be that we'd attach a runtime property to the IORESOURCE_MEM resource tree - which is a fairly static thing right now, after the point where we finalize the resource tree. (modulo device/bridge hotplug variances) Another downside is that we might not want to map the whole thing. I.e. the structure of the IO memory space we want to map by drivers might be different from how it looks like in the resource tree. the concept of introducing resource-mapping does not feel _that_ wrong though and has a couple of upsides: it could act as a natural mapping type serializer for example and drivers wouldnt have to explicitly manage ioremap results - they could just use the resource descriptor directly and read and write to/from it. readl/writel could be extended to operate on the resource descriptor transparently, getting rid of a source of resource mismatches and overmapping, etc. etc. We could even safety check IO space accesses this way. and we'd get rid of the complication that your APIs introduced, the need to introduce a separate io_mapping type, etc. Dunno, i might be missing some obvious downside why this wasnt done like that until now. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Keith Packard [EMAIL PROTECTED] wrote: On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote: Note how simple and consistent it all gets: IO resources already know their physical location and their size limits. Being able to cache an ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a relatively simple and natural extension to the concept. I'm not sure I see any value in caching mappings here; we're mostly interested in copying lots of data onto the card and so we use a lot of different mappings; atomic mappings are easy to use, and efficient enough. yes but note that by caching the whole mapping on 64-bit we get everything we want: trivially lockless, works from any CPU, can be preempted at will, and there are no ugly INVLPG flushes anywhere. you'll even get 2MB mappings automatically, if the BAR is aligned and sized correctly. 32-bit we should handle as well but not design for it. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Eric Anholt [EMAIL PROTECTED] wrote: The APIs would be: int io_resource_init_mapping(struct resource *res); void io_resource_free_mapping(struct resource *res); void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset); void io_resource_unmap(struct resource *res, void *kaddr); Note how simple and consistent it all gets: IO resources already know their physical location and their size limits. Being able to cache an ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a relatively simple and natural extension to the concept. i think that would be quite acceptable - and the APIs could just transparently work on it. This would also allow the PCI code to automatically unmap any cached mappings from resources, when the driver deinitializes. Linus, Jesse, what do you think? i think we need to finalize the API names and their abstraction level, and then could even merge those APIs into v2.6.28 on a fast path, to enable you to use it. It does not interact with anything else so it should be safe to do. This API needs the cacheability control, which I don't see in it currently. [...] yes, these two should do the trick: int io_resource_init_mapping_wc(struct resource *res); int io_resource_init_mapping_wb(struct resource *res); Second, we need to know when we're doing a mapping whether we're affected by atomic scheduling restrictions. Right now our plan has been to try doing page-by-page io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if we fail at that at some point (map returns NULL or we get a partial completion from copy_from_user_inatomic) then fall back to io_map_wc() and copy_from_user() the whole thing at once. That gets us good performance on both x86 with highmem and x86-64, and not too shabby performance on x86 non-highmem. that gets ugly very fast. I think we should not use atomic kmaps but NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically atomic kmaps but without the preemption restrictions). We could take/drop the mutex and statistically you'll stay on the same CPU and wont ever contend on that lock in practice. Also, while it's rare, there have been graphics cards (looking at you, S3) where BARs were expensive for some reason and they stuffed both the framebuffer and registers into one PCI BAR, where you want the FB to be WC and the registers to be UC. Not sure if they would be supportable with this API or not. And if it's not, I'm not sure how much we care to design for them, but it's something to potentially consider. yes, this is a weakness of this API - you cannot mix multiple cachability domains within the same BAR. and that can happen on non-graphics as well: some storage controller that has regular control registers in one portion of the BAR, which all need to be consistently accessed via UC and properly POST-ed - while it could also have some large mailbox structure at the end of the BAR, which could be mapped both cacheable or perhaps WC. So ... i guess we can go back to the io_mapping API proposed by Keith, but not make it atomic kmap based but fixmap + mutex based - for good 32-bit performance. (and the fixmap would not be used on 64-bit at all) Finally, I'm confused by the pfn and offset args to io_resource_map, when I expected something parallel to ioremap but with our resource arg added. ok. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Ingo Molnar [EMAIL PROTECTED] wrote: that gets ugly very fast. I think we should not use atomic kmaps but NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically atomic kmaps but without the preemption restrictions). We could take/drop the mutex and statistically you'll stay on the same CPU and wont ever contend on that lock in practice. peterz pointed it out that we need to be careful with the TLBs in that case. I think it's solvable: a small non-default special-case in switch_to() would invlpg any pending such page. (and no two such mappings could be held at the same time) So as long as we dont leave the CPU we wont ever do TLB flushes, and the flush will be local even if we do. Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)
* Keith Packard [EMAIL PROTECTED] wrote: On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote: On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote: Mind sending patches for this? :-) Here's a patch for the i915 driver that includes the new API. Tested on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in the i915 directory for this patch, but it should go elsewhere. The new APIs are: io_reserve_create_wc io_reserve_free io_reserve_map_atomic_wc io_reserve_unmap_atomic io_reserve_map_wc io_reserve_unmap very nice! I think we need a somewhat different abstraction though. Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move to generic code. Secondly, wouldnt the right abstraction be to attach this functionality to 'struct resource' ? [or at least create a second struct that embedds struct resource] this abstraction is definitely not a PCI thing and not a detached-from-everything thing, it's an IO resource thing. We could make it a property of struct resource: struct resource { resource_size_t start; resource_size_t end; const char *name; unsigned long flags; struct resource *parent, *sibling, *child; + void *mapping; }; The APIs would be: int io_resource_init_mapping(struct resource *res); void io_resource_free_mapping(struct resource *res); void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset); void io_resource_unmap(struct resource *res, void *kaddr); Note how simple and consistent it all gets: IO resources already know their physical location and their size limits. Being able to cache an ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a relatively simple and natural extension to the concept. i think that would be quite acceptable - and the APIs could just transparently work on it. This would also allow the PCI code to automatically unmap any cached mappings from resources, when the driver deinitializes. Linus, Jesse, what do you think? i think we need to finalize the API names and their abstraction level, and then could even merge those APIs into v2.6.28 on a fast path, to enable you to use it. It does not interact with anything else so it should be safe to do. (i'd not suggest to merge the i915 bits just yet - but that's obviously not my call.) Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm patches for 2.6.27-rc1
* Linus Torvalds [EMAIL PROTECTED] wrote: [ The *non-atomic* kmap() functions are fairly high-overhead, in that they want to keep track of cached mappings and remember page addresses etc. So those are the ones we don't want to support for non-HIGHMEM setups. But the atomic kmaps are pretty simple, and really only need some trivial FIXMAP support. We could easily extend it for x86-64, methinks, and do it for x86-32 even when we don't do HIGHMEM. Ingo? ] agreed, and there's certainly no resistance from the x86 architecture side to extend our mapping APIs in such directions. But i think the direction of the new GEM code is subtly wrong here, because it tries to manage memory even on 64-bit systems. IMO it should just map the _whole_ graphics aperture (non-cached) and be done with it. There's no faster method at managing pages than the CPU doing a TLB fill from pagetables. The only real API need i see is on 32-bit: with a 1GB or 2GB graphics aperture we just cannot map that permanently, so kmap_atomic() is a necessity. We can certainly extend that to non-highmem as well. But this should be an ad-hoc transitionary thing for 32-bit, and on 64-bit we really should not be using any form of kmap. Especially with large vertex buffers or textures, mapping a lot of pages via kmap is not going to be trivial overhead - even if INVLPG is faster than a full TLB flush, it's still on the order of 100-200 cycles - and with a lot of pages that mounts up quickly. And if i understood your workload correctly you want to do tens of thousand of map/unmap/remap events per frame generated - depending on the type of the 3D app/engine. Or am i missing something subtle? Why do you want the overhead of kmap on 64-bit? Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [git pull] drm patches for 2.6.27-rc1
* Keith Packard [EMAIL PROTECTED] wrote: On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote: But i think the direction of the new GEM code is subtly wrong here, because it tries to manage memory even on 64-bit systems. IMO it should just map the _whole_ graphics aperture (non-cached) and be done with it. There's no faster method at managing pages than the CPU doing a TLB fill from pagetables. Yeah, we're stuck thinking that we can't map the aperture because it's too large, but with a 64-bit kernel, we should be able to keep it mapped permanently. Of course, the io_reserve_pci_resource and io_map_atomic functions could do precisely that, as kmap_atomic does on non-HIGHMEM systems today. okay, so basically what we need is a shared API that does per page kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the impression that you were suggesting to extend kmap_atomic() to 64-bit - which would be wrong. So, in terms of the 4 APIs you suggest: struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev, int bar, int prot); void io_mapping_free(struct io_mapping *mapping); void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn); void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn); here is what we'd do on 64-bit: - io_reserve_pci_resource() would just do an ioremap(), and would save the ioremap-ed memory into struct io_mapping. - io_mapping_free() does the iounmap() - io_map_atomic(): just arithmetics, returns mapping-base + pfn - no TLB activities at all. - io_unmap_atomic(): NOP. it's as fast as it gets: zero overhead in essence. Note that it's also shared between all CPUs and there's no aliasing trouble. And we could make it even faster: if you think we could even use 2MB TLBs for the _linear_ ioremap()s here, hm? There's plenty of address space on 64-bit so we can align to 2MB just fine - and aperture sizes are 2MB sized anyway. Or we could go one step further and install these aperture mappings into the _kernel linear_ address space. That would be even faster, because we'd have a constant offset. We have the (2MB mappings aware) mechanism for that already. (Yinghai Cc:-ed - he did a lot of great work to generalize this area.) (In fact if we installed it into the linear kernel address space, and if the aperture is 1GB aligned, we will automatically use gbpages for it. Were Intel to support gbpages in the future ;-) the _real_ remapping in a graphics aperture happens on the GPU level anyway, you manage an in-RAM GPU pagetable that just works like an IOMMU, correct? on 32-bit we'd have what you use in the GEM code today: - io_reserve_pci_resource(): a NOP in essence - io_mapping_free(): a NOP - io_map_atomic(): does a kmap_atomic(pfn) - io_unmap_atomic(): does a kunmap_atomic(pfn) so on 32-bit we have the INVLPG TLB overhead and preemption restrictions - but we knew that. We'd have to allow atomic_kmap() on non-highmem as well but that's fair. Mind sending patches for this? :-) Ingo - This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100url=/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: CFS review
* Rene Herman [EMAIL PROTECTED] wrote: Realised the BUGs may mean the kernel DRM people could want to be in CC... and note that the schedule() call in there is not part of the crash backtrace: Call Trace: [c10fa54b] drm_lock+0x255/0x2de [c10ff28d] mga_dma_buffers+0x0/0x2e3 [c10f87fc] drm_ioctl+0x142/0x18a [c1005973] do_IRQ+0x97/0xb0 [c10f86ba] drm_ioctl+0x0/0x18a [c10f86ba] drm_ioctl+0x0/0x18a [c105b0d7] do_ioctl+0x87/0x9f [c105b32c] vfs_ioctl+0x23d/0x250 [c11b533e] schedule+0x2d0/0x2e6 [c105b372] sys_ioctl+0x33/0x4d [c1003d1e] syscall_call+0x7/0xb it just happened to be on the kernel stack. Nor is the do_IRQ() entry real. Both are frequent functions (and were executed recently) that's why they were still in the stackframe. Ingo - This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now http://get.splunk.com/ -- ___ Dri-devel mailing list Dri-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Fernando Pablo Lopez-Lezcano [EMAIL PROTECTED] wrote: Once I boot I can get, for example, two instances of glxgears happily going on at once. But if I start jack with realtime priority from within qjackctl I get a hard lock very soon, the machine does not even react to the sysrq magic key. _But_, you can see the two glxgears instances still drawing to the screen... but they seem to take turns, first one, then the other and so on and so forth (frequency of switch not perfectly regular, between 0.5 and 1 second, I'd say). Sorry the description is not very precise. if you want them to timeslice more finegrained, try to run them with nice +19 - does that make them smoother? Ingo --- This SF.Net email is sponsored by BEA Weblogic Workshop FREE Java Enterprise J2EE developer tools! Get your free copy of BEA WebLogic Workshop 8.1 today. http://ads.osdn.com/?ad_id=5047alloc_id=10808op=click -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Dave Airlie [EMAIL PROTECTED] wrote: I can't find the lock from a quick code inspection.. I'll add the reschedule to a test drm and I'll build my kernel with all the debugging on... there doesnt seem to be any further lock other than the BKL: 0001 0.558ms (+0.000ms): delay_tsc (__delay) 0001 0.560ms (+0.001ms): __const_udelay (radeon_do_wait_for_fifo) 0001 0.560ms (+0.000ms): __delay (radeon_do_wait_for_fifo) 0001 0.560ms (+0.000ms): delay_tsc (__delay) 0001 0.561ms (+0.001ms): __const_udelay (radeon_do_wait_for_fifo) 0001 0.561ms (+0.000ms): __delay (radeon_do_wait_for_fifo) 0001 0.561ms (+0.000ms): delay_tsc (__delay) so unless the BKL is being relied on, it's safe to add the conditional reschedule. Ingo --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
i'll put the patch below into the -P8 patch. (change voluntary_resched() to cond_resched() if you apply this to a vanilla kernel.) Ingo --- linux/drivers/char/drm/drm_os_linux.h.orig +++ linux/drivers/char/drm/drm_os_linux.h @@ -14,7 +14,7 @@ #define DRM_ERR(d) -(d) /** Current process ID */ #define DRM_CURRENTPID current-pid -#define DRM_UDELAY(d) udelay(d) +#define DRM_UDELAY(d) do { voluntary_resched(); udelay(d); } while (0) /** Read a byte from a MMIO region */ #define DRM_READ8(map, offset) readb(((unsigned long)(map)-handle) + (offset)) /** Read a dword from a MMIO region */ --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Jon Smirl [EMAIL PROTECTED] wrote: Previously all of the IOCTL calls were protected by the Big Kernel Lock. If we break that aren't we allowing multiple callers into the IOCTL code on SMP machines that weren't allowed in before? Doesn't that mean that we have to check everything to make sure it is SMP safe? as Alan mentioned it before, the BKL did not protect much, since the DRM code was freely doing copying from and to userspace, and hence dropping the BKL at those places. but yes, best would be to drop the BKL as soon as possible and check everything. Ingo --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Mike Mestnik [EMAIL PROTECTED] wrote: as Alan mentioned it before, the BKL did not protect much, since the DRM code was freely doing copying from and to userspace, and hence dropping the BKL at those places. Add sleeping(usleep) to this list, no? The big Q I have is why are IOCTL's protected by the BKL if it gets droped so easily? usleep is for short delays, it uses busy-looping. msleep is the one that schedules away. but yes, best would be to drop the BKL as soon as possible and check everything. By check everything, you mean make sure shared resources(memory) are used safely? yes. This means some sort of locking has to be added (if it's not present already). The easiest solution would be to add a semaphore (mutex) to each device and use that to serialize multiple users and API calls. A mutex doesnt get dropped when you schedule away. Ingo --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Jon Smirl [EMAIL PROTECTED] wrote: --- Ingo Molnar [EMAIL PROTECTED] wrote: the cleanest and highest performing solution would be to have an interrupt source for 3D completion - do you have such an interrupt handler handy? If yes then you can sleep precisely up to completion: Interrupts are not a good solution. The hardware would generate millions of them per second. This is the same problem network cards have, you can interrupt the machine to death. you'd only have to enable interrupt generation when you are not busy-polling. In 99.9% of the cases (when !need_resched()) you could busy poll as much as you want, and keep IRQ generation off. Very likely IRQ generation can be turned on/off via a single PCI write on most 3D hardware. Anyway, IRQ generation would just be a 'nice' thing. But the following property is a must: I would expect the best solution is to make a few passes through the loop delaying a couple usec to catch the common case of quick completion. Then switch to doing need_resched(). no. If need_resched() is set then the code *must* try to schedule ASAP. There is no excuse for not doing so - 'performance will suffer' is not a correct argument becase _if_ need_resched() is true then the system (and the user) doesnt want the 3D app to run right now. There will be no performance difference to the 3D app, since by having high latencies the DRI code does not give any extra performance to 3D apps, it only increases the scheduling latency! The timeslices of the 3D app are used up depending on how long the 3D app runs - so if it burns cycles busy-polling it will in fact get less CPU time on average. (if the CPU is contended.) Anyway, need_resched() set is not a common condition, and if it happens then kernel code _must_ react. (In fact most kernel code will be preempted right away - but the DRI code runs under spinlocks.) So the correct solution is to add something like this to _at least_ the busy-polling code: if (need_resched()) { ... drop locks ... cond_resched(); ... reacquire locks ... } or, if there's a single spinlock held (the BKL doesnt count) then: cond_resched_lock(driver-lock); (but the locking obviously has to be correct and safe.) ok? Ingo --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: 2.4.8.1+P6: radeon, dri xruns
* Jon Smirl [EMAIL PROTECTED] wrote: A better solution might be to loop twenty times to pick up the very short commands. After 20us switch to a loop that allows the kernel to schedule. You don't want to immediately schedule since that will kill graphics performance. You can busy-loop just fine as long as you stay preemptible. Also, if need_resched() is true then you should get out of your locks ASAP. the _ability_ to preempt quickly should not affect 3D performance in any noticeable way: if the only app running is a 3D app, or it has a high priority (because the user wants good 3D performance) then it will perform just fine. Conversely, if an audio app has high priority then it must see low latencies. So this is not an act of balancing, this is an act of _enabling_ low latencies. Looping in the kernel for even 1 msec while holding locks is very, very bad and should be avoided at every cost. In fact you should shoot for as quick preemptability as possible whenever you are looping for 3D completion, because your code is not doing anything productive in fact. 'quick preemptability' != scheduling away, it only means 'scheduling away if need_resched() is set'. What's the right way to write a loop like this that meets the above requirements and also satisfies the audio needs? for ( i = 0 ; i dev_priv-usec_timeout ; i++ ) { if ( !(RADEON_READ( RADEON_RBBM_STATUS ) RADEON_RBBM_ACTIVE) ) { radeon_do_pixcache_flush( dev_priv ); return 0; } DRM_UDELAY( 1 ); add: if (need_resched()) { break locks ... cond_resched(); reacquire locks ... } and make sure the breaking of the locks is safe at that point (no other 3D application should be able to interfere with your polling, etc.). this will solve all the audio complaints. as a bonus, if you want to be 'nice' to lower prio processes as well, you can also do: if ((i 127) == 127) { drop locks ... msleep(1); reqacquire locks ... } this will busy-loop up to 127 usecs, and will sleep for 1 msec afterwards. the cleanest and highest performing solution would be to have an interrupt source for 3D completion - do you have such an interrupt handler handy? If yes then you can sleep precisely up to completion: if ((i 127) == 127) { drop locks ... wait_event(driver-wait_queue, RADEON_READ(RADEON_RBBM_STATUS)); reqacquire locks ... break; } and in the IRQ handler, do a: wake_up(driver-wait_queue); the wakeup doesnt even have to be precise - i.e. you can wake up early or for all events - wait_event()'s second field, the 'is the wait done' takes care of it and re-suspends the task if the 3D engine is still busy. (to implement a timeout and signal-awareness as well you can use wait_event_interruptible_timeout().) Ingo --- SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media 100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33 Save 50% off Retail on Ink Toner - Free Shipping and Free Gift. http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285 -- ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Dri-devel] mesa 4.1 branch / NO go on 2.5.48
On Wed, 20 Nov 2002, Linus Torvalds wrote: On Thu, 21 Nov 2002, Dieter Nützel wrote: Option AGPFastWrite 1 This just makes the machine lock up for me at X startup. same here, instant and nasty lockup. Ingo --- This sf.net email is sponsored by:ThinkGeek Welcome to geek heaven. http://thinkgeek.com/sf ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
[Dri-devel] Caught signal 11, with bleeding-edge, radeon 8500 LE
i'm getting segfaults with the latest bleeding-edge XFree86 binary. the system is using the latest Red Hat rawhide rpms, i've used the latest extras snapshot (and the libxaa from the extras tarball), plus the r200-20021116 radeon driver. I've attached: - rpm -qa output - the XF86Config - strace -f output of the crash [not attached due to size.] - XFree86.0.log - kernel log messages here's the md5 checksums of the extras and installed files, everything is in place i think: # md5sum XFree86 /usr/X11R6/bin/XFree86 7c1cb25d8b5a42c9e1e6b30530b9d469 XFree86 7c1cb25d8b5a42c9e1e6b30530b9d469 /usr/X11R6/bin/XFree86 # md5sum libpcidata.a /usr/X11R6/lib/modules/libpcidata.a 663b45e969a6454a7a7f4d28b30d0784 libpcidata.a 663b45e969a6454a7a7f4d28b30d0784 /usr/X11R6/lib/modules/libpcidata.a # md5sum libxaa.a /usr/X11R6/lib/modules/libxaa.a ea563cb34e447cab7fe5800d29c6df1e libxaa.a ea563cb34e447cab7fe5800d29c6df1e /usr/X11R6/lib/modules/libxaa.a X appears to crash when the first X client tries to connect to the server - ie. if i start up X standalone it works, but when i start the first xterm, it crashes. if i use the orignal XFree86 binary (ie. XFree86-4.2.99.2-0.20021110.3) then DRI works, sortof. (simple things like glxgears work, but there are a number of bad artifacts in Ultima1 techdemo 251 for example. Other things, like tuxracer seems to work fine.) Btw., if i use the libxaa in the extras package then a more complex app, like the Ultima1 framerates decrases visibly - is this due to debugging enabled in the extras libxaa, while the rawhide version not? let me know if there's anything else i should/could provide, to get this debugged. Ingo rpm-qa.out.bz2 Description: BZip2 compressed data XF86Config.bz2 Description: BZip2 compressed data XFree86.0.log.bz2 Description: BZip2 compressed data agpgart: Maximum main memory to use for agp memory: 439M agpgart: Detected Intel i845 chipset agpgart: AGP aperture is 64M @ 0xf800 [drm] AGP 0.99 on Intel i845 @ 0xf800 64MB [drm] Initialized radeon 1.7.0 20020828 on minor 0 [drm] Loading R200 Microcode
Re: [Dri-devel] R200 testing
On Sun, 14 Jul 2002, David Bronaugh wrote: It seems that the kernel module was installed correctly, etc. He used a binary snapshot (20020708 or so). 0709 had some problems for me (tuxracer segfaulting occasionally), but 0712 works well so far. glxgears ran, gave about 2800 fps in a window of default size. However, when the window was enlarged to fullscreen, the framerate dropped to about 200fps. That doesn't sound quite normal. here it goes to about 2500 fps default, 300 fps full-size. But this happens with other 3d accelerated cards as well - the full size window is at least 10 times larger, so a 1:10 performance drop does not look abnormal. Next, tuxracer ran, but the menus were basically unusable. Sounds like he hit a whack of software fallbacks - frame rate on menus must have been less than 1fps. He didn't have the patience to get into the game. this happens to me as well, but only if i restart the system after installing the beta r200 driver. Re-running the install.sh script solves the problem and the tuxracer menusystem is as fast as expected. I suspect this is a problem of default dri kernel modules not being overwritten (?), but the install reloads them properly. It's not a big issue though. Apparently there were also some artifacts in 2d, but those are of less concern. havent seen any 2d artifacts so far. Ingo --- This sf.net email is sponsored by:ThinkGeek Welcome to geek heaven. http://thinkgeek.com/sf ___ Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel