Re: 2.6.35-rc4-git3: Reported regressions from 2.6.34

2010-07-09 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

  Bug-Entry ? ? ? : http://bugzilla.kernel.org/show_bug.cgi?id=16346
  Subject ? ? ? ? : 2.6.35-rc3-git8 - include/linux/fdtable.h:88 invoked 
  rcu_dereference_check() without protection!
  Submitter ? ? ? : Miles Lane miles.l...@gmail.com
  Date ? ? ? ? ? ?: 2010-07-04 22:04 (5 days old)
  Message-ID ? ? ?: 
  aanlktinof0k28rk4rmr66aubxcrl2rfa5zearj1lq...@mail.gmail.com
  References ? ? ?: http://marc.info/?l=linux-kernelm=127828107815930w=2
 
 I'm not entirely sure if these RCU proving things should count as 
 regressions.

Generally not - and we've delayed at least one more complex (cgroups) fix to 
v2.6.36 because the patch itself was riskier than the warning.

Still most of the warning fixes turned out to be simple, so we merged the 
very-low-risk ones and right now we seem to be on top of them.

But in general the default rule is that we delay these fixes to v2.6.36.

 Sure, the option to enable RCU proving is new, but the things it reports 
 about generally are not new - and they are usually not even bugs in the 
 sense that they necessarily cause any real problems.
 
 That particular one is in the single-thread optimizated case for fget_light, 
 ie
 
 if (likely((atomic_read(files-count) == 1))) {
 file = fcheck_files(files, fd);
 
 where I think it should be entirely safe in all ways without any locking. So 
 I think it's a false positive too.

Yeah, it's a bit like with lockdep (and it's a bit like with compiler warning 
fixes): we had to punch through a large stack of false positives that 
accumulated in the past 10 years.

( Because real bugs eventually get fixed, while false positives always just
  accumulate. So almost by definition we always start with a very assymetric
  collection of warnings and a large stack of false positives. )

Having said that, it appears we got most of the false positives and are 
beginning to be in a more representative equilibrium now. If v2.6.35 isnt 
going to be warning-free then v2.6.36 certainly will be and new warnings will 
have a much higher likelyhood of being real (and new) bugs (not just 
accumulated false-positives).

Ingo

--
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

2010-06-11 Thread Ingo Molnar

* Jens Axboe jax...@fusionio.com wrote:

 On 2010-06-11 10:32, Ingo Molnar wrote:
  
  * Jens Axboe jax...@fusionio.com wrote:
  
  On 2010-06-09 03:53, Linus Torvalds wrote:
  Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=16129
  Subject  : BUG: using smp_processor_id() in preemptible [] code: 
  jbd2/sda2
  Submitter: Jan Kreuzer kontrolla...@gmx.de
  Date : 2010-06-05 06:15 (4 days old)
 
  This seems to have been introduced by
 
commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
Author: Ingo Molnar mi...@elte.hu
Date:   Sat Nov 8 17:05:38 2008 +0100
 
sched: optimize sched_clock() a bit
  
sched_clock() uses cycles_2_ns() needlessly - which is an 
  irq-disabling
variant of __cycles_2_ns().
  
Most of the time sched_clock() is called with irqs disabled already.
The few places that call it with irqs enabled need to be updated.
  
Signed-off-by: Ingo Molnar mi...@elte.hu
 
  and this seems to be one of those calling cases that need to be updated..
 
  Ingo? The call trace is:
 
BUG: using smp_processor_id() in preemptible [] code: 
  jbd2/sda2-8/337
caller is native_sched_clock+0x3c/0x68
Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
Call Trace:
[812362c5] debug_smp_processor_id+0xc9/0xe4
[8101059d] native_sched_clock+0x3c/0x68
[8101043d] sched_clock+0x9/0xd
[81212d7a] blk_rq_init+0x97/0xa3
[81214d71] get_request+0x1c4/0x2d0
[81214ea6] get_request_wait+0x29/0x1a6
[81215537] __make_request+0x338/0x45b
[812147c2] generic_make_request+0x2bb/0x330
[81214909] submit_bio+0xd2/0xef
[811413cb] submit_bh+0xf4/0x116
[81144853] block_write_full_page_endio+0x89/0x96
[81144875] block_write_full_page+0x15/0x17
[8119b00a] ext4_writepage+0x356/0x36b
[810e1f91] __writepage+0x1a/0x39
[810e32a6] write_cache_pages+0x20d/0x346
[810e3406] generic_writepages+0x27/0x29
[811ca279] journal_submit_data_buffers+0x110/0x17d
[811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d
[811d0cba] kjournald2+0x147/0x37a
 
  (from the bugzilla thing)
 
  This should be fixed by commit 28f4197e which was merged on friday.
  
  Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e. With 
  some 
  configs i get bad spinlock warnings during bootup:
  
  [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750 
  usecs
  [   28.972003] calling  b44_init+0x0/0x55 @ 1
  [   28.976009] bus: 'pci': add driver b44
  [   28.976374]  sda:
  [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
  [   28.98]  lock: 7e1c5bbc, .magic: , .owner: none/-1, 
  .owner_cpu: 0
  [   28.98] Pid: 117, comm: async/0 Not tainted 
  2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
  [   28.98] Call Trace:
  [   28.98]  [41ba6d55] ? printk+0x20/0x24
  [   28.98]  [4134b7b7] spin_bug+0x7c/0x87
  [   28.98]  [4134b853] do_raw_spin_lock+0x1e/0x123
  [   28.98]  [41ba92ca] ? _raw_spin_lock_irqsave+0x12/0x20
  [   28.98]  [41ba92d2] _raw_spin_lock_irqsave+0x1a/0x20
  [   28.98]  [4133476f] blkiocg_update_io_add_stats+0x25/0xfb
  [   28.98]  [41335dae] ? cfq_prio_tree_add+0xb1/0xc1
  [   28.98]  [41337bc7] cfq_insert_request+0x8c/0x425
  [   28.98]  [41ba9271] ? _raw_spin_unlock_irqrestore+0x17/0x23
  [   28.98]  [41ba9271] ? _raw_spin_unlock_irqrestore+0x17/0x23
  [   28.98]  [41329225] elv_insert+0x107/0x1a0
  [   28.98]  [41329354] __elv_add_request+0x96/0x9d
  [   28.98]  [4132bb8c] ? drive_stat_acct+0x9d/0xc6
  [   28.98]  [4132dd64] __make_request+0x335/0x376
  [   28.98]  [4132c726] generic_make_request+0x336/0x39d
  [   28.98]  [410ad422] ? kmem_cache_alloc+0xa1/0x105
  [   28.98]  [41089285] ? mempool_alloc_slab+0xe/0x10
  [   28.98]  [41089285] ? mempool_alloc_slab+0xe/0x10
  [   28.98]  [41089285] ? mempool_alloc_slab+0xe/0x10
  [   28.98]  [41089347] ? mempool_alloc+0x57/0xe2
  [   28.98]  [4132c804] submit_bio+0x77/0x8f
  [   28.98]  [410d2cbc] ? bio_alloc_bioset+0x37/0x94
  [   28.98]  [410ceb90] submit_bh+0xc3/0xe2
  [   28.98]  [410d1474] block_read_full_page+0x249/0x259
  [   28.98]  [410d31fb] ? blkdev_get_block+0x0/0xc6
  [   28.98]  [41087bfa] ? add_to_page_cache_locked+0x94/0xb5
  [   28.98]  [410d3d92] blkdev_readpage+0xf/0x11
  [   28.98]  [41088823] do_read_cache_page+0x7d/0x11a
  [   28.98]  [410d3d83] ? blkdev_readpage+0x0/0x11
  [   28.98]  [410888f4] read_cache_page_async+0x16/0x1b
  [   28.98]  [41088904] read_cache_page+0xb/0x12
  [   28.98]  [410e80e1] read_dev_sector+0x2a/0x63
  [   28.98]  [410e92e8] adfspart_check_ICS+0x2e/0x166
  [   28.98]  [41ba6d55] ? printk+0x20/0x24
  [   28.98

Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

2010-06-09 Thread Ingo Molnar

* Jens Axboe jax...@fusionio.com wrote:

 On 2010-06-09 03:53, Linus Torvalds wrote:
  Bug-Entry  : http://bugzilla.kernel.org/show_bug.cgi?id=16129
  Subject: BUG: using smp_processor_id() in preemptible [] code: 
  jbd2/sda2
  Submitter  : Jan Kreuzer kontrolla...@gmx.de
  Date   : 2010-06-05 06:15 (4 days old)
  
  This seems to have been introduced by
  
  commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
  Author: Ingo Molnar mi...@elte.hu
  Date:   Sat Nov 8 17:05:38 2008 +0100
  
  sched: optimize sched_clock() a bit
  
  sched_clock() uses cycles_2_ns() needlessly - which is an 
  irq-disabling
  variant of __cycles_2_ns().
  
  Most of the time sched_clock() is called with irqs disabled already.
  The few places that call it with irqs enabled need to be updated.
  
  Signed-off-by: Ingo Molnar mi...@elte.hu
  
  and this seems to be one of those calling cases that need to be updated.
  
  Ingo? The call trace is:
  
  BUG: using smp_processor_id() in preemptible [] code: 
  jbd2/sda2-8/337
  caller is native_sched_clock+0x3c/0x68
  Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
  Call Trace:
  [812362c5] debug_smp_processor_id+0xc9/0xe4
  [8101059d] native_sched_clock+0x3c/0x68
  [8101043d] sched_clock+0x9/0xd
  [81212d7a] blk_rq_init+0x97/0xa3
  [81214d71] get_request+0x1c4/0x2d0
  [81214ea6] get_request_wait+0x29/0x1a6
  [81215537] __make_request+0x338/0x45b
  [812147c2] generic_make_request+0x2bb/0x330
  [81214909] submit_bio+0xd2/0xef
  [811413cb] submit_bh+0xf4/0x116
  [81144853] block_write_full_page_endio+0x89/0x96
  [81144875] block_write_full_page+0x15/0x17
  [8119b00a] ext4_writepage+0x356/0x36b
  [810e1f91] __writepage+0x1a/0x39
  [810e32a6] write_cache_pages+0x20d/0x346
  [810e3406] generic_writepages+0x27/0x29
  [811ca279] journal_submit_data_buffers+0x110/0x17d
  [811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d
  [811d0cba] kjournald2+0x147/0x37a
  
  (from the bugzilla thing)
 
 This should be fixed by commit 28f4197e which was merged on friday.

The scheduler commit adding local_clock() (for .36) is:

  c676329: sched_clock: Add local_clock() API and improve documentation

So once that is upstream the block IO statistics code can use that.

Thanks,

Ingo

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.6.35-rc2-git2: Reported regressions from 2.6.34

2010-06-08 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

  Bug-Entry   : http://bugzilla.kernel.org/show_bug.cgi?id=16129
  Subject : BUG: using smp_processor_id() in preemptible [] code: 
  jbd2/sda2
  Submitter   : Jan Kreuzer kontrolla...@gmx.de
  Date: 2010-06-05 06:15 (4 days old)
 
 This seems to have been introduced by
 
   commit 7cbaef9c83e58bbd4bdd534b09052b6c5ec457d5
   Author: Ingo Molnar mi...@elte.hu
   Date:   Sat Nov 8 17:05:38 2008 +0100
 
   sched: optimize sched_clock() a bit
 
   sched_clock() uses cycles_2_ns() needlessly - which is an 
 irq-disabling
   variant of __cycles_2_ns().
 
   Most of the time sched_clock() is called with irqs disabled already.
   The few places that call it with irqs enabled need to be updated.
 
   Signed-off-by: Ingo Molnar mi...@elte.hu
 
 and this seems to be one of those calling cases that need to be updated.

That's a commit from 2008.

 Ingo? The call trace is:
 
   BUG: using smp_processor_id() in preemptible [] code: 
 jbd2/sda2-8/337
   caller is native_sched_clock+0x3c/0x68
   Pid: 337, comm: jbd2/sda2-8 Not tainted 2.6.35-rc1jan+ #4
   Call Trace:
   [812362c5] debug_smp_processor_id+0xc9/0xe4
   [8101059d] native_sched_clock+0x3c/0x68
   [8101043d] sched_clock+0x9/0xd
   [81212d7a] blk_rq_init+0x97/0xa3
   [81214d71] get_request+0x1c4/0x2d0
   [81214ea6] get_request_wait+0x29/0x1a6
   [81215537] __make_request+0x338/0x45b
   [812147c2] generic_make_request+0x2bb/0x330
   [81214909] submit_bio+0xd2/0xef
   [811413cb] submit_bh+0xf4/0x116
   [81144853] block_write_full_page_endio+0x89/0x96
   [81144875] block_write_full_page+0x15/0x17
   [8119b00a] ext4_writepage+0x356/0x36b
   [810e1f91] __writepage+0x1a/0x39
   [810e32a6] write_cache_pages+0x20d/0x346
   [810e3406] generic_writepages+0x27/0x29
   [811ca279] journal_submit_data_buffers+0x110/0x17d
   [811ca986] jbd2_journal_commit_transaction+0x4cb/0x156d
   [811d0cba] kjournald2+0x147/0x37a
 
 (from the bugzilla thing)

The warning was introduced by this fresh commit (and a followup commit) merged 
in the .35 merge window:

 | commit 9195291e5f05e01d67f9a09c756b8aca8f009089
 | Author: Divyesh Shah dps...@google.com
 | Date:   Thu Apr 1 15:01:41 2010 -0700
 | 
 | blkio: Increment the blkio cgroup stats for real now

IIRC Jens posted a fix for the regression. Jens, what's the status of that?

As the code there started using a raw sched_clock() call for block statistics 
purposes, which was a poorly thought out (and buggy) approach:

 - it takes timestamps on different cpus and then compares then, but doesnt 
   consider that sched_clock() is not comparable between CPUs without extra 
   care

 - it doesnt consider the possibility for the sched_clock() result going
   backwards on certain platforms (such as x86)

 - it doesnt consider preemptability

(There's work ongoing to add a clock variant that can be used for such 
purposes, but that's .36 fodder.)

Thanks,

Ingo

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm request 3

2010-03-05 Thread Ingo Molnar

* Mike Galbraith efa...@gmx.de wrote:

 On the bright side, all this hubbub sends a very positive message to the 
 noveau development crew.  Folks, your work is important.  I'd be proud as a 
 peacock :)

Heh, most definitely so!

Noveau really is a game-changer i think, it's a big break-through for Xorg 
IMO.

For the first time in Linux history we support 90%+ of graphics hardware in a 
more or less acceptable way. That is a big deal really and the xorg/drm folks 
should be proud of that accomplishment.

It solves the nvidia.ko mess to a large degree and moves all these things into 
an actionable technical domain. I'm so much happier to argue with OSS folks 
who write this code than having to look at nvidia.ko tainted kernel crash 
logs.

Ingo

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm request 3

2010-03-04 Thread Ingo Molnar

* Ben Skeggs skeg...@gmail.com wrote:

  But here's the thing: if you expect people to do development, they _need_ 
  to be able to mix things. A kernel developer needs to be able to update 
  their kernel. And a kernel _tester_ needs to be able to test that kernel.

 Here's the thing.  *You* pushed for nouveau to go into staging before any of 
 the developers were ready for it.  Two of the big reasons (from my POV) for 
 not requesting inclusion were the context programs (which have since been 
 dealt with) and that yes, we have no intention of keeping crusty APIs around 
 when they aren't what we require.
 
 The idea of staging was to allow for exactly the second problem, so why are 
 you surprised?  The fact Fedora ships nouveau is irrelevant, we also expect 
 that for the most part people will be using our packages, which deal with 
 the ABI issues.

Here is my experience with the development of various ABIs - and i've been on 
both sides of the fence, i've done 'flag day' ABI changes during development 
myself, and i've done gradual ABI development as well.

One experience i can tell you with 100% certainty: no matter whether a project 
is small or large, simple or complex, whether the old ABI is the ugliest wart 
on this planet or just hit by an unfortunate limitation that needs to be 
eliminated.

The conclusion is crystal clear, breaking an ABI via a flag day 
cleanup/feature/etc is:

 - wrong

 - harmful

 - limits the developer base

 - limits the tester base

 - wastes time and effort. (fewer developers/testers means that while _this_ 
   feature was easier to add, all your _future_ features will be a bit harder 
   to do. It compounds up.)

 - so it hurts even the very developer who is most convinced that this was the 
   right thing to do

It's a bad technical decision throughout. It's masochistic and often suicidal 
to just about any project in essence. I've seen projects that did it once and 
died just due to that single act of stupidity. I've seen projects that have 
done it a few times and took the usage hit, limped along with the wounds and 
never grew to the size they could have achieved. I've seen projects that did 
it once, took the hit, learned from it and never did it again.

How many times does DRM want to take that bullet head on?

I have _never_ seen a situation where in hindsight breaking the ABI of a 
widely deployed project could be considered 'good', for just about any sane 
definition of 'good'.

It's really that simple IMO. There's very few unconditional rules in OSS, but 
this is one of them.

Ingo

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm request 3

2010-03-04 Thread Ingo Molnar

* Pekka Enberg penb...@cs.helsinki.fi wrote:

 On Fri, Mar 5, 2010 at 8:49 AM, Ingo Molnar mi...@elte.hu wrote:
  The conclusion is crystal clear, breaking an ABI via a flag day
  cleanup/feature/etc is:
 
  ?- wrong
 
  ?- harmful
 
  ?- limits the developer base
 
  ?- limits the tester base
 
  ?- wastes time and effort. (fewer developers/testers means that while _this_
  ? feature was easier to add, all your _future_ features will be a bit harder
  ? to do. It compounds up.)
 
  ?- so it hurts even the very developer who is most convinced that this was 
  the
  ? right thing to do
 
  It's a bad technical decision throughout. It's masochistic and often 
  suicidal
  to just about any project in essence. I've seen projects that did it once 
  and
  died just due to that single act of stupidity. I've seen projects that have
  done it a few times and took the usage hit, limped along with the wounds and
  never grew to the size they could have achieved. I've seen projects that did
  it once, took the hit, learned from it and never did it again.
 
 Agreed. What bothers me in this discussion is that people keep bringing up 
 the fact that nouveau is mostly developed by volunteers and thus it doesn't 
 make sense to make sure it's backwards (or forwards) compatible. But the way 
 I see it, it's the complete opposite. It's _more_ important to support ABIs 
 for community-driven efforts because you're relying on people who by 
 definition don't have time to waste. While the nouveau people might have 
 good intentions, I'm afraid they might be severely limiting their developer 
 and tester base because they're not focused on real world problems (like the 
 ones Linus is seeing).

Yeah. I've seen a few other bad arguments as well:

   'exploding test matrix'

This is often the result of _another_ bad technical decision: 
over-modularization.

Xorg, mesa/libdrm and the kernel DRM drivers pretty share this signature:

 - it's developed by the same tightly knit developer base who often cross
   between these packages. Features often need changes in each component.

 - a developer to be able to do real work has to have the latest sources
   of all these components.

 - a user just uses whatever horizontal version cut the distro did and never
   truly 'mixes' these components as a conscious decision.

 - distros just try to get the latest and most capable but still stable
   version. Desperately so. Often they will create a version mix that was
   never tested by developers in that form. They'll expose users to ABI
   combinations that were never really intended. They have trouble
   bootstrapping and stabilizing those essentially random combinations and
   then have trouble applying stability and security fixes.

The thing is, if development has such characteristics then it's pretty clearly 
not 3-4 separate projects but _one_ abstract project. [*]

So the 'exploding test matrix' is simply the result of: creating ABIs between 
3-4 _artificial components of the same project_ and then going through 
developer hell living with that mistake. [**]

It's a bit as if we split up the kernel into 'microkernel' components, did a 
VFS ABI, MM ABI, drivers ABI, scheduler ABI, networking ABI and arch ABIs, and 
then tried to develop them as separate components.

If we did then then Linux kernel development would slow down massively while 
in reality everyone would _still_ have to have the latest and greatest source 
checked out to do some real development work and to be able to implement 
features that affect the whole kernel ...

Linux would become an epic fail of historic proportions if we ever did that.

Ingo

[*]  I realize that it's possibly hard for Xorg to merge with mesa and the 
 kernel for license reasons, but my technical observation still stands.

[**] I realize that modularization and many small packages were a clear 
 advantage when we were still all downloading bits via 14.4k modems, but 
 in this day and age of megabit connections that has become a non-issue.
 Rampant over-modularization and the resulting loss of focus on the end
 result (and the resulting explosion of a test matrix) is hurting us _far 
 more_ than the disadvantages of any monolithic could ever hurt. We are 
 seeing the proof of that all day with a 10+ MLOC kernel.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm request 3

2010-03-04 Thread Ingo Molnar

* C. Bergstr?m cbergst...@pathscale.com wrote:

 Pekka Enberg wrote:
 On Fri, Mar 5, 2010 at 8:49 AM, Ingo Molnar mi...@elte.hu wrote:
 The conclusion is crystal clear, breaking an ABI via a flag day
 cleanup/feature/etc is:
 
  - wrong
 
  - harmful
 
  - limits the developer base
 
  - limits the tester base
 
  - wastes time and effort. (fewer developers/testers means that while _this_
   feature was easier to add, all your _future_ features will be a bit harder
   to do. It compounds up.)
 
  - so it hurts even the very developer who is most convinced that this was 
  the
   right thing to do
 
 It's a bad technical decision throughout. It's masochistic and often 
 suicidal
 to just about any project in essence. I've seen projects that did it once 
 and
 died just due to that single act of stupidity. I've seen projects that have
 done it a few times and took the usage hit, limped along with the wounds and
 never grew to the size they could have achieved. I've seen projects that did
 it once, took the hit, learned from it and never did it again.
 
 Agreed. What bothers me in this discussion is that people keep
 bringing up the fact that nouveau is mostly developed by volunteers
 and thus it doesn't make sense to make sure it's backwards (or
 forwards) compatible. But the way I see it, it's the complete
 opposite. It's _more_ important to support ABIs for community-driven
 efforts because you're relying on people who by definition don't have
 time to waste. While the nouveau people might have good intentions,
 I'm afraid they might be severely limiting their developer and tester
 base because they're not focused on real world problems (like the ones
 Linus is seeing).
 staging != stable
 
 Nobody guaranteed a stable API for staging and in fact it was stated 
 previously it needed to be changed.  Please lets just get back to work and 
 stop declaring the sky is falling.

I dont think you understood the argument.

The (very simple) argument was: no matter how a project is developed, whether 
it's been freshly announced (not even in staging), in staging or been upstream 
for years, breaking ABIs is _technically wrong_.

No ifs and when. A released ABI that is in use cannot be so messy to make it 
worth breaking. You've got users. You've got developers. You've got yourself.

You can still phase it out gradually (and even do that quickly), one or two 
stable releases down the road you can even print out the final ABI removal 
patch on paper, make a bonfire out of it and jump on its ashes in joy, but if 
you are interested in running a successful OSS project then the current ABI is 
sacrosanct.

Ingo

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-05 Thread Ingo Molnar

* Dave Airlie airl...@linux.ie wrote:

 
  If it now does not boot up if all its sub-options are enabled, even of some 
  of those sub-options are new, does that count as a driver regression? Sure 
  it 
  does to me ...
 
 But it doesn't to anyone else under any reasonable meaning of the word 
 regression.

There are reactions in this thread that contradict your 'anyone else' point.

 The config option states
 Choose this option if you want kernel modesetting enabled by default,
   and you have a new enough userspace to support this. Running old
   userspaces with this enabled will cause pain.
 
 Will cause pain sounds painful to me, I can make it seem much worse if 
 you'd like.

Except you are missing that the hang (and the first crash as well) happens on 
brand-new user-space just as much - not just on 'old userspaces'.

The bugs i've triggered are independent of any user-space component - it 
happens with a fresh distro just as much.

As i suggested before, at least the text should be updated to include what 
has been written about CONFIG_DRM_RADEON_KMS in this thread before:

  This is a completely new driver.  It's only part of the existing drm for 
  compatibility reasons.  It requires an entirely different graphics stack 
  above it and works very differently from the old drm stack.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-05 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

 On Fri, Feb 5, 2010 at 7:00 PM, Ingo Molnar mi...@elte.hu wrote:
 
  * Dave Airlie airl...@linux.ie wrote:
 
 
   If it now does not boot up if all its sub-options are enabled, even of 
   some
   of those sub-options are new, does that count as a driver regression? 
   Sure it
   does to me ...
 
  But it doesn't to anyone else under any reasonable meaning of the word
  regression.
 
  There are reactions in this thread that contradict your 'anyone else' point.
 
  The config option states
  Choose this option if you want kernel modesetting enabled by default,
  ? ? ? ? ? and you have a new enough userspace to support this. Running old
  ? ? ? ? ? userspaces with this enabled will cause pain.
 
  Will cause pain sounds painful to me, I can make it seem much worse if
  you'd like.
 
  Except you are missing that the hang (and the first crash as well) happens 
  on
  brand-new user-space just as much - not just on 'old userspaces'.
 
  The bugs i've triggered are independent of any user-space component - it
  happens with a fresh distro just as much.
 
  As i suggested before, at least the text should be updated to include what
  has been written about CONFIG_DRM_RADEON_KMS in this thread before:
 
  ? This is a completely new driver. ?It's only part of the existing drm for
  ? compatibility reasons. ?It requires an entirely different graphics stack
  ? above it and works very differently from the old drm stack.
 
 
 Okay I've attached a patch with a revised Kconfig in it.
 
 Does this sound more like reality?

Looks perfect to me!

Acked-by: Ingo Molnar mi...@elte.hu

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Matthew Garrett mj...@srcf.ucam.org wrote:

 On Thu, Feb 04, 2010 at 08:17:05AM +0100, Ingo Molnar wrote:
 
  btw., i just found another bug activated via this same commit, a boot 
  hang after DRM init:
 
 The commit in question didn't cause the hang, so reverting it isn't the 
 appropriate fix.

Well, once i applied the revert i got no more hangs or crashes today, in lots 
of bootups. This is fully repeatable - if i re-apply that commit with the 
config i sent the hang happens again.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Linus Torvalds torva...@linux-foundation.org wrote:

 On Thu, 4 Feb 2010, Ingo Molnar wrote:
 
  Well, once i applied the revert i got no more hangs or crashes today, in 
  lots of bootups. This is fully repeatable - if i re-apply that commit 
  with the config i sent the hang happens again.
 
 But that's just because when it was in staging, you'd never enable it, 
 right? Because your config generator always turns off staging stuff. IOW, 
 it's not the code - it's more an interaction with how your test-bed works.

Correct. Staging drivers can crash and there's no guarantee of quick 
regression fixes (although fixes are generally quick so this isnt a 
complaint), so i exclude them from boot testing.

Anyway, i've essentially moved it back to staging as far as my testing goes, 
that solves the problem too.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Matthew Garrett mj...@srcf.ucam.org wrote:

 On Thu, Feb 04, 2010 at 07:12:18PM +0100, Ingo Molnar wrote:
  
  * Matthew Garrett mj...@srcf.ucam.org wrote:
   The reason the option was in staging (as has been mentioned before) was 
   because the ABI wasn't felt to be stable enough. Upstream is now willing 
   to 
   commit to that stability, so now seems as good a time to move it as any. 
   There's no code change and there's no default configuration change, so I 
   really can't see any way that it can be classed as a regression.
  
  But that argument in essence renders the regression policy meaningless for 
  such code: just about any new driver feature under the sun could be shaped 
  as 
  a Kconfig option, introduced via a drivers/staging Kconfig entry, and then 
  activated via a twoliner commit in a later -rc.
 
 Before this patch, CONFIG_DRM_RADEON_KMS=y would crash your system on boot. 
 [...]

Hm, in what way does that observation address the concerns i've outlined?

Before this patch i could enable CONFIG_DRM_RADEON_KMS=y only if i enabled 
CONFIG_STAGING, which i dont, because doing so would taint my kernel with 
TAINT_CRAP, and the kernel log would contain:

 %s: module is from the staging directory, the quality is unknown, you have 
been warned.,

 [...] After this patch, CONFIG_DRM_RADEON_KMS=y still crashes your system. 
 [...]

After this patch i suddenly get a new body of code with a default-off option 
that would only show up before if i had CONFIG_STAGING=y enabled before.

Do you see my argument why any user who is hit by this would categorize this 
as a kernel regression in an existing driver?

Moving driver functionality from drivers/staging/ to drivers/ might be 
justified, it might be pragmatic, but you dont try to justify it and you dont 
try to outline the pragmatic reasons - from all i can see you seem to argue 
that this is all perfectly fine in late -rc's, which has me worried somewhat.

[ And if that is really fine i'd like to hear Linus's amen on that as well, 
  because i'm sure others would like to use that mechanism too to enable
  new functionality in late -rc's. ]

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Matthew Garrett mj...@srcf.ucam.org wrote:

 On Thu, Feb 04, 2010 at 07:56:03PM +0100, Ingo Molnar wrote:
 
  Do you see my argument why any user who is hit by this would categorize 
  this as a kernel regression in an existing driver?
 
 No. If a user changes configuration and gets a hang, that's a bug but not a 
 regression.

Only if it's some brand-new driver or a brand-new kernel feature for which 
no-one can have any prior expectations of stability. Especially if it's added 
in the merge window when many new drivers are added.

But isnt it a regression to a user if it's shipped in -rc7 appearing as a new 
sub-option of an existing driver?

I'd wager that most main-street Linux users would consider that a regression.
 
As i see it is that you are trying to have it both ways: claim it's a new 
driver when it comes to handling regressions, but also try to have the 
benefits (and adoption flux) of an old driver when it comes to facing it to 
users.

Some info about that in the Kconfig would be helpful IMO - so that people are 
less surprised if it happens to break while the radeon driver worked fine for 
them before.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Alex Deucher alexdeuc...@gmail.com wrote:

 On Thu, Feb 4, 2010 at 2:06 PM, Ingo Molnar mi...@elte.hu wrote:
 
  * Alex Deucher alexdeuc...@gmail.com wrote:
 
  On Thu, Feb 4, 2010 at 1:12 PM, Ingo Molnar mi...@elte.hu wrote:
  
   * Matthew Garrett mj...@srcf.ucam.org wrote:
  
   On Thu, Feb 04, 2010 at 06:54:45PM +0100, Ingo Molnar wrote:
  
But you could claim that it's not a regression because 1) technically 
the
code got introduced in drivers/staging/, and staging drivers are not 
on
the regression list 2) the Kconfig value is default-off so it can only
harm those who got lured by a new Kconfig value popping up in -rc7 in 
a
well working driver they already have enabled.
   
So the moving of driver functionality from drivers/staging/ to 
drivers/
is a grey area it appears. Wouldnt it have been better to do this in 
the
next merge window, as all other drivers do? It's not new hardware
enablement either, it's feature enablement for an existing driver.
  
   The reason the option was in staging (as has been mentioned before) was
   because the ABI wasn't felt to be stable enough. Upstream is now 
   willing to
   commit to that stability, so now seems as good a time to move it as any.
   There's no code change and there's no default configuration change, so I
   really can't see any way that it can be classed as a regression.
  
   But that argument in essence renders the regression policy meaningless 
   for
   such code: just about any new driver feature under the sun could be 
   shaped as
   a Kconfig option, introduced via a drivers/staging Kconfig entry, and 
   then
   activated via a twoliner commit in a later -rc.
  
   IMHO the point of tracking regressions is to reduce the bugginess of the
   kernel and thus to help users, not to give ground for legalistic 
   arguments.
  
   There _are_ common-sense exceptions from the regression rules, such as 
   the
   introduction of a new piece of hardware that was previously unsupported
   (hence there's no expectation of stability) - but the tweaking of an
   existing, widely used driver (even if the new opion is default-off) 
   hardly
   seems to qualify for that.
  
 
  This is a completely new driver. ?It's only part of the existing drm for
  compatibility reasons. ?It requires an entirely different graphics stack
  above it and works very differently from the old drm stack.
 
  Will the user know? IMHO what matters in the end is user expectation.
 
  Lets walk through what a current kernel tester of the drm/radeon driver sees
  when he types 'make oldconfig' after installing the (to-be-released) .33-rc7
  kernel. Firstly, the user with a brand-new distro already has this enabled:
 
  ?CONFIG_DRM_RADEON=y
 
  and knows the driver, and it performs adequately. Then in -rc7 he gets a new
  option:
 
  ?ATI Radeon (DRM_RADEON) [Y/n/?] y
  ? ?Enable modesetting on radeon by default (DRM_RADEON_KMS) [N/y/?] (NEW)
 
  The user might easily go: Hey this is a driver i already have, and there's 
  a
  new sub-option for this well-working driver. Sure, enable it, these kernel
  folks know what they are doing and i rarely see any crashes past -rc2
  kernels.
 
  Does this new option tell him what you just told me, that:
 
  ? This is a completely new driver. ?It's only part of the existing drm for
  ? compatibility reasons. ?It requires an entirely different graphics stack
  ? above it and works very differently from the old drm stack.
 
  ?
 
  it doesnt. Even if he types '?', it tells:
 
  ?CONFIG_DRM_RADEON_KMS:
 
  ?Choose this option if you want kernel modesetting enabled by default,
  ?and you have a new enough userspace to support this. Running old
  ?userspaces with this enabled will cause pain.
 
  The user will likely go cool I have a fresh distro with recent Xorg, lets
  try it.
 
 
 And if it crashes, he'll report a bug and we'll fix it.

Nobody has reacted to my related boot hang bugreport yet - and it's detailed 
and fully reproducible (so i can test any proposed fixes as well in short 
order). I.e. my limited testing has triggered two separate bugs in the same 
driver - and this will show up in -rc7.

It might be all OK and no-one else will see trouble. Or past patterns might 
repeat themselves and i might simply be an early bird for trouble to come.

My (oft repeated) point is that adding new sub-features to existing drivers 
is not what we do in late -rc's: there's simply not enough time to shake out 
bugs/regressions in them.

We introduce new functionality to existing drivers in the merge window - in 
the two weeks following a stable kernel's release.

In late -rc's we only try to fix regressions. Sometimes we make exceptions 
for pragmatic reasons, but then we are straightforward about those reasons 
and try to warn users about our zeal to help them with cool, new, 
not-to-be-missed GPU functionality ;-)

Thanks,

Ingo

Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Jesse Barnes jbar...@virtuousgeek.org wrote:

 On Thu, 4 Feb 2010 20:32:32 +0100
 Ingo Molnar mi...@elte.hu wrote:
  Nobody has reacted to my related boot hang bugreport yet - and it's
  detailed and fully reproducible (so i can test any proposed fixes as
  well in short order). I.e. my limited testing has triggered two
  separate bugs in the same driver - and this will show up in -rc7.
  
  It might be all OK and no-one else will see trouble. Or past patterns
  might repeat themselves and i might simply be an early bird for
  trouble to come.
  
  My (oft repeated) point is that adding new sub-features to existing
  drivers is not what we do in late -rc's: there's simply not enough
  time to shake out bugs/regressions in them.
  
  We introduce new functionality to existing drivers in the merge
  window - in the two weeks following a stable kernel's release.
 
 This is the .config issue right?  It doesn't sound like the bug is new, 
 you're just seeing now it because of the way you run tests.  It shouldn't 
 affect any more or fewer users than it did before, and reverting the move 
 radeon KMS out of staging won't fix the bug at all or prevent anyone from 
 seeing it.  People using KMS will still use KMS and people without it 
 won't, [...]

I think you are missing my point. My point is very simple: existing non-KMS 
users of CONFIG_DRM_RADON=y (a pre-existing driver) might turn on the new 
sub-feature (CONFIG_DRM_RADEON_KMS=y), in the expectation that this is a safe 
addition to his currently well-working driver.

( I have to confess i do that all the time for drivers that work well for me, 
  and if it pops up in a late -rc i sure expect it to be safe to enable. I 
  dont even read the help text most of the time - if the single-line summary 
  sounds useful i enable it. Especially if the Kconfig help entry says it's 
  safe with a new distro, it's not CONFIG_EXPERIMENTAL, it's not marked 
  CONFIG_BROKEN, it's not in CONFIG_STAGING, etc. )

That action might hang or crash his kernel, and if that user then reports:

   Hey, -rc7 just hung on me after enabling this new .config option it 
offered for the radeon driver i am using, please add this to the list of 
regressions. 

is this really the right kind of reply:

  Since we moved it from drivers/staging/ to drivers/ this hang you are 
   seeing is technically not a regression, we might or might not fix it. 

?

I doubt the user would be overly enthusiastic about that kind of reply ;-)

Guys, you should really _think_ about it a minute and realize what the 
purpose of a regression policy is.

It's not to be a PITA to subsystem maintainers, it's not an annoyance just to 
keep you from doing cool stuff. It's not something which you should try to 
lawyer your way out of via an as narrow interpretation as you can.

A regression policy is something that generally helps the quality of Linux, 
so it's worth interpreting broadly and generously in spirit not just in 
letter. If there's a single most prominent complaint i hear about the 
upstream kernel is that it breaks too often. (right after 'it doesnt support 
my graphics hardware' - so i sure can relate to the pragmatic reasons of 
pushing KMS strongly!)

If i run into a crash and a hang, you can bet that others will as well.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Jerome Glisse gli...@freedesktop.org wrote:

 On Thu, Feb 04, 2010 at 08:19:35PM +0100, Ingo Molnar wrote:
  
  * Matthew Garrett mj...@srcf.ucam.org wrote:
  
   On Thu, Feb 04, 2010 at 07:56:03PM +0100, Ingo Molnar wrote:
   
Do you see my argument why any user who is hit by this would categorize 
this as a kernel regression in an existing driver?
   
   No. If a user changes configuration and gets a hang, that's a bug but not 
   a 
   regression.
  
  Only if it's some brand-new driver or a brand-new kernel feature for which 
  no-one can have any prior expectations of stability. Especially if it's 
  added 
  in the merge window when many new drivers are added.
  
  But isnt it a regression to a user if it's shipped in -rc7 appearing as a 
  new 
  sub-option of an existing driver?
  
  I'd wager that most main-street Linux users would consider that a 
  regression.
   
  As i see it is that you are trying to have it both ways: claim it's a new 
  driver when it comes to handling regressions, but also try to have the 
  benefits (and adoption flux) of an old driver when it comes to facing it to 
  users.
 
 We have been treating KMS regression as regression, [...]

Great!

 [...] i fixed numerous regressions since it was first merged as an staging 
 driver, and i keep doing so, i try to be as much reactive as i can. I am 
 sorry you have a bad experience about it. I just wanted to add that we 
 planed to move KMS out of staging in 2.6.33 long time ago and yes maybe we 
 should have done it earlier, but no matter when we do the change you will 
 still face this bug until we fix it.

I dont think you'll ever see my complain about a bug. I dont, and i introduce 
far too many of them to have any moral basis for complaint in any case ;-)

I only questioned the validity of this initial reaction by Dave Arlie:

 | Its not enabled by default so reverting this doesn't make much sense.
 |
 | We can just treat this as a normal driver bugreport.

 So on fixing the issue front, one question do you also enable radeonfb ? if 
 so then its likely the root issue of this bug, i think kconfig should 
 forbid having both radeon kms + radeonfb but i am not sure how allyesconfig 
 behave in respect of such constraint.

Please see the bugreprt i made in this thread, under the following subject:

  hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of 
staging.

It has all that info and more. (i've bounced it to you privately as well)

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

 On Wed, Feb 3, 2010 at 1:44 AM, Ingo Molnar mi...@elte.hu wrote:
 
  * Dave Airlie airl...@linux.ie wrote:
 
   It's the moving of radeom KMS out of staging after -rc6 that causes it,
   because it brought it into the scope of my testing:
  
   ?f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging.
  
   So at least on this box it's clearly not ready for mainline enablement
   yet. I've attached the revert patch further below.
 
  Its not enabled by default so reverting this doesn't make much sense.
 
  I boot allyesconfig kernels regularly, which testing method works fine 
  with another 2000+ upstream drivers. (including the dozens of drivers 
  which match to active hardware components on that box)
 
 Okay this was something I wondered about, since these are *not* 
 allyesconfig .configs, I've generated some and CONFIG_FB_RADEON is always 
 on here, and you seem to not have that enabled (not that enabling it is a 
 good idea it is in fact a really bad idea).

These were random configs - the size doesnt match an allyesconfig, those are 
way bigger. My above comment related to the first crash, and to my argument 
that all other drivers are fine during bootup - and there's a lot of them.

 So do you have something you are running after allyesconfig to fix things? 
 or have you just got a config that is close enough to allyesconfig.
 
 I'm building kernels with your .config now and boot testing them on the 
 full range of hardware I have/

Thanks. Is there something i can enable to get a better log for you to find 
out where (and why) it's hanging? It's still early during bootup so the box 
is not particularly debuggable - so i'm not sure i can get a task list dump, 
etc., unfortunately.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Jesse Barnes jbar...@virtuousgeek.org wrote:

[...]
  That action might hang or crash his kernel, and if that user then 
  reports:
  
 Hey, -rc7 just hung on me after enabling this new .config option
  it offered for the radeon driver i am using, please add this to the
  list of regressions. 
  
  is this really the right kind of reply:
  
Since we moved it from drivers/staging/ to drivers/ this hang you
  are seeing is technically not a regression, we might or might not fix
  it. 
  
  ?
  
  I doubt the user would be overly enthusiastic about that kind of
  reply ;-)
 
 Whether or not it's a regression is mostly irrelevant, it's a real bug and 
 the radeon guys are working on fixing it. [...]

Fortunately it's being worked on.

I beg to differ with your argument about it not mattering whether a bug is 
categorized as a regression: Rafael's regression list is far more prominent 
and the bugs listed there get fixed with a high likelhood.

Note that there's clear evidence of that in this very thread: the hang bug 
was ignored as a plain DRM non-regression bugreport, _despite_ the prior 
scrutiny in the thread, up to the moment Linus pointed it out and turned it 
into a de-facto regression ...

There's also another purpose of categorizing bugs as regressions: tester 
timeliness. We tend to treat bugs as 'plain bugs' when they are reported too 
late after a few kernel releases of the bug having been in the wild. We do 
this to encourage testers to test earlier -rc's as well, as there's a real 
tangible benefit of the 'we dont do regressions' policy: bugs get fixed and 
testers feel involved, and it's also the stage were we _can_ fix bugs with a 
lower cost to all parties involved.

But what 'timeliness of testing' can there be if new features are added in a 
late -rc and bugs are explicitly categorized as 'not a regression'?

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Matthew Garrett mj...@srcf.ucam.org wrote:

 On Thu, Feb 04, 2010 at 09:22:54PM +0100, Ingo Molnar wrote:
 
 Hey, -rc7 just hung on me after enabling this new .config option it 
  offered for the radeon driver i am using, please add this to the list 
  of 
  regressions. 
 
 If the same configuration options hang on both an old kernel and a new 
 kernel, how is that in any plausible way a regression? What's regressed?

Regressions are not limited to 'same config' kernels, last i checked. If that 
has changed (or if i'm misunderstanding it) then it would be nice to hear a 
clarification about that from Linus.

The way i understand it is that there are narrow exceptions from the 
regression rules, such as completely new drivers for which there can be no 
prior expectation of stability by users. (but for even them we are generally 
on the safer side to list bugs in them as regressions as well - especially if 
we expect many users to enable it.)

AFAIK there's no exception for new sub-features of existing facilities or 
drivers, even if it's default-disabled.

This issue materially affects quite a few bugs i'm handling as a maintainer. 
Many of them are under default-off config options - most new aspects to 
existing code are introduced in such a way. It would remove quite a bit of 
urgent-workload from my workflow if i could strike them from Rafael's list 
and could deprioritize them as plain bugs, to be fixed as time permits.

IMHO it would be rather counter-productive to kernel quality if we did that 
kind of regression-lawyering though.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

 
  These were random configs - the size doesnt match an allyesconfig, those 
  are
  way bigger. My above comment related to the first crash, and to my argument
  that all other drivers are fine during bootup - and there's a lot of them.
 
  So do you have something you are running after allyesconfig to fix things?
  or have you just got a config that is close enough to allyesconfig.
 
  I'm building kernels with your .config now and boot testing them on the
  full range of hardware I have/
 
  Thanks. Is there something i can enable to get a better log for you to find
  out where (and why) it's hanging? It's still early during bootup so the box
  is not particularly debuggable - so i'm not sure i can get a task list 
  dump,
  etc., unfortunately.
 
  Do you have NMI watchdog enabled? (does it work that early)
 
  a backtrace of where it hangs would be nice,
 
  Also a dmesg from booting with drm.debug=15 might help narrow it down
  also.
 
 
 Okay I've booted this on the rv370 + rv380 machines I have, I've no old 
 Athlon's though so I'm trying to get RHTS to give me access to one or two 
 internally,
 
 Another question, that came to mind, what is there any monitors plugged in? 
 or a KVM or something? if yes can you try without and if no can you try 
 with?
 
 Also can you add CONFIG_FRAMEBUFFER_CONSOLE as well since I don't think we 
 test often without it.

ok - i'll try your suggestions and send an update - will probably have to 
wait until next week. Feel free to deprioritize the bug until then (i have my 
revert as a short-term band-aid), unless others report similar problems too.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: hung bootup with drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-04 Thread Ingo Molnar

* Matthew Garrett mj...@srcf.ucam.org wrote:

 On Thu, Feb 04, 2010 at 10:05:59PM +0100, Ingo Molnar wrote:
 
  Regressions are not limited to 'same config' kernels, last i checked. If 
  that 
  has changed (or if i'm misunderstanding it) then it would be nice to hear a 
  clarification about that from Linus.
 
 If an option has *never* worked on a given configuration, then it's not a 
 regression. [...]

The *main driver* (CONFIG_DRM_RADEON=y, as far as the user is concerned) is 
many years old and it certainly worked just fine for tens of thousands of 
test iterations on this box, up until that commit.

That box alone has done in excess of half a million boot iterations in the 
past 2+ years. About 28% of the tests had CONFIG_DRM_RADEON=y, so the number 
of successful bootups with CONFIG_DRM_RADEON=y is in excess of one hundred 
thousand. There was not a single failed bootup in those two years due to the 
CONFIG_DRM_RADEON driver that i can remember.

If it now does not boot up if all its sub-options are enabled, even of some 
of those sub-options are new, does that count as a driver regression? Sure it 
does to me ...

IMHO you are trying to put a narrow technical distinction into it which does 
not exist for users. You argue that it's a new default-off sub-option of an 
existing driver, hence it cannot be a regression. The option shows up as a 
sub-option to an existing driver in make oldconfig, with a fairly innocious 
sounding help text, so to a user it certainly looks as if it's one unit and 
that it is the radeon driver that regressed.

*If* we make a driver feature available in a popular driver and make it 
Kconfig visible without obvious warnings (i.e. lure the user to enable it 
with the notion that it's just one coherent unit with a trusted, existing 
driver), we should also hold up the other side of the deal and treat the 
*bugs* as a coherent unit as well.

I.e. treat the driver as a coherent whole not only when it's convenient to 
us, but also when it's somewhat inconvenient to do. We cannot have it both 
ways.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-03 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

 On Wed, Feb 3, 2010 at 1:46 AM, Ingo Molnar mi...@elte.hu wrote:
 
  * Dave Airlie airl...@gmail.com wrote:
 
  On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote:
  
   * Dave Airlie airl...@linux.ie wrote:
  
Hi Linus,
   
Please pull the 'drm-linus' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git 
drm-linus
   
  
   I've also added an oops fix I seem to lose off my radar to this tree.
  
   commit 17aafccab4352b422aa01fa6ebf82daff693a5b3
   Author: Michel D??nzer daen...@vmware.com
   Date: ? Fri Jan 22 09:20:00 2010 +0100
  
   ? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
  
 
  Wierd this suggests something else is wrong on that machine can you get me
  the whole dmesg? I'm guessing some iommu or swiotlb issue.
 
  This box has no known hardware or software problems, just this week it 
  booted
  in excess of 1000 kernels so i'd exclude that angle for now.
 
  I have bisected the crash back to the DRM tree and the crash went away with
  the Kconfig revert i applied - and it got fixed by Jerome's patch. I posted
  my config and i posted the relevant boot log as well. Find below the full
  bootlog as well with vanilla -git (ab65832) and the config. (i dont think it
  matters)
 
  I've asked Jerome to fix the oops, but really anyone with an old .config
  won't get hit by this, and we've booted this on quite a lot of machines at
  this point.
 
  I dont see the commit in yesterday's linux-next. It has very fresh
  timestamps:
 
  ?commit f71d0187987e691516cd10c2702f002c0e2f0edc
  ?Author: ? ? Dave Airlie airl...@redhat.com
  ?AuthorDate: Mon Feb 1 11:35:47 2010 +1000
  ?Commit: ? ? Dave Airlie airl...@redhat.com
  ?CommitDate: Mon Feb 1 11:35:47 2010 +1000
 
  What kind of widespread testing could this commit have gotten in the less
  than 24 hours before it hit mainline?
 
 
 Its shipping in a major distro by default, its planned to be shipped in an 
 even more major distro. Its been boot tested on 1000s of machines by 1000s 
 of ppl.

Well but that's not the precise tree you sent to Linus, is it?

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-03 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

  On Wed, Feb 3, 2010 at 1:46 AM, Ingo Molnar mi...@elte.hu wrote:
  
   * Dave Airlie airl...@gmail.com wrote:
  
   On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote:
   
* Dave Airlie airl...@linux.ie wrote:
   
 Hi Linus,

 Please pull the 'drm-linus' branch from
 ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git
  drm-linus

   
I've also added an oops fix I seem to lose off my radar to this tree.
   
commit 17aafccab4352b422aa01fa6ebf82daff693a5b3
Author: Michel D??nzer daen...@vmware.com
Date: ? Fri Jan 22 09:20:00 2010 +0100
   
? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
   
  
   Wierd this suggests something else is wrong on that machine can you get 
   me
   the whole dmesg? I'm guessing some iommu or swiotlb issue.
  
   This box has no known hardware or software problems, just this week it 
   booted
   in excess of 1000 kernels so i'd exclude that angle for now.
  
   I have bisected the crash back to the DRM tree and the crash went away 
   with
   the Kconfig revert i applied - and it got fixed by Jerome's patch. I 
   posted
   my config and i posted the relevant boot log as well. Find below the full
   bootlog as well with vanilla -git (ab65832) and the config. (i dont 
   think it
   matters)
  
   I've asked Jerome to fix the oops, but really anyone with an old .config
   won't get hit by this, and we've booted this on quite a lot of machines 
   at
   this point.
  
   I dont see the commit in yesterday's linux-next. It has very fresh
   timestamps:
  
   ?commit f71d0187987e691516cd10c2702f002c0e2f0edc
   ?Author: ? ? Dave Airlie airl...@redhat.com
   ?AuthorDate: Mon Feb 1 11:35:47 2010 +1000
   ?Commit: ? ? Dave Airlie airl...@redhat.com
   ?CommitDate: Mon Feb 1 11:35:47 2010 +1000
  
   What kind of widespread testing could this commit have gotten in the less
   than 24 hours before it hit mainline?
  
 
  Its shipping in a major distro by default, its planned to be shipped in an
  even more major distro. Its been boot tested on 1000s of machines by 1000s
  of ppl.
 
  Well but that's not the precise tree you sent to Linus, is it?
 
 It pretty much is. If I could blame your crash on any of the recent patches 
 I would but its something new and unfun. [...]

You dont seem to realize the plain and simple fact that the bug (and some 
other bug) was obscure before because this particular KMS aspect of the 
radeon driver was in drivers/staging/, and it became more prominent via this 
post-rc6 commit:

 | From f71d0187987e691516cd10c2702f002c0e2f0edc Mon Sep 17 00:00:00 2001
 | From: Dave Airlie airl...@redhat.com
 | Date: Mon, 1 Feb 2010 11:35:47 +1000
 | Subject: [PATCH] drm/radeon/kms: move radeon KMS on/off switch out of 
staging.
 |
 | We are happy enough that the KMS driver is stable enough for enough people
 | for the kms enable/disable to leave staging. Distros can now contemplate
 | turning this on.
 |
 | Signed-off-by: Dave Airlie airl...@redhat.com
 | ---
 |  drivers/gpu/drm/Kconfig |2 ++
 |  drivers/staging/Kconfig |2 --
 |  2 files changed, 2 insertions(+), 2 deletions(-)

I never claimed (and still dont claim) that the bug is 'new' per se, so why 
do you keep beating down on that straw man argument? I said it in my very 
first mail that this bug got brought upon us by the Kconfig commit above:

  It's the moving of radeom KMS out of staging after -rc6 that causes it, 
  because it brought it into the scope of my testing:
 
f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging.
  
  So at least on this box it's clearly not ready for mainline enablement 
  yet.

I dont mind reporting bugs and testing patches (as i did), all i said is that 
from a QA angle it's somewhat late to do that in -rc7. (It's not even a 
completely new driver either, which people would know to stay away from - 
it's a new config option of an existing driver, so i'd expect many people to 
turn it on when they see it in the oldconfig - even though it's default-off.)

You made the bug more prominent by moving it into the driver proper, after 
-rc6, and while i dont mind reporting and working on bugs, your constant 
denial is somewhat counter-productive, as (beyond the waste of time on these 
emails) it suggests that we might see repeat incidents of this kind in the 
future.

Anyway, with two bugs in a row this commit is clearly too problematic for me 
so i have reverted f71d018 from -tip.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com

Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-02 Thread Ingo Molnar

* Dave Airlie airl...@linux.ie wrote:

  It's the moving of radeom KMS out of staging after -rc6 that causes it, 
  because it brought it into the scope of my testing:
  
   f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging.
  
  So at least on this box it's clearly not ready for mainline enablement 
  yet. I've attached the revert patch further below.
 
 Its not enabled by default so reverting this doesn't make much sense.

I boot allyesconfig kernels regularly, which testing method works fine with 
another 2000+ upstream drivers. (including the dozens of drivers which match 
to active hardware components on that box)

 We can just treat this as a normal driver bugreport.

Commit f71d0187 is post-rc6, so i'm not sure it fits your this is business 
as usual, lets move on characterisation.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-02 Thread Ingo Molnar

* Jerome Glisse gli...@freedesktop.org wrote:

 On Tue, Feb 02, 2010 at 09:17:27AM +0100, Ingo Molnar wrote:
  
  * Dave Airlie airl...@linux.ie wrote:
  
Hi Linus,

Please pull the 'drm-linus' branch from
ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git 
drm-linus

   
   I've also added an oops fix I seem to lose off my radar to this tree.
   
   commit 17aafccab4352b422aa01fa6ebf82daff693a5b3
   Author: Michel D??nzer daen...@vmware.com
   Date:   Fri Jan 22 09:20:00 2010 +0100
   
   drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
  
  FYI, this drm pull into mainline has triggered quick boot crashes in -tip 
  testing (even with the above fix applied), on an Athlon64 whitebox PC with:
  
   01:00.0 VGA compatible controller: ATI Technologies Inc RV370 5B60 [Radeon 
  X300 (PCIE)]
   01:00.1 Display controller: ATI Technologies Inc RV370 [Radeon X300SE]
  
  the crash is:
  
  [7.111003] radeon :01:00.0: Disabling GPU acceleration
  [7.273547] Failed to wait GUI idle while programming pipes. Bad things 
  might happen.
  [7.436296] [drm:r100_cp_fini] *ERROR* Wait for CP idle timeout, 
  shutting down CP.
  [7.598755] Failed to wait GUI idle while programming pipes. Bad things 
  might happen.
  [7.599306] BUG: unable to handle kernel paging request at f838
  [7.59] IP: [c149f0de] rv370_pcie_gart_set_page+0x2d/0x3c
  [7.59] *pde = 36d44067 *pte = 
  [7.59] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
  [7.59] last sysfs file:
  
  i have bisected it back to:
  
   | 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0 is the first bad commit
   | commit 97b94ccb9aa1b82ed7a9a045d0ae5b32c99b84a0
   | Author: Dave Airlie airl...@redhat.com
   | Date:   Fri Jan 29 15:31:47 2010 +1000
   |
   | drm/radeon/kms: fix incorrect logic in DP vs eDP connector checking.
   |
   | This makes displayport work again here.
  
  Unfortunately even with that patch reverted it still crashes. Config and 
  bootlog attached.
  
  It's the moving of radeom KMS out of staging after -rc6 that causes it, 
  because it brought it into the scope of my testing:
  
   f71d018: drm/radeon/kms: move radeon KMS on/off switch out of staging.
  
  So at least on this box it's clearly not ready for mainline enablement yet. 
  I've attached the revert patch further below.
  
  Ingo
  
 
 Attached is a patch which will fix the oops, still it's strange that CP 
 fails to init on your config. [...]

Thanks, that fixes the crash here!

Tested-by: Ingo Molnar mi...@elte.hu

 [...] Do you have IOMMU enabled ? I haven't played with iommu stuff thus i 
 wonder if we are missing somethings in this area.

No IOMMU here - this is a 5 years old box. (beyond GART that is)

Your patch fixes a bona-fide illegal-access bug in the DRM code, that's more 
than enough to crash the box ;-)

Btw., there's a new warning in the DRM code

 drivers/gpu/drm/ati_pcigart.c: In function ???drm_ati_pcigart_init???:
 drivers/gpu/drm/ati_pcigart.c:115: warning: format ???%Lx??? expects type 
???long long unsigned int???, but argument 3 has type ???dma_addr_t???

Please fix that too, the kernel build is noisy enough as-is.

Thanks,

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [crash, PATCH] Revert drm/radeon/kms: move radeon KMS on/off switch out of staging.

2010-02-02 Thread Ingo Molnar

* Dave Airlie airl...@gmail.com wrote:

 On Tue, Feb 2, 2010 at 6:35 PM, Dave Airlie airl...@gmail.com wrote:
  On Tue, Feb 2, 2010 at 6:17 PM, Ingo Molnar mi...@elte.hu wrote:
 
  * Dave Airlie airl...@linux.ie wrote:
 
   Hi Linus,
  
   Please pull the 'drm-linus' branch from
   ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git 
   drm-linus
  
 
  I've also added an oops fix I seem to lose off my radar to this tree.
 
  commit 17aafccab4352b422aa01fa6ebf82daff693a5b3
  Author: Michel D??nzer daen...@vmware.com
  Date: ? Fri Jan 22 09:20:00 2010 +0100
 
  ? ? drm/radeon/kms: Fix oops after radeon_cs_parser_init() failure.
 
 
  Wierd this suggests something else is wrong on that machine can you get me
  the whole dmesg? I'm guessing some iommu or swiotlb issue.
 
  I've asked Jerome to fix the oops, but really anyone with an old .config 
  won't get hit by this, and we've booted this on quite a lot of machines 
  at this point.
 
 Ideas keep flooding in here, did you softboot from the old UMS driver? or 
 coldboot? does coldbooting help?

Btw., i have cold booted it after full poweroff, and it crashed in the same 
way.

Ingo

--
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm

2009-12-10 Thread Ingo Molnar

* Alan Cox a...@lxorguk.ukuu.org.uk wrote:

  Last time they were asked that, they wanted to be free of changing their
  kernel/userspace interface before upstreaming.
 
 So put it in staging with a note to that effect. That'll also get it 
 more exposure and review.

Well, arguably that particular idea should have come from the people 
maintaining that area of code. There's this one missing driver which 
covers (more or less) like 40%-50% of the PC market, that's a glaringly 
significant issue, isnt it? It doesnt get any more significant, does it?

Ingo

--
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[patch] Fix: 'return -ENOMEM' instead of 'return ENOMEM'

2009-11-12 Thread Ingo Molnar

* Andrew Morton a...@linux-foundation.org wrote:

  @@ -3730,7 +3730,7 @@ tracing_stats_read(struct file *filp, char __user 
  *ubuf,
   
  s = kmalloc(sizeof(*s), GFP_KERNEL);
  if (!s)
  -   return ENOMEM;
  +   return -ENOMEM;
   
  trace_seq_init(s);
   
 
 lol, there we go again.
 
 Andy, can we have a checkpatch rule please?

Note, that will upset creative uses of error codes i guess, such as 
fs/xfs/.

But yeah, +1 from me too.

Ob'post'mortem - looked for similar patterns in the kernel and there's 
quite a few bugs there:

 include/net/inet_hashtables.h:  return ENOMEM; # bug
 drivers/scsi/aic7xxx/aic7xxx_osm.c: return ENOMEM; # works 
but weird
 drivers/scsi/cxgb3i/cxgb3i_offload.c:   return ENOMEM; # works 
but weird
 fs/ocfs2/dlm/dlmrecovery.c:return EAGAIN;  # bug
 drivers/block/cciss_scsi.c: return ENXIO;  # works 
but weird
 drivers/gpu/drm/radeon/radeon_irq.c:return EINVAL; # bug
 drivers/gpu/drm/radeon/radeon_irq.c:return EINVAL; # bug
 drivers/isdn/hardware/mISDN/hfcmulti.c: return EINVAL; # bug

5 out of 8 places look buggy - i.e. more than 60% - a checkpatch warning 
would avoid real bugs here. (even ignoring the cleanliness effects of 
using proper error propagation)

Cc:-ed affected maintainers. The rightmost column are my observations. 
Below is the patch fixing these.

Ingo

Signed-off-by: Ingo Molnar mi...@elte.hu
---
 drivers/gpu/drm/radeon/radeon_irq.c|4 ++--
 drivers/isdn/hardware/mISDN/hfcmulti.c |2 +-
 fs/ocfs2/dlm/dlmrecovery.c |2 +-
 include/net/inet_hashtables.h  |2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_irq.c 
b/drivers/gpu/drm/radeon/radeon_irq.c
index b79ecc4..fbbc0a1 100644
--- a/drivers/gpu/drm/radeon/radeon_irq.c
+++ b/drivers/gpu/drm/radeon/radeon_irq.c
@@ -76,7 +76,7 @@ int radeon_enable_vblank(struct drm_device *dev, int crtc)
default:
DRM_ERROR(tried to enable vblank on non-existent crtc 
%d\n,
  crtc);
-   return EINVAL;
+   return -EINVAL;
}
} else {
switch (crtc) {
@@ -89,7 +89,7 @@ int radeon_enable_vblank(struct drm_device *dev, int crtc)
default:
DRM_ERROR(tried to enable vblank on non-existent crtc 
%d\n,
  crtc);
-   return EINVAL;
+   return -EINVAL;
}
}
 
diff --git a/drivers/isdn/hardware/mISDN/hfcmulti.c 
b/drivers/isdn/hardware/mISDN/hfcmulti.c
index faed794..cfb45c9 100644
--- a/drivers/isdn/hardware/mISDN/hfcmulti.c
+++ b/drivers/isdn/hardware/mISDN/hfcmulti.c
@@ -2846,7 +2846,7 @@ mode_hfcmulti(struct hfc_multi *hc, int ch, int protocol, 
int slot_tx,
int conf;
 
if (ch  0 || ch  31)
-   return EINVAL;
+   return -EINVAL;
oslot_tx = hc-chan[ch].slot_tx;
oslot_rx = hc-chan[ch].slot_rx;
conf = hc-chan[ch].conf;
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index d9fa3d2..0a8a6a4 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2639,7 +2639,7 @@ int dlm_begin_reco_handler(struct o2net_msg *msg, u32 
len, void *data,
 dlm-name, br-node_idx, br-dead_node,
 dlm-reco.dead_node, dlm-reco.new_master);
spin_unlock(dlm-spinlock);
-   return EAGAIN;
+   return -EAGAIN;
}
spin_unlock(dlm-spinlock);
 
diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index d522dcf..5e31447 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -193,7 +193,7 @@ static inline int inet_ehash_locks_alloc(struct 
inet_hashinfo *hashinfo)
hashinfo-ehash_locks = kmalloc(size * sizeof(spinlock_t),
GFP_KERNEL);
if (!hashinfo-ehash_locks)
-   return ENOMEM;
+   return -ENOMEM;
for (i = 0; i  size; i++)
spin_lock_init(hashinfo-ehash_locks[i]);
}

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [patch] Fix: 'return -ENOMEM' instead of 'return ENOMEM'

2009-11-12 Thread Ingo Molnar

* Joel Becker joel.bec...@oracle.com wrote:

 On Thu, Nov 12, 2009 at 11:17:58AM -0800, Joel Becker wrote:
  On Thu, Nov 12, 2009 at 09:10:43AM +0100, Ingo Molnar wrote:
   5 out of 8 places look buggy - i.e. more than 60% - a checkpatch warning 
   would avoid real bugs here. (even ignoring the cleanliness effects of 
   using proper error propagation)
   
   Cc:-ed affected maintainers. The rightmost column are my observations. 
   Below is the patch fixing these.
  
  Acked-by: Joel Becker joel.bec...@oracle.com
 
 I take that back.  NAK.
 
 Sorry, I read the code wrong.  This function is just a handler. The 
 caller, dlm_send_begin_reco_message(), expects the positive EAGAIN as 
 a non-error case.

Well, at minimum the error code usage is very confused. The 
dlm_begin_reco_handler gets registered via o2net_register_handler(). 
Here's where dlm_begin_reco_handler gets registered, followed by 
dlm_finalize_reco_handler right afterwards:

status = o2net_register_handler(DLM_BEGIN_RECO_MSG, dlm-key,
sizeof(struct dlm_begin_reco),
dlm_begin_reco_handler,
dlm, NULL, dlm-dlm_domain_handlers);

and right before that dlm_reco_data_done_handler() gets registered:

status = o2net_register_handler(DLM_RECO_DATA_DONE_MSG, dlm-key,
sizeof(struct dlm_reco_data_done),
dlm_reco_data_done_handler,
dlm, NULL, dlm-dlm_domain_handlers);

And look what dlm_reco_data_done_handler() starts with:

int dlm_reco_data_done_handler(struct o2net_msg *msg, u32 len, void *data,
   void **ret_data)
{
struct dlm_ctxt *dlm = data;
struct dlm_reco_data_done *done = (struct dlm_reco_data_done *)msg-buf;
struct dlm_reco_node_data *ndata = NULL;
int ret = -EINVAL;

if (!dlm_grab(dlm))
return -EINVAL;

A negative error code right there. The other event handlers are seem to 
be similar - dlm_begin_reco_handler() is the odd one out.

So while you are right and my patch is wrong and DLM_BEGIN_RECO_MSG 
processing works right now - at minimum this is a very dangerous/fragile 
pattern of error code usage. For exampe if anyone uses 
dlm_begin_reco_handler() to implement a new state machine handler in the 
future, but uses one of the other event handlers to actually use it, a 
hard to find bug is introduced.

Ingo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-23 Thread Ingo Molnar

* Suresh Siddha suresh.b.sid...@intel.com wrote:

 On Fri, 2009-10-23 at 00:24 -0700, Thomas Schlichter wrote:
  Hmm, at this point I already was more than a week ago:
http://marc.info/?l=linux-kernelm=125537770514713w=2
  
  OK, I also modified ioremap() and set_memory_wc() but your patch is just 
  part 
  of what I did there...
 
 oops! Sorry I read the later emails in this thread carefully but not 
 the conversation with Thomas Hellstrom. Anyhow, glad that you were on 
 top of this. Thanks.
 
 Ingo, can you please queue this patch with a sign-off from Thomas 
 Schlichter too?

Please resend the latest with all the acks/signoffs added, with the 
subject line changed to '[PATCH] ...' and all people Cc:-ed.

Thanks,

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-21 Thread Ingo Molnar

* Suresh Siddha suresh.b.sid...@intel.com wrote:

 On Tue, 2009-10-20 at 13:35 -0700, Thomas Schlichter wrote:
  What do you think about the latest version of my patch series I just sent?
 
 I think we can simplify this by just using mtrr_add_page() and avoid 
 all the complexity that comes with mtrr_add_unaligned().
 
 pci_mmap_range() should call one mtrr_add_page() if the base and size 
 are nicely aligned. Almost all the cases, this is the usage sequence 
 here anyway.

yep, keeping it simple is a must.

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-21 Thread Ingo Molnar

* Thomas Schlichter thomas.schlich...@web.de wrote:

 Jan Beulich wrote:

  Functionality-wise this looks fine to me; whether the core sysfs 
  changes are acceptable I can't judge, though.
 
 OK, I think I should have addressed your comments. Unfortunately I had 
 to use a little hack to make pci_mmap_page_range() work for sysfs 
 and proc. I placed a private pointer in the beginning of both 
 per-file private structures. So this pointer can be accessed 
 independent from the caller. I hope this is acceptable.
 
 I dropped the ioremap() and set_memory_wc() patches, I could not 
 implement reference counting for them and it may interact too much 
 with existing GPU drivers.
 
 Again, this series should not change the current behavior if either 
 MTRR is disabled or PAT is enabled. But it helps in the case that MTRR 
 is enabled and PAT is not available.
 
 What should be done now to get this series on the right track?

Can we eliminate mtrr_add_unaligned() as Suresh suggested, and still 
make it work on your testbox?

Once that's done i'll look at putting it into the x86 tree for testing. 
The acks of Suresh/Venki/Jan would be nice to have.

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-21 Thread Ingo Molnar

* Jan Beulich jbeul...@novell.com wrote:

  So if it's OK for Jan, I'll reduce the functionality again and use 
  mtrr_add() instead. Btw. this simply means to drop 
  mtrr_add_unaligned(), all the other parts are still required for 
  reference counting and a proper mtrr_del() on file close.
 
 I'm perfectly fine with just handling the aligned case, as long as the 
 code checks that the alignment constraints are met.

It could even emit a debug warning if they are not met - that way we'll 
see how important the unaligned case is.

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-19 Thread Ingo Molnar

* Suresh Siddha suresh.b.sid...@intel.com wrote:

 On Mon, 2009-10-19 at 02:16 -0700, Jan Beulich wrote:
  Functionality-wise this looks fine to me
 
 If we are going to make ioremap() and set_memory_wc() add mtrr's in 
 non-pat case, then we need to delete the added mtrr(s) in the 
 corresponding iounmap() and set_memory_wb() aswell.
 
 hmm, this is becoming too complex. The way i915 and other graphics 
 drivers are using set_memory_wc(), it is def a bad idea to start 
 adding mtrr's behind the back for non-pat case.

Touching MTRRs beyond working around basic bugs like non-cached RAM 
sounds like madness. The interactions with PAT are ... countless.

 Can't we just force PAT option always and we probably don't care about 
 ioremap_wc() on processors were PAT doesn't get enabled because of 
 known errata.

We can make PAT configurability dependent on EMBEDDED-y - mind sending a 
patch for that?

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-19 Thread Ingo Molnar

* Thomas Schlichter thomas.schlich...@web.de wrote:

  We can make PAT configurability dependent on EMBEDDED-y - mind 
  sending a patch for that?
 
 I think there already is such a patch in -tip: 
 http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commitdiff;h=c03cb3149daed3e411657e3212d05ae27cf1a874

Yes :-)

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [RFC Patch] use MTRR for write combining if PAT is not available

2009-10-19 Thread Ingo Molnar

* Thomas Schlichter thomas.schlich...@web.de wrote:

 I don't think this is a good idea, Robert Hancock wrote there may be 
 millions of such Laptops (Core Solo/Duo erratum AE7, Pentium M erratum 
 Y31) :

   http://marc.info/?l=linux-kernelm=125537136105246w=2
 
  or Perhaps just try to add mtrr only for the pci mmap case like the 
  4th patch in this series..
 
 I'd prefer this! ;-)

Hm, we could perhaps do that - but i think we should only do that on 
systems that have PAT disabled.

Which brings up the question of how to properly QA such a narrow segment 
of the market. Maybe disabling CONFIG_X86_PAT should enable that logic 
too.

Ingo

--
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.

2009-09-22 Thread Ingo Molnar

* Dave Airlie airl...@linux.ie wrote:

  there's a new build failure:
  
  drivers/built-in.o: In function `drm_irq_uninstall':
  (.text+0xb719e): undefined reference to `vga_client_register'
  drivers/built-in.o: In function `drm_irq_install':
  (.text+0xb7309): undefined reference to `vga_client_register'
  drivers/built-in.o: In function `radeon_device_fini':
  (.text+0xe400f): undefined reference to `vga_client_register'
  drivers/built-in.o: In function `radeon_device_init':
  (.text+0xe455b): undefined reference to `vga_client_register'
  
  with the attached config, introduced with upstream merge 44040f1.
  
  At first sight it appears to be due to CONFIG_DRM_RADEON relying on 
  VGA_ARB facilities but this is not expressed in the Kconfig rules. The 
  patch below solves this - but this is just a quick patch, i have not 
  investigated any deeper.
  
  Review of the code suggests that i915 has a similar dependency problem - 
  i fixed that too.
 
 The way it should work is VGA ARB should be enabled on any platforms we
 have PCI unless EMBEDDED turns it off, since arbitration of VGA isn't 
 reliant on a drm device, I'm not sure what Kconfig magic this would 
 require, and where it would need to be. This patch should at least allow
 builds to work until I figure out any Kconfig magic.
 
 From 8a874578cbf8b07b988e666c15fa0ba767f3c1cb Mon Sep 17 00:00:00 2001
 From: Dave Airlie airl...@redhat.com
 Date: Tue, 22 Sep 2009 13:53:00 +1000
 Subject: [PATCH] vgaarb: wrap the client register API so we can disable VGA 
 ARB.
 
 This provides an dummy register function so everything builds
 if VGA arb is turned off.
 
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  include/linux/vgaarb.h |   11 ++-
  1 files changed, 10 insertions(+), 1 deletions(-)
 
 diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
 index e81c64a..b0feb79 100644
 --- a/include/linux/vgaarb.h
 +++ b/include/linux/vgaarb.h
 @@ -41,7 +41,7 @@
   * interrupts at any time.
   */
  extern void vga_set_legacy_decoding(struct pci_dev *pdev,
 - 
 unsigned int decodes);
 + unsigned int decodes);
  
  /**
   * vga_get - acquire  locks VGA resources
 @@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, 
 struct pci_dev *p2)
   * They driver will get a callback when VGA arbitration is first used
   * by userspace since we some older X servers have issues.
   */
 +#if defined(CONFIG_VGA_ARB)
  int vga_client_register(struct pci_dev *pdev, void *cookie,
   void (*irq_set_state)(void *cookie, bool state),
   unsigned int (*set_vga_decode)(void *cookie, bool 
 state));
 +#else
 +static inline int vga_client_register(struct pci_dev *pdev, void *cookie,
 +   void (*irq_set_state)(void *cookie, bool 
 state),
 +   unsigned int (*set_vga_decode)(void 
 *cookie, bool state));
 +{
 + return 0;
 +}
 +#endif

Yeah - making APIs config invariant is a good idea in any case, 
regardless of Kconfig magic.

Thanks,

Ingo

--
Come build with us! The BlackBerryreg; Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9#45;12, 2009. Register now#33;
http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.

2009-09-22 Thread Ingo Molnar

* Dave Airlie airl...@linux.ie wrote:

 From 8a874578cbf8b07b988e666c15fa0ba767f3c1cb Mon Sep 17 00:00:00 2001
 From: Dave Airlie airl...@redhat.com
 Date: Tue, 22 Sep 2009 13:53:00 +1000
 Subject: [PATCH] vgaarb: wrap the client register API so we can disable VGA 
 ARB.
 
 This provides an dummy register function so everything builds
 if VGA arb is turned off.
 
 Signed-off-by: Dave Airlie airl...@redhat.com
 ---
  include/linux/vgaarb.h |   11 ++-
  1 files changed, 10 insertions(+), 1 deletions(-)

btw., i hope you have not commited this yet, there's a trivial problem 
with this patch:

 diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
 index e81c64a..b0feb79 100644
 --- a/include/linux/vgaarb.h
 +++ b/include/linux/vgaarb.h
 @@ -41,7 +41,7 @@
   * interrupts at any time.
   */
  extern void vga_set_legacy_decoding(struct pci_dev *pdev,
 - 
 unsigned int decodes);
 + unsigned int decodes);
  
  /**
   * vga_get - acquire  locks VGA resources
 @@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, 
 struct pci_dev *p2)
   * They driver will get a callback when VGA arbitration is first used
   * by userspace since we some older X servers have issues.
   */
 +#if defined(CONFIG_VGA_ARB)
  int vga_client_register(struct pci_dev *pdev, void *cookie,
   void (*irq_set_state)(void *cookie, bool state),
   unsigned int (*set_vga_decode)(void *cookie, bool 
 state));
 +#else
 +static inline int vga_client_register(struct pci_dev *pdev, void *cookie,
 +   void (*irq_set_state)(void *cookie, bool 
 state),
 +   unsigned int (*set_vga_decode)(void 
 *cookie, bool state));
 +{
 + return 0;
 +}
 +#endif

that ';' in the inline prototype is bogus, this wont build.

Ingo

--
Come build with us! The BlackBerryreg; Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9#45;12, 2009. Register now#33;
http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [origin tree build failure] [PATCH] Re: [git pull] drm tree.

2009-09-22 Thread Ingo Molnar

here's a patch that works for me.

Ingo

Signed-off-by: Ingo Molnar mi...@elte.hu
---
 include/linux/vgaarb.h |   11 ++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
index e81c64a..923f904 100644
--- a/include/linux/vgaarb.h
+++ b/include/linux/vgaarb.h
@@ -41,7 +41,7 @@
  * interrupts at any time.
  */
 extern void vga_set_legacy_decoding(struct pci_dev *pdev,
-   
unsigned int decodes);
+   unsigned int decodes);
 
 /**
  * vga_get - acquire  locks VGA resources
@@ -193,8 +193,17 @@ static inline int vga_conflicts(struct pci_dev *p1, struct 
pci_dev *p2)
  * They driver will get a callback when VGA arbitration is first used
  * by userspace since we some older X servers have issues.
  */
+#if defined(CONFIG_VGA_ARB)
 int vga_client_register(struct pci_dev *pdev, void *cookie,
void (*irq_set_state)(void *cookie, bool state),
unsigned int (*set_vga_decode)(void *cookie, bool 
state));
+#else
+static inline int vga_client_register(struct pci_dev *pdev, void *cookie,
+ void (*irq_set_state)(void *cookie, bool 
state),
+ unsigned int (*set_vga_decode)(void 
*cookie, bool state))
+{
+   return 0;
+}
+#endif
 
 #endif /* LINUX_VGA_H */

--
Come build with us! The BlackBerryreg; Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9#45;12, 2009. Register now#33;
http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


[PATCH] drm: Fix build failure in radeon and i915 drivers

2009-09-21 Thread Ingo Molnar

* Ingo Molnar mi...@elte.hu wrote:

 there's a new build failure:
 
 drivers/built-in.o: In function `drm_irq_uninstall':
 (.text+0xb719e): undefined reference to `vga_client_register'
 drivers/built-in.o: In function `drm_irq_install':
 (.text+0xb7309): undefined reference to `vga_client_register'
 drivers/built-in.o: In function `radeon_device_fini':
 (.text+0xe400f): undefined reference to `vga_client_register'
 drivers/built-in.o: In function `radeon_device_init':
 (.text+0xe455b): undefined reference to `vga_client_register'
 
 with the attached config, introduced with upstream merge 44040f1.
 
 At first sight it appears to be due to CONFIG_DRM_RADEON relying on 
 VGA_ARB facilities but this is not expressed in the Kconfig rules. The 
 patch below solves this - but this is just a quick patch, i have not 
 investigated any deeper.
 
 Review of the code suggests that i915 has a similar dependency problem 
 - i fixed that too.

i've looked some more and drm_irq.o depends on vga-arb too so the patch 
below is the more complete fix IMHO.

Ingo

--
From 0186c202fefd70291ef3b29e34543083d24f026d Mon Sep 17 00:00:00 2001
From: Ingo Molnar mi...@elte.hu
Date: Mon, 21 Sep 2009 18:12:07 +0200
Subject: [PATCH] drm: Fix build failure in radeon and i915 drivers

this build failure:

drivers/built-in.o: In function `drm_irq_uninstall':
(.text+0xb719e): undefined reference to `vga_client_register'
drivers/built-in.o: In function `drm_irq_install':
(.text+0xb7309): undefined reference to `vga_client_register'
drivers/built-in.o: In function `radeon_device_fini':
(.text+0xe400f): undefined reference to `vga_client_register'
drivers/built-in.o: In function `radeon_device_init':
(.text+0xe455b): undefined reference to `vga_client_register'

got introduced with upstream merge 44040f1.

At first sight it appears to be due to CONFIG_DRM_RADEON relying on
VGA_ARB facilities but this is not expressed in the Kconfig rules.

drm_irq.o relies on it too - so the whole DRM facilities relies
on VGA_ARB. I've added a select to express this dependency.

VGA_ARB is not a simple option, it depends on PCI, so in theory
select is not safe - but this is a special case since DRM itself
depends on PCI too.

Cc: torva...@linux-foundation.org
Cc: dri-de...@lists.sf.net
Cc: Dave Airlie airl...@linux.ie
LKML-Reference: 20090921161207.ga9...@elte.hu
Signed-off-by: Ingo Molnar mi...@elte.hu
---
 drivers/gpu/drm/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index e4d971c..6c54422 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -9,6 +9,7 @@ menuconfig DRM
depends on (AGP || AGP=n)  PCI  !EMULATED_CMPXCHG  MMU
select I2C
select I2C_ALGOBIT
+   select VGA_ARB
help
  Kernel-level support for the Direct Rendering Infrastructure (DRI)
  introduced in XFree86 4.0. If you say Y here, you need to select

--
Come build with us! The BlackBerryreg; Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9#45;12, 2009. Register now#33;
http://p.sf.net/sfu/devconf
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases

2009-08-03 Thread Ingo Molnar

* Thomas Hellstrom thellst...@vmware.com wrote:

 Bug #13884
 The code was incorrectly reserving memtypes using the page
 virtual address instead of the physical address. Furthermore,
 the code was not ignoring highmem pages as it ought to.
 
 Signed-off-by: Thomas Hellstrom thellst...@vmware.com
 ---
  arch/x86/mm/pageattr.c |   30 +-
  1 files changed, 21 insertions(+), 9 deletions(-)
 
 diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
 index d4327db..96dd4f6 100644
 --- a/arch/x86/mm/pageattr.c
 +++ b/arch/x86/mm/pageattr.c
 @@ -590,9 +590,12 @@ static int __change_page_attr(struct cpa_data *cpa, int 
 primary)
   unsigned int level;
   pte_t *kpte, old_pte;
  
 - if (cpa-flags  CPA_PAGES_ARRAY)
 - address = (unsigned long)page_address(cpa-pages[cpa-curpage]);
 - else if (cpa-flags  CPA_ARRAY)
 + if (cpa-flags  CPA_PAGES_ARRAY) {
 + struct page *page = cpa-pages[cpa-curpage];
 + if (unlikely(PageHighMem(page)))
 + return 0;
 + address = (unsigned long)page_address(page);
 + } else if (cpa-flags  CPA_ARRAY)
   address = cpa-vaddr[cpa-curpage];

hm, i'm missing a description about how this bug was 
triggered. How did you end up getting highmem pages to a cpa 
call?

   else
   address = *cpa-vaddr;
 @@ -696,9 +699,12 @@ static int cpa_process_alias(struct cpa_data *cpa)
* No need to redo, when the primary call touched the direct
* mapping already:
*/
 - if (cpa-flags  CPA_PAGES_ARRAY)
 - vaddr = (unsigned long)page_address(cpa-pages[cpa-curpage]);
 - else if (cpa-flags  CPA_ARRAY)
 + if (cpa-flags  CPA_PAGES_ARRAY) {
 + struct page *page = cpa-pages[cpa-curpage];
 + if (unlikely(PageHighMem(page)))
 + return 0;
 + vaddr = (unsigned long)page_address(page);
 + } else if (cpa-flags  CPA_ARRAY)
   vaddr = cpa-vaddr[cpa-curpage];
   else
   vaddr = *cpa-vaddr;
 @@ -1118,7 +1124,9 @@ int set_pages_array_uc(struct page **pages, int 
 addrinarray)
   int free_idx;
  
   for (i = 0; i  addrinarray; i++) {
 - start = (unsigned long)page_address(pages[i]);
 + if (PageHighMem(pages[i]))
 + continue;
 + start = page_to_pfn(pages[i])  PAGE_SHIFT;
   end = start + PAGE_SIZE;
   if (reserve_memtype(start, end, _PAGE_CACHE_UC_MINUS, NULL))
   goto err_out;

ok, that's a bug introduced in .29 but which was latent until now: 
drivers/char/agp/generic.c now uses it plus (indirectly) a number of 
AGP drivers, since:

 commit 07613ba2f464f59949266f4337b75b91eb610795
 Author: Dave Airlie airl...@redhat.com
 Date:   Fri Jun 12 14:11:41 2009 +1000

 agp: switch AGP to use page array instead of unsigned long array

I dont see how it can end up with highmem pages though. All 
the graphics apperture allocations happen to lowmem AFAICS. 
Did GEM add the possibility for user pages (highmem amongst 
them) ending up in that pool? Which code does that?

 @@ -1131,7 +1139,9 @@ int set_pages_array_uc(struct page **pages, int 
 addrinarray)
  err_out:
   free_idx = i;
   for (i = 0; i  free_idx; i++) {
 - start = (unsigned long)page_address(pages[i]);
 + if (PageHighMem(pages[i]))
 + continue;
 + start = page_to_pfn(pages[i])  PAGE_SHIFT;
   end = start + PAGE_SIZE;
   free_memtype(start, end);
   }
 @@ -1160,7 +1170,9 @@ int set_pages_array_wb(struct page **pages, int 
 addrinarray)
   return retval;
  
   for (i = 0; i  addrinarray; i++) {
 - start = (unsigned long)page_address(pages[i]);
 + if (PageHighMem(pages[i]))
 + continue;
 + start = page_to_pfn(pages[i])  PAGE_SHIFT;
   end = start + PAGE_SIZE;
   free_memtype(start, end);

In any case it's a must-have fix for .31. Possibly even a backport 
tag is needed, in case a distro does a .30 kernel with more recent 
graphics bits.

Ingo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases

2009-08-03 Thread Ingo Molnar

* Dave Airlie airl...@linux.ie wrote:

  hm, i'm missing a description about how this bug was triggered. 
  How did you end up getting highmem pages to a cpa call?
 
 GEM and TTM both allocate page arrays and just pass them to cpa, 
 we don't know what type of pages the allocator gives us back and 
 we really shouldn't have to, so having cpa ignore highmem pages is 
 certainly the right option.
 
 GEM just uses shmem code to alloc the pages and TTM has its own 
 allocator.

Neither of my questions was answered though: do highmem pages ever 
get passed in and what were the effects of the bug and how was it 
noticed?

If no highmem pages can be passed in then the right solution is not 
to ignore them silently but to add a WARN_ONCE() to enforce that 
they are not used in the future either.

If they are used today then i'd like to know where exactly and 
double check that all the cache attributes are consistent: as a 
highmem page might be visible to user-space as well or might be 
mapped via the vmalloc space, etc.

And yes, if it happens and all the other mapping aliases are fine 
then ignoring them is the right solution.

Ingo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases

2009-08-03 Thread Ingo Molnar

* Thomas Hellstrom thellst...@vmware.com wrote:

 Dave Airlie wrote:
 hm, i'm missing a description about how this bug was triggered. How 
 did you end up getting highmem pages to a cpa call?
 

 GEM and TTM both allocate page arrays and just pass them to cpa,
 we don't know what type of pages the allocator gives us back and we 
 really shouldn't have to, so having cpa ignore highmem pages is 
 certainly the right option.

 GEM just uses shmem code to alloc the pages and TTM has its own allocator.

   
 Yes, Dave is right.

 Although I'm not 100% sure the TTM code I was using that triggered this  
 has made it into 2.6.31.
 Old AGP uses (GFP_KERNEL | GFP_DMA32 | __GFP_ZERO), which (correct me if  
 I'm wrong) never hands back highmem pages. This means that Intel's GEM  
 is the only likely user for 2.6.31.

(please dont top-post) I'm not sure you folks noticed this bit of my 
mail:

 ok, that's a bug introduced in .29 but which was latent until now:  
 drivers/char/agp/generic.c now uses it plus (indirectly) a number of  
 AGP drivers, since:

  commit 07613ba2f464f59949266f4337b75b91eb610795
  Author: Dave Airlie airl...@redhat.com
  Date:   Fri Jun 12 14:11:41 2009 +1000

  agp: switch AGP to use page array instead of unsigned long array

 I dont see how it can end up with highmem pages though. All the 
 graphics apperture allocations happen to lowmem AFAICS. Did GEM add 
 the possibility for user pages (highmem amongst them) ending up in 
 that pool? Which code does that?

Ingo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [PATCH] x86: Fix CPA memtype reserving in the set_pages_array cases

2009-08-03 Thread Ingo Molnar

* Thomas Hellstrom thellst...@vmware.com wrote:

 Ingo Molnar wrote:
 * Thomas Hellstrom thellst...@vmware.com wrote:

   
 Dave Airlie wrote:
 
 hm, i'm missing a description about how this bug was triggered. 
 How did you end up getting highmem pages to a cpa call?
 
 GEM and TTM both allocate page arrays and just pass them to cpa,
 we don't know what type of pages the allocator gives us back and we 
 really shouldn't have to, so having cpa ignore highmem pages is  
 certainly the right option.

 GEM just uses shmem code to alloc the pages and TTM has its own allocator.

 
 Yes, Dave is right.

 Although I'm not 100% sure the TTM code I was using that triggered 
 this  has made it into 2.6.31.
 Old AGP uses (GFP_KERNEL | GFP_DMA32 | __GFP_ZERO), which (correct me 
 if  I'm wrong) never hands back highmem pages. This means that 
 Intel's GEM  is the only likely user for 2.6.31.
 

 (please dont top-post) I'm not sure you folks noticed this bit of my  
 mail:

   
 ok, that's a bug introduced in .29 but which was latent until 
 now:  drivers/char/agp/generic.c now uses it plus (indirectly) a 
 number of  AGP drivers, since:

  commit 07613ba2f464f59949266f4337b75b91eb610795
  Author: Dave Airlie airl...@redhat.com
  Date:   Fri Jun 12 14:11:41 2009 +1000

  agp: switch AGP to use page array instead of unsigned long array

 I dont see how it can end up with highmem pages though. All the  
 graphics apperture allocations happen to lowmem AFAICS. Did GEM 
 add the possibility for user pages (highmem amongst them) ending 
 up in that pool? Which code does that?
 

  Ingo
   
 Sorry, my bad.

 The TTM code I tested is not in yet, and after double-checking it 
 looks like Intel's gem is not changing caching policy before 
 binding to AGP.

 This means the highmem problems that I saw were triggered by a 
 combination of the virtual-physical bugfix and code that's not in 
 the kernel yet, and since it's an optimization of the current code 
 it's not likely to land in 2.6.31. The highmem fixes could thus, 
 AFAICT be stripped out of the patch, unless GFP_DMA32 on a highmem 
 system can actually hand back highmem pages, in which case AGP 
 will not work correctly.

 As for highmem use in the future, the TTM page arrays are 
 populated using fault(), which means that there will be an 
 overhead ordering the pages so that we can use the 
 set_pages_array() interface instead of set_memory() that we use 
 today. Therefore, if possible, I'd prefer if we could pass arrays 
 containing highmem pages to the set_pages_array() interface.

 There are no aliased mappings since

 1) Any user space mappings to these pages are killed before changing  
caching policy.
 2) The pages are allocated and owned by the driver.
 3) kmap_atomic_prot() and vmap() are used to map these pages in kernel  
 space.

 Code is in ttm_tt.c ttm_bo.c and ttm_bo_util.c

ok - thanks for the explanation. Since you intentionally want to use 
highmem pages (and your use is safe) i concur with your original 
patch in its entirety - even if that planned highmem use is not 
upstream yet. Will get it .31-wards ASAP.

Ingo

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.6.30-rc6: Reported regressions from 2.6.29

2009-05-18 Thread Ingo Molnar

* Oleg Nesterov o...@redhat.com wrote:

 On 05/17, Ingo Molnar wrote:
 
   Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13107
   Subject   : LTP 20080131 causes defunct processes w/2.6.30-rc1
   Submitter : Kumar Gala ga...@kernel.crashing.org
   Date  : 2009-04-09 15:43 (38 days old)
   First-Bad-Commit: 
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b3bfa0cba867f23365b81658b47efd906830879b
   References: http://marc.info/?l=linux-kernelm=123929187208953w=4
   http://lkml.org/lkml/2009/4/10/193
   Handled-By: Sukadev Bhattiprolu suka...@linux.vnet.ibm.com
 
  Oleg says in that thread that it's as-designed, and followup
  questions were not replied to (yet).
 
 Yes, I think this is false alarm. Perhaps I missed something, and 
 I am waiting for more info from Kumar, but it looks like ltp was 
 already changed to skip the { PTRACE_ATTACH, 1, EPERM } test on 
 kernels after 2.6.25

Btw., why did the patch (and the revert) make any difference to the 
test? Timing differences look improbable.

Ingo

--
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables 
unlimited royalty-free distribution of the report engine 
for externally facing server and web deployment. 
http://p.sf.net/sfu/businessobjects
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.6.30-rc6: Reported regressions from 2.6.29

2009-05-17 Thread Ingo Molnar

* Rafael J. Wysocki r...@sisk.pl wrote:

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13325
 Subject   : 2.6.30-rc kills my box hard - and lockdep chains
 Submitter : Jonathan Corbet cor...@lwn.net
 Date  : 2009-05-14 15:49 (3 days old)
 References: http://marc.info/?l=linux-kernelm=124231630701394w=4

Jonathan, there's a side-issue reported there, us running out of 
lockdep space. Could you try this commit from -tip:

d80c19d: lockdep: increase MAX_LOCKDEP_ENTRIES and MAX_LOCKDEP_CHAINS

(which i'll get to Linus in the next ~24 hours.) Maybe that allows 
lockdep to report the reason for the deadlock.

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13321
 Subject   : kernel crash with NULL pointer when boot
 Submitter : Martin Bammer mr...@gmx.at
 Date  : 2009-05-16 12:37 (1 days old)
 References: http://lkml.org/lkml/2009/5/16/100

that crash is in reiserfs_for_each_xattr(), during sys_unlink()'s 
xattr teardown.

There's been a good deal of reiserfs changes in this cycle - some 
touch the xattr code as well. Some of them fairly late in the cycle, 
in the last two weeks:

 earth4:~/tip gll v2.6.29..linus --since=two-weeks-ago fs/reiserfs/
 2a32ceb: Fix races around the access to -s_options
 677c9b2: reiserfs: remove privroot hiding in lookup
 b82bb72: reiserfs: dont associate security.* with xattr files
 ab17c4f: reiserfs: fixup xattr_root caching
 edcc37a: Always lookup priv_root on reiserfs mount and keep it
 5a6059c: reiserfs: Expand i_mutex to enclose lookup_one_len

Martin, you could try a blind revert of say ... ab17c4f, which looks 
the most suspect and which is also a rather large commit.

Or/and you could try a bisect - perhaps accelerated via:

   git bisect start fs/reiserfs/

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13297
 Subject   : kernel panic -  not syncing : fatel exception in 
 interupt
 Submitter : rob r...@housetosell.net
 Date  : 2009-05-12 19:34 (5 days old)
 References: http://marc.info/?l=linux-kernelm=124216126903309w=4

tainted crash, but probably legit. It does show some badness in an 
old-IDE legacy codepath:

[c0371865] error_code+0x65/0x6c
[c0110155] do_page_fault+0x0/0x1e0
[c027dafc] ide_complete_rq+0xf/0x3b
[c02870a0] cdrom_newpc_intr+0x64d/0x6cd
[c0286a53] cdrom_newpc_intr+0x0/0x6cd
[c027dcc2] ide_intr+0x109/0x161
[c0132298] handle_IRQ_event+0x54/0xc7
[c013354a] handle_level_irq+0x4f/0x85
[c0103df7] handle_irq+0x17/0x20
[c0103da5] do_IRQ+0x2b/0x66
[c0102be9] common_interupt+0x29/0x30
[c048] cmd40x_init+0x2ac/0x38d
[c0106db3] default_idle+0x25/0x38
[c01019be] cpu_idle+0x19/0x2d
[c0468907] start_kernel+0x23f/0x242

report subject line is too unspecific, it should be changed to 
something like:

  legacy IDE cmd40x related bootup crash

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13296
 Subject   : Lockdep violation at cleanup_workqueue_thread during 
 suspend
 Submitter : Zdenek Kabelac zdenek.kabe...@gmail.com
 Date  : 2009-05-12 7:59 (5 days old)
 References: http://marc.info/?l=linux-kernelm=124211522525625w=4

looks like wireless related - the dependency that connects the 
locks in a wrong way appears to be:

 - #2 (cfg80211_mutex){+.+.+.}:
[80271a64] __lock_acquire+0xc64/0x10a0
[80271f38] lock_acquire+0x98/0x140
[8054e78c] __mutex_lock_common+0x4c/0x3b0
[8054ebf6] mutex_lock_nested+0x46/0x60
[a007e66a] reg_todo+0x19a/0x590 [cfg80211]
[80258f18] worker_thread+0x1e8/0x3a0
[8025dc3a] kthread+0x5a/0xa0
[8020d23a] child_rip+0xa/0x20

(havent checked deeper)

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13245
 Subject   : possible circular locking dependency detected
 Submitter : Miles Lane miles.l...@gmail.com
 Date  : 2009-05-04 16:56 (13 days old)

same as #13296 above. (The one above should be merged into this one 
i guess)

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13126
 Subject   : BUG: MAX_LOCKDEP_ENTRIES too low! when mounting rootfs
 Submitter : Alexander Beregalov a.berega...@gmail.com
 Date  : 2009-04-15 12:43 (32 days old)
 References: http://marc.info/?l=linux-kernelm=123979949820538w=4

should be resolved via the lockdep space extension fix.

 Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13118
 Subject   : iptables very slow after commit 
 784544739a25c30637397ace5489eeb6e15d7d49
 Submitter : Jeff Chua jeff.chua.li...@gmail.com
 Date  : 2009-04-10 16:05 (37 days old)
 References: http://lkml.org/lkml/2009/4/10/111
 http://lkml.org/lkml/2009/4/25/83
 Handled-By: Eric Dumazet da...@cosmosbay.com

solved by:

 commit 942e4a2bd680c606af0211e64eb216be2e19bf61
 Author: Stephen Hemminger shemmin...@vyatta.com
 Date:   

Re: [TIP,regression,i915] /dev/dri/card0 is no longer present

2009-03-02 Thread Ingo Molnar

* Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote:

 On Mon, 2009-03-02 at 13:34 -0800, Ingo Molnar wrote:
  * Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote:
  
   On Sun, 2009-03-01 at 05:48 -0800, Sitsofe Wheeler wrote:
On Sun, Mar 01, 2009 at 12:39:48PM +0100, Ingo Molnar wrote:

 Thanks, i've reverted the commit for now. Could you please send
 a 'dmesg' from the bootup with the failed driver? By all
 likelyhood an ioremap failure causes a driver failure.


See below (drm says it initalized though...):

[0.00] Linux version 2.6.29-rc6-6-g17581ad (@verona) (gcc 
version 4.2.4) #87 Sun Mar 1 13:42:28 GMT 2009
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00] PAT WC disabled due to known CPU erratum.
   
   Looks like PAT is getting disabled on this platform. Can you 
   send the output of /proc/cpuinfo please.
  
  Does this mean we failed an ioremap()? We still should not fail 
  a device ioremap _ever_. We should just fall back to UC and be 
  done with it. We may emit a warning if we consider it 
  troublesome, but we should never ever break working drivers 
  really.
  
 
 No. We don't break ioremap. We try to fallback on UC_MINUS 
 here. But, there is something wrong happening in that path, 
 that is resulting in this issue. I am looking at that right 
 now.

ah, ok.

 I have found another problem with this particular patch, on 
 one of test systems. So, let us keep the revert until I fix 
 what looks like two different problems here.

ok.

Ingo

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [TIP,regression,i915] /dev/dri/card0 is no longer present

2009-03-02 Thread Ingo Molnar

* Pallipadi, Venkatesh venkatesh.pallip...@intel.com wrote:

 On Sun, 2009-03-01 at 05:48 -0800, Sitsofe Wheeler wrote:
  On Sun, Mar 01, 2009 at 12:39:48PM +0100, Ingo Molnar wrote:
  
   Thanks, i've reverted the commit for now. Could you please send
   a 'dmesg' from the bootup with the failed driver? By all
   likelyhood an ioremap failure causes a driver failure.
  
  
  See below (drm says it initalized though...):
  
  [0.00] Linux version 2.6.29-rc6-6-g17581ad (@verona) (gcc 
  version 4.2.4) #87 Sun Mar 1 13:42:28 GMT 2009
  [0.00] KERNEL supported cpus:
  [0.00]   Intel GenuineIntel
  [0.00] PAT WC disabled due to known CPU erratum.
 
 Looks like PAT is getting disabled on this platform. Can you 
 send the output of /proc/cpuinfo please.

Does this mean we failed an ioremap()? We still should not fail 
a device ioremap _ever_. We should just fall back to UC and be 
done with it. We may emit a warning if we consider it 
troublesome, but we should never ever break working drivers 
really.

Ingo

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [TIP,regression,i915] /dev/dri/card0 is no longer present

2009-03-01 Thread Ingo Molnar

* Sitsofe Wheeler sits...@yahoo.com wrote:

 Hi,
 
 With the latest -tip /dev/dri/card0 is MIA on my EeePC 900. A bisection
 has tracked the probem down to the following commit:
 
 commit 17581ad812a9abb0182260374ef2e52d4a808a64
 Author: Venkatesh Pallipadi venkatesh.pallip...@intel.com
 Date:   Tue Feb 24 17:35:14 2009 -0800
 
 gpu/drm, x86, PAT: PAT support for io_mapping_*
 
 Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
 that there are no memory type aliases.
 
 Signed-off-by: Venkatesh Pallipadi venkatesh.pallip...@intel.com
 Signed-off-by: Suresh Siddha suresh.b.sid...@intel.com
 Cc: Dave Airlie airl...@redhat.com
 Cc: Jesse Barnes jbar...@virtuousgeek.org
 Cc: Eric Anholt e...@anholt.net
 Cc: Keith Packard kei...@keithp.com
 Signed-off-by: Ingo Molnar mi...@elte.hu
 
 Reverting this made /dev/dri/card0 reappear. Here is the 
 bisection log:

Thanks, i've reverted the commit for now. Could you please send 
a 'dmesg' from the bootup with the failed driver? By all 
likelyhood an ioremap failure causes a driver failure.

Ingo

--
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Adding kmap_atomic_prot_pfn

2008-10-24 Thread Ingo Molnar

* Thomas Hellström [EMAIL PROTECTED] wrote:

 Keith,

 What you actually are doing here is claiming copyright on code that 
 other people have written, and tighten the export restrictions.  
 kmap_atomic_prot_pfn() appeared long ago in drm git with identical 
 code and purpose, but with different authors, and iounmap_atomic is 
 identical to kunmap_atomic.

 +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);

you want to use this facility in a binary-only driver?

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Adding kmap_atomic_prot_pfn (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-24 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

 On Thu, 23 Oct 2008, Keith Packard wrote:
  
   So I'd much rather create a new linux/kmap.h or something. Or just 
   expose this from to asm/fixmap.h or something. Let's not confuse this 
   with highmem, even if the implementation _historically_ was due to that.
  
  Sure, we readily admit that we're abusing the highmem API. So we wrapped
  that abusive code in a pretty package that can be re-implemented however
  you desire.
  
  How (and when) would you like to see this fixed?
 
 I'm not entirely sure who wants to own up to owning that particular 
 part of code, and is willing to make kmap_atomic_prot_pfn() also work 
 in the absense of CONFIG_HIGHMEM.

yeah, that would be the x86 maintainers. (whoever they are)

 I suspect it's Ingo, but I also suspect that he'd like to see a tested 
 patch by somebody who actually _uses_ this code.

yeah. I already asked Keith yesterday to send one coherent bundle of 
patches and i think we've got all the code sent already, you also pulled 
the latest DRM tree - so it all just needs sorting out and testing. 

[ I can possibly test the end result with a bleeding-edge Xorg which 
  supports the new DRI APIs. (Whether X comes up fine is a regular, 
  necessary and 'fun' component of the x86 regression testing we do 
  anyway. We also tend to notice highmem breakages promptly.) ]

 So I would suspect that if you guys actually write a patch, and make 
 sure that it works on x86-32 even _without_ CONFIG_HIGHMEM, and send 
 it to Ingo, good things will happen.
 
 As to the without CONFIG_HIGHMEM part: making all of this work even 
 without HIGHMEM should literally be a matter of:
 
  - remove the '#ifdef CONFIG_HIGHMEM' in asm/fixmap_32.h, so that we 
have fixmap entries for the temporary kernel mappings regardless (ie 
FIX_KMAP_BEGIN and FIX_KMAP_END).
 
  - move the code for the atomic accesses that use those fixmap entries 
into a file that gets compiled regardless of CONFIG_HIGHMEM. Or at 
least just kmap_atomic_prot_pfn() and kunmap_atomic().
 
 and it probably should all work automatically. The kmap_atomic() stuff 
 really should be almost totally independent of CONFIG_HIGHMEM, since 
 it's really much more closely related to fixmaps than HIGHMEM.

yeah.

 I guess there may be some debugging code that depends on HIGHMEM (eg 
 that whole testing for whether a page is a highmem page or not), so it 
 might be a _bit_ more than just moving code around, but I really 
 didn't look closer.
 
 Then, there's the issue of 64-bit, and just mapping everything there, 
 and the interface to that. I liked the trivial extension to struct 
 resource to have a cached mapping pointer. So if we can just make 
 it pass resources around and get a page that way (and not even need 
 kmap() on 64-bit architections), that would be good.

hm, the thing that i find the weakest aspect of that (despite having 
suggested this approach) is that the structure and the granularity of 
the resource tree is really controlled by the hardware environment. We 
try to map the hardware circumstances accurately at bootstrap / device 
discovery time, and keep it rather static from that point on. (modulo 
hotplug and dynamic BAR sizing)

And this static property of the resource tree is _good_ IMO, because we 
can think about it as a hardware property - not something tweaked by the 
kernel. (the kernel does tweak it when need to be or when the hardware 
asks for it, but it's more of an exception)

That means that if a driver wants to map just a portion of a BAR, 
(because the hardware itself compresses various different pieces of 
functionality into the same BAR), this abstraction becomes a limitation 
on usage.

And it might even be _impossible_ to use the simplest form of 
resource-mem_cache facility with certain types of hardware: say there's 
a cacheable and an uncacheable window within the same BAR - and we'd map 
the whole thing as cacheable. The CPU is then entitled to (and will most 
likely) prefetch into the uncacheable region as well, causing hw 
lockups, etc. [In this thread it was claimed that S3 chips have such 
properties.]

And tweaking this abstraction to cover those cases, for the ioremap to 
not be a full mapping of the resource looks a bit hacky/limiting as 
well:

 - we'd either have to add 'size', 'offset' properties to the window we 
   cache in the struct resource. (and possibly an array. yuck.)

 - or we'd have to say dont use this facility with such quirky hardware 
   then.

 - or we'd have to say split up the struct resource into two pieces 
   artificially, when the driver starts using the resource - which 
   violates the current rather static nature of the resource tree.

Maybe i'm overcomplicating it and maybe this last option isnt all that 
bad after all: as the 'combined' resource in such cases _is_ artificial 
- and i915+ does not have such problematic BARs to begin with. (keeping 
separate BARs for the framebuffer is 

Re: Adding kmap_atomic_prot_pfn

2008-10-24 Thread Ingo Molnar

* Thomas Hellström [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 * Thomas Hellström [EMAIL PROTECTED] wrote:

   
 Keith,

 What you actually are doing here is claiming copyright on code that  
 other people have written, and tighten the export restrictions.   
 kmap_atomic_prot_pfn() appeared long ago in drm git with identical  
 code and purpose, but with different authors, and iounmap_atomic is  
 identical to kunmap_atomic.
 

   
 +EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
   

 you want to use this facility in a binary-only driver?

  Ingo
   
 At this point I have no such use for it, no.
 The original user was a MIT style licenced driver.

okay, then just to put this question to rest: i wrote the original 
32-bit highmem code ~10 years ago. I wrote the first version of fixmap 
support - in fact i coined the term. I wrote the first version of the 
atomic-kmap facility as well.

All of that code is licensed under the GPLv2. So if anyone wants to make 
any copyright claims about highmem/kmap/fixmap derivative works, 
consider it in that light.

Regarding this new API variant that Keith wrote: it would be silly and 
dangerous to export it anywhere but to in-kernel drivers. The API 
disables preemption on 32-bit and rummages deep in the guts of the 
kernel as well, uses up a precious resource (fixmap slots), etc. It's 
internal and we eventually might want to deprecate forms of it and 
concentrate on the good 64-bit performance side.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-22 Thread Ingo Molnar

* Keith Packard [EMAIL PROTECTED] wrote:

 On Mon, 2008-10-20 at 13:58 +0200, Ingo Molnar wrote:
 
  yes but note that by caching the whole mapping on 64-bit we get 
  everything we want: trivially lockless, works from any CPU, can be 
  preempted at will, and there are no ugly INVLPG flushes anywhere.
 
 I was assuming that on 64-bit, the map would be created at driver init 
 time and be left in place until the driver closed; if that's what you 
 mean by 'caching', then yes, we should cache the map.

correct.

  32-bit we should handle as well but not design for it.
 
 As long as we get kmap_atomic-like performance, and we get to simplify 
 our code, I'm up for it.

okay. So ... mind sending your io_mapping patch as a generic facility? 
It looks all good to me in its present form, except that it should live 
in include/linux/io.h, not in the drivers/gpu/drm/i915/io_reserve.h file
:-)

also, please send at least two patches, so that we can look at (and 
possibly merge) the generic facility in isolation.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm patches for 2.6.27-rc1

2008-10-20 Thread Ingo Molnar

* Keith Packard [EMAIL PROTECTED] wrote:

 On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
 
  the _real_ remapping in a graphics aperture happens on the GPU level 
  anyway, you manage an in-RAM GPU pagetable that just works like an 
  IOMMU, correct?
 
 Yes, a one-level linear MMU which uses BIOS-reserved memory.
 
 So, at least for a prototype, on 64-bit we can just use ioremap_wc and 
 hold the mapping while the driver is open? Is there any huge benefit 
 to using the kernel mapping?

correct. There's no benefit to using the kernel mapping - and we can add 
2MB pages support to ioremap just fine and then you'll benefit from it 
automatically.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-20 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 very nice!
 
 I think we need a somewhat different abstraction though.
 
 Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to 
 move to generic code.
 
 Secondly, wouldnt the right abstraction be to attach this 
 functionality to 'struct resource' ? [or at least create a second 
 struct that embedds struct resource]
 
 this abstraction is definitely not a PCI thing and not a 
 detached-from-everything thing, it's an IO resource thing. We could 
 make it a property of struct resource:
 
 struct resource {
 resource_size_t start;
 resource_size_t end;
 const char *name;
 unsigned long flags;
 struct resource *parent, *sibling, *child;
 +   void *mapping;
 };
 
 The APIs would be:
 
   int   io_resource_init_mapping(struct resource *res);
  void   io_resource_free_mapping(struct resource *res);
  void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long 
 offset);
  void   io_resource_unmap(struct resource *res, void *kaddr);
 
 Note how simple and consistent it all gets: IO resources already know 
 their physical location and their size limits. Being able to cache an 
 ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is 
 a relatively simple and natural extension to the concept.
 
 i think that would be quite acceptable - and the APIs could just 
 transparently work on it. This would also allow the PCI code to 
 automatically unmap any cached mappings from resources, when the 
 driver deinitializes.
 
 Linus, Jesse, what do you think?

the downsize would be that we'd attach a runtime property to the 
IORESOURCE_MEM resource tree - which is a fairly static thing right now, 
after the point where we finalize the resource tree. (modulo 
device/bridge hotplug variances)

Another downside is that we might not want to map the whole thing. I.e. 
the structure of the IO memory space we want to map by drivers might be 
different from how it looks like in the resource tree.

the concept of introducing resource-mapping does not feel _that_ wrong 
though and has a couple of upsides: it could act as a natural mapping 
type serializer for example and drivers wouldnt have to explicitly 
manage ioremap results - they could just use the resource descriptor 
directly and read and write to/from it. readl/writel could be 
extended to operate on the resource descriptor transparently, getting 
rid of a source of resource mismatches and overmapping, etc. etc. We 
could even safety check IO space accesses this way.

and we'd get rid of the complication that your APIs introduced, the need 
to introduce a separate io_mapping type, etc.

Dunno, i might be missing some obvious downside why this wasnt done like 
that until now.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-20 Thread Ingo Molnar

* Keith Packard [EMAIL PROTECTED] wrote:

 On Sun, 2008-10-19 at 19:53 +0200, Ingo Molnar wrote:
 
  Note how simple and consistent it all gets: IO resources already 
  know their physical location and their size limits. Being able to 
  cache an ioremap in a mapping [and being able to use atomic kmaps on 
  32-bit] is a relatively simple and natural extension to the concept.
 
 I'm not sure I see any value in caching mappings here; we're mostly 
 interested in copying lots of data onto the card and so we use a lot 
 of different mappings; atomic mappings are easy to use, and efficient 
 enough.

yes but note that by caching the whole mapping on 64-bit we get 
everything we want: trivially lockless, works from any CPU, can be 
preempted at will, and there are no ugly INVLPG flushes anywhere.

you'll even get 2MB mappings automatically, if the BAR is aligned and 
sized correctly.

32-bit we should handle as well but not design for it.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-20 Thread Ingo Molnar

* Eric Anholt [EMAIL PROTECTED] wrote:

  The APIs would be:
  
int   io_resource_init_mapping(struct resource *res);
   void   io_resource_free_mapping(struct resource *res);
   void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long 
  offset);
   void   io_resource_unmap(struct resource *res, void *kaddr);
  
  Note how simple and consistent it all gets: IO resources already 
  know their physical location and their size limits. Being able to 
  cache an ioremap in a mapping [and being able to use atomic kmaps on 
  32-bit] is a relatively simple and natural extension to the concept.
  
  i think that would be quite acceptable - and the APIs could just 
  transparently work on it. This would also allow the PCI code to 
  automatically unmap any cached mappings from resources, when the 
  driver deinitializes.
  
  Linus, Jesse, what do you think?
  
  i think we need to finalize the API names and their abstraction 
  level, and then could even merge those APIs into v2.6.28 on a fast 
  path, to enable you to use it. It does not interact with anything 
  else so it should be safe to do.
 
 This API needs the cacheability control, which I don't see in it 
 currently. [...]

yes, these two should do the trick:

 int   io_resource_init_mapping_wc(struct resource *res);
 int   io_resource_init_mapping_wb(struct resource *res);

 Second, we need to know when we're doing a mapping whether we're 
 affected by atomic scheduling restrictions.  Right now our plan has 
 been to try doing page-by-page 
 io_map_atomic_wc()/copy_from_user_inatomic()/io_unmap_atomic(), and if 
 we fail at that at some point (map returns NULL or we get a partial 
 completion from copy_from_user_inatomic) then fall back to io_map_wc() 
 and copy_from_user() the whole thing at once.  That gets us good 
 performance on both x86 with highmem and x86-64, and not too shabby 
 performance on x86 non-highmem.

that gets ugly very fast. I think we should not use atomic kmaps but 
NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically 
atomic kmaps but without the preemption restrictions). We could 
take/drop the mutex and statistically you'll stay on the same CPU and 
wont ever contend on that lock in practice.

 Also, while it's rare, there have been graphics cards (looking at you, 
 S3) where BARs were expensive for some reason and they stuffed both 
 the framebuffer and registers into one PCI BAR, where you want the FB 
 to be WC and the registers to be UC.  Not sure if they would be 
 supportable with this API or not.  And if it's not, I'm not sure how 
 much we care to design for them, but it's something to potentially 
 consider.

yes, this is a weakness of this API - you cannot mix multiple 
cachability domains within the same BAR.

and that can happen on non-graphics as well: some storage controller 
that has regular control registers in one portion of the BAR, which all 
need to be consistently accessed via UC and properly POST-ed - while it 
could also have some large mailbox structure at the end of the BAR, 
which could be mapped both cacheable or perhaps WC.

So ... i guess we can go back to the io_mapping API proposed by Keith, 
but not make it atomic kmap based but fixmap + mutex based - for good 
32-bit performance. (and the fixmap would not be used on 64-bit at all)

 Finally, I'm confused by the pfn and offset args to io_resource_map, 
 when I expected something parallel to ioremap but with our resource 
 arg added.

ok.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-20 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 that gets ugly very fast. I think we should not use atomic kmaps but 
 NR_CPUS _fixmaps_ with a per CPU array of mutexes (this is basically 
 atomic kmaps but without the preemption restrictions). We could 
 take/drop the mutex and statistically you'll stay on the same CPU and 
 wont ever contend on that lock in practice.

peterz pointed it out that we need to be careful with the TLBs in that 
case.

I think it's solvable: a small non-default special-case in switch_to() 
would invlpg any pending such page. (and no two such mappings could be 
held at the same time)

So as long as we dont leave the CPU we wont ever do TLB flushes, and the 
flush will be local even if we do.

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


io resources and cached mappings (was: [git pull] drm patches for 2.6.27-rc1)

2008-10-19 Thread Ingo Molnar

* Keith Packard [EMAIL PROTECTED] wrote:

 On Sat, 2008-10-18 at 21:14 -0700, Keith Packard wrote:
  On Sun, 2008-10-19 at 00:32 +0200, Ingo Molnar wrote:
  
   Mind sending patches for this? :-)
 
 Here's a patch for the i915 driver that includes the new API. Tested 
 on x86_32+HIGHMEM and x86_64. I stuck a new 'io_reserve.h' header in 
 the i915 directory for this patch, but it should go elsewhere.
 
 The new APIs are:
 
 io_reserve_create_wc
 io_reserve_free
 io_reserve_map_atomic_wc
 io_reserve_unmap_atomic
 io_reserve_map_wc
 io_reserve_unmap

very nice!

I think we need a somewhat different abstraction though.

Firstly, regarding drivers/gpu/drm/i915/io_reserve.h, that needs to move 
to generic code.

Secondly, wouldnt the right abstraction be to attach this functionality 
to 'struct resource' ? [or at least create a second struct that embedds 
struct resource]

this abstraction is definitely not a PCI thing and not a 
detached-from-everything thing, it's an IO resource thing. We could make 
it a property of struct resource:

struct resource {
resource_size_t start;
resource_size_t end;
const char *name;
unsigned long flags;
struct resource *parent, *sibling, *child;
+   void *mapping;
};

The APIs would be:

  int   io_resource_init_mapping(struct resource *res);
 void   io_resource_free_mapping(struct resource *res);
 void * io_resource_map(struct resource *res, pfn_t pfn, unsigned long offset);
 void   io_resource_unmap(struct resource *res, void *kaddr);

Note how simple and consistent it all gets: IO resources already know 
their physical location and their size limits. Being able to cache an 
ioremap in a mapping [and being able to use atomic kmaps on 32-bit] is a 
relatively simple and natural extension to the concept.

i think that would be quite acceptable - and the APIs could just 
transparently work on it. This would also allow the PCI code to 
automatically unmap any cached mappings from resources, when the driver 
deinitializes.

Linus, Jesse, what do you think?

i think we need to finalize the API names and their abstraction level, 
and then could even merge those APIs into v2.6.28 on a fast path, to 
enable you to use it. It does not interact with anything else so it 
should be safe to do.

(i'd not suggest to merge the i915 bits just yet - but that's obviously 
not my call.)

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm patches for 2.6.27-rc1

2008-10-18 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

 [ The *non-atomic* kmap() functions are fairly high-overhead, in that they 
   want to keep track of cached mappings and remember page addresses etc. 
   So those are the ones we don't want to support for non-HIGHMEM setups. 
 
   But the atomic kmaps are pretty simple, and really only need some 
   trivial FIXMAP support. We could easily extend it for x86-64, methinks, 
   and do it for x86-32 even when we don't do HIGHMEM.
 
   Ingo? ]

agreed, and there's certainly no resistance from the x86 architecture 
side to extend our mapping APIs in such directions.

But i think the direction of the new GEM code is subtly wrong here, 
because it tries to manage memory even on 64-bit systems. IMO it should 
just map the _whole_ graphics aperture (non-cached) and be done with it. 
There's no faster method at managing pages than the CPU doing a TLB fill 
from pagetables.

The only real API need i see is on 32-bit: with a 1GB or 2GB graphics 
aperture we just cannot map that permanently, so kmap_atomic() is a 
necessity. We can certainly extend that to non-highmem as well.

But this should be an ad-hoc transitionary thing for 32-bit, and on 
64-bit we really should not be using any form of kmap.

Especially with large vertex buffers or textures, mapping a lot of pages 
via kmap is not going to be trivial overhead - even if INVLPG is faster 
than a full TLB flush, it's still on the order of 100-200 cycles - and 
with a lot of pages that mounts up quickly. And if i understood your 
workload correctly you want to do tens of thousand of map/unmap/remap 
events per frame generated - depending on the type of the 3D app/engine.

Or am i missing something subtle? Why do you want the overhead of kmap 
on 64-bit?

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [git pull] drm patches for 2.6.27-rc1

2008-10-18 Thread Ingo Molnar

* Keith Packard [EMAIL PROTECTED] wrote:

 On Sat, 2008-10-18 at 22:37 +0200, Ingo Molnar wrote:
 
  But i think the direction of the new GEM code is subtly wrong here, 
  because it tries to manage memory even on 64-bit systems. IMO it 
  should just map the _whole_ graphics aperture (non-cached) and be 
  done with it. There's no faster method at managing pages than the 
  CPU doing a TLB fill from pagetables.
 
 Yeah, we're stuck thinking that we can't map the aperture because 
 it's too large, but with a 64-bit kernel, we should be able to keep it 
 mapped permanently.
 
 Of course, the io_reserve_pci_resource and io_map_atomic functions 
 could do precisely that, as kmap_atomic does on non-HIGHMEM systems 
 today.

okay, so basically what we need is a shared API that does per page 
kmap_atomic on 32-bit, and just an ioremap() on 64-bit. I had the 
impression that you were suggesting to extend kmap_atomic() to 64-bit - 
which would be wrong.

So, in terms of the 4 APIs you suggest:

  struct io_mapping *io_reserve_pci_resource(struct pci_dev *dev,
 int bar,
 int prot);
  void io_mapping_free(struct io_mapping *mapping);

  void *io_map_atomic(struct io_mapping *mapping, unsigned long pfn);
  void io_unmap_atomic(struct io_mapping *mapping, unsigned long pfn);

here is what we'd do on 64-bit:

  - io_reserve_pci_resource() would just do an ioremap(), and would save 
the ioremap-ed memory into struct io_mapping.

  - io_mapping_free() does the iounmap()

  - io_map_atomic(): just arithmetics, returns mapping-base + pfn - no 
TLB activities at all.

  - io_unmap_atomic(): NOP.

it's as fast as it gets: zero overhead in essence. Note that it's also 
shared between all CPUs and there's no aliasing trouble.

And we could make it even faster: if you think we could even use 2MB 
TLBs for the _linear_ ioremap()s here, hm? There's plenty of address 
space on 64-bit so we can align to 2MB just fine - and aperture sizes 
are 2MB sized anyway.

Or we could go one step further and install these aperture mappings into 
the _kernel linear_ address space. That would be even faster, because 
we'd have a constant offset. We have the (2MB mappings aware) mechanism 
for that already. (Yinghai Cc:-ed - he did a lot of great work to 
generalize this area.)

(In fact if we installed it into the linear kernel address space, and if 
the aperture is 1GB aligned, we will automatically use gbpages for it. 
Were Intel to support gbpages in the future ;-)

the _real_ remapping in a graphics aperture happens on the GPU level 
anyway, you manage an in-RAM GPU pagetable that just works like an 
IOMMU, correct?

on 32-bit we'd have what you use in the GEM code today:

  - io_reserve_pci_resource(): a NOP in essence

  - io_mapping_free(): a NOP

  - io_map_atomic(): does a kmap_atomic(pfn)

  - io_unmap_atomic(): does a kunmap_atomic(pfn)

so on 32-bit we have the INVLPG TLB overhead and preemption restrictions 
- but we knew that. We'd have to allow atomic_kmap() on non-highmem as 
well but that's fair.

Mind sending patches for this? :-)

Ingo

-
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK  win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100url=/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: CFS review

2007-08-30 Thread Ingo Molnar

* Rene Herman [EMAIL PROTECTED] wrote:

 Realised the BUGs may mean the kernel DRM people could want to be in CC...

and note that the schedule() call in there is not part of the crash 
backtrace:

 Call Trace:
  [c10fa54b] drm_lock+0x255/0x2de
  [c10ff28d] mga_dma_buffers+0x0/0x2e3
  [c10f87fc] drm_ioctl+0x142/0x18a
  [c1005973] do_IRQ+0x97/0xb0
  [c10f86ba] drm_ioctl+0x0/0x18a
  [c10f86ba] drm_ioctl+0x0/0x18a
  [c105b0d7] do_ioctl+0x87/0x9f
  [c105b32c] vfs_ioctl+0x23d/0x250
  [c11b533e] schedule+0x2d0/0x2e6
  [c105b372] sys_ioctl+0x33/0x4d
  [c1003d1e] syscall_call+0x7/0xb

it just happened to be on the kernel stack. Nor is the do_IRQ() entry 
real. Both are frequent functions (and were executed recently) that's 
why they were still in the stackframe.

Ingo

-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-28 Thread Ingo Molnar

* Fernando Pablo Lopez-Lezcano [EMAIL PROTECTED] wrote:

 Once I boot I can get, for example, two instances of glxgears happily
 going on at once. But if I start jack with realtime priority from
 within qjackctl I get a hard lock very soon, the machine does not even
 react to the sysrq magic key. _But_, you can see the two glxgears
 instances still drawing to the screen... but they seem to take turns,
 first one, then the other and so on and so forth (frequency of switch
 not perfectly regular, between 0.5 and 1 second, I'd say). Sorry the
 description is not very precise. 

if you want them to timeslice more finegrained, try to run them with
nice +19 - does that make them smoother?

Ingo


---
This SF.Net email is sponsored by BEA Weblogic Workshop
FREE Java Enterprise J2EE developer tools!
Get your free copy of BEA WebLogic Workshop 8.1 today.
http://ads.osdn.com/?ad_id=5047alloc_id=10808op=click
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-23 Thread Ingo Molnar

* Dave Airlie [EMAIL PROTECTED] wrote:

 I can't find the lock from a quick code inspection.. I'll add the
 reschedule to a test drm and I'll build my kernel with all the
 debugging on...

there doesnt seem to be any further lock other than the BKL:

 0001 0.558ms (+0.000ms): delay_tsc (__delay)
 0001 0.560ms (+0.001ms): __const_udelay (radeon_do_wait_for_fifo)
 0001 0.560ms (+0.000ms): __delay (radeon_do_wait_for_fifo)
 0001 0.560ms (+0.000ms): delay_tsc (__delay)
 0001 0.561ms (+0.001ms): __const_udelay (radeon_do_wait_for_fifo)
 0001 0.561ms (+0.000ms): __delay (radeon_do_wait_for_fifo)
 0001 0.561ms (+0.000ms): delay_tsc (__delay)

so unless the BKL is being relied on, it's safe to add the conditional
reschedule.

Ingo


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-23 Thread Ingo Molnar

i'll put the patch below into the -P8 patch. (change voluntary_resched()
to cond_resched() if you apply this to a vanilla kernel.)

Ingo

--- linux/drivers/char/drm/drm_os_linux.h.orig  
+++ linux/drivers/char/drm/drm_os_linux.h   
@@ -14,7 +14,7 @@
 #define DRM_ERR(d) -(d)
 /** Current process ID */
 #define DRM_CURRENTPID current-pid
-#define DRM_UDELAY(d)  udelay(d)
+#define DRM_UDELAY(d)  do { voluntary_resched(); udelay(d); } while 
(0)
 /** Read a byte from a MMIO region */
 #define DRM_READ8(map, offset) readb(((unsigned long)(map)-handle) + 
(offset))
 /** Read a dword from a MMIO region */


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-23 Thread Ingo Molnar

* Jon Smirl [EMAIL PROTECTED] wrote:

 Previously all of the IOCTL calls were protected by the Big Kernel
 Lock. If we break that aren't we allowing multiple callers into the
 IOCTL code on SMP machines that weren't allowed in before? Doesn't
 that mean that we have to check everything to make sure it is SMP
 safe?

as Alan mentioned it before, the BKL did not protect much, since the DRM
code was freely doing copying from and to userspace, and hence dropping
the BKL at those places.

but yes, best would be to drop the BKL as soon as possible and check
everything.

Ingo


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-23 Thread Ingo Molnar

* Mike Mestnik [EMAIL PROTECTED] wrote:

  as Alan mentioned it before, the BKL did not protect much, since the DRM
  code was freely doing copying from and to userspace, and hence dropping
  the BKL at those places.
  
 Add sleeping(usleep) to this list, no?  The big Q I have is why are
 IOCTL's protected by the BKL if it gets droped so easily?

usleep is for short delays, it uses busy-looping. msleep is the one that
schedules away.

  but yes, best would be to drop the BKL as soon as possible and check
  everything.
  
 By check everything, you mean make sure shared resources(memory) are used
 safely?

yes. This means some sort of locking has to be added (if it's not
present already). The easiest solution would be to add a semaphore
(mutex) to each device and use that to serialize multiple users and API
calls. A mutex doesnt get dropped when you schedule away.

Ingo


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-22 Thread Ingo Molnar

* Jon Smirl [EMAIL PROTECTED] wrote:

 --- Ingo Molnar [EMAIL PROTECTED] wrote:
  the cleanest and highest performing solution would be to have an
  interrupt source for 3D completion - do you have such an interrupt
  handler handy? If yes then you can sleep precisely up to completion:
 
 Interrupts are not a good solution. The hardware would generate
 millions of them per second. This is the same problem network cards
 have, you can interrupt the machine to death.

you'd only have to enable interrupt generation when you are not
busy-polling. In 99.9% of the cases (when !need_resched()) you could
busy poll as much as you want, and keep IRQ generation off. Very likely
IRQ generation can be turned on/off via a single PCI write on most 3D
hardware. Anyway, IRQ generation would just be a 'nice' thing. But the
following property is a must:

 I would expect the best solution is to make a few passes through the
 loop delaying a couple usec to catch the common case of quick
 completion. Then switch to doing need_resched().

no. If need_resched() is set then the code *must* try to schedule ASAP. 
There is no excuse for not doing so - 'performance will suffer' is not a
correct argument becase _if_ need_resched() is true then the system (and
the user) doesnt want the 3D app to run right now. There will be no
performance difference to the 3D app, since by having high latencies the
DRI code does not give any extra performance to 3D apps, it only
increases the scheduling latency! The timeslices of the 3D app are used
up depending on how long the 3D app runs - so if it burns cycles
busy-polling it will in fact get less CPU time on average. (if the CPU
is contended.)

Anyway, need_resched() set is not a common condition, and if it happens
then kernel code _must_ react. (In fact most kernel code will be
preempted right away - but the DRI code runs under spinlocks.)

So the correct solution is to add something like this to _at least_ the
busy-polling code:

if (need_resched()) {
... drop locks ...
cond_resched();
... reacquire locks ...
}

or, if there's a single spinlock held (the BKL doesnt count) then:

cond_resched_lock(driver-lock);

(but the locking obviously has to be correct and safe.)

ok?

Ingo


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: 2.4.8.1+P6: radeon, dri xruns

2004-08-21 Thread Ingo Molnar

* Jon Smirl [EMAIL PROTECTED] wrote:

 A better solution might be to loop twenty times to pick up the very
 short commands. After 20us switch to a loop that allows the kernel to
 schedule. You don't want to immediately schedule since that will kill
 graphics performance.

You can busy-loop just fine as long as you stay preemptible. Also, if
need_resched() is true then you should get out of your locks ASAP.

the _ability_ to preempt quickly should not affect 3D performance in any
noticeable way: if the only app running is a 3D app, or it has a high
priority (because the user wants good 3D performance) then it will
perform just fine. Conversely, if an audio app has high priority then it
must see low latencies. So this is not an act of balancing, this is an
act of _enabling_ low latencies. Looping in the kernel for even 1 msec
while holding locks is very, very bad and should be avoided at every
cost. In fact you should shoot for as quick preemptability as possible
whenever you are looping for 3D completion, because your code is not
doing anything productive in fact. 'quick preemptability' != scheduling
away, it only means 'scheduling away if need_resched() is set'.

 What's the right way to write a loop like this that meets the above
 requirements and also satisfies the audio needs?

 for ( i = 0 ; i  dev_priv-usec_timeout ; i++ ) {
 if ( !(RADEON_READ( RADEON_RBBM_STATUS )
 RADEON_RBBM_ACTIVE) ) {
 radeon_do_pixcache_flush( dev_priv );
 return 0;
 }
 DRM_UDELAY( 1 );

add:

if (need_resched()) {
break locks ...
cond_resched();
reacquire locks ...
}

and make sure the breaking of the locks is safe at that point (no other
3D application should be able to interfere with your polling, etc.).

this will solve all the audio complaints.

as a bonus, if you want to be 'nice' to lower prio processes as well,
you can also do:

if ((i  127) == 127) {
drop locks ...
msleep(1);
reqacquire locks ...
}

this will busy-loop up to 127 usecs, and will sleep for 1 msec
afterwards.

the cleanest and highest performing solution would be to have an
interrupt source for 3D completion - do you have such an interrupt
handler handy? If yes then you can sleep precisely up to completion:

if ((i  127) == 127) {
drop locks ...
wait_event(driver-wait_queue,
RADEON_READ(RADEON_RBBM_STATUS));
reqacquire locks ...
break;
}

and in the IRQ handler, do a:

wake_up(driver-wait_queue);

the wakeup doesnt even have to be precise - i.e. you can wake up early
or for all events - wait_event()'s second field, the 'is the wait done'
takes care of it and re-suspends the task if the 3D engine is still
busy.

(to implement a timeout and signal-awareness as well you can use
wait_event_interruptible_timeout().)

Ingo


---
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink  Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
--
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: [Dri-devel] mesa 4.1 branch / NO go on 2.5.48

2002-11-21 Thread Ingo Molnar

On Wed, 20 Nov 2002, Linus Torvalds wrote:

 On Thu, 21 Nov 2002, Dieter Nützel wrote:
 
Option   AGPFastWrite 1
 
 This just makes the machine lock up for me at X startup.

same here, instant and nasty lockup.

Ingo



---
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



[Dri-devel] Caught signal 11, with bleeding-edge, radeon 8500 LE

2002-11-17 Thread Ingo Molnar

i'm getting segfaults with the latest bleeding-edge XFree86 binary.

the system is using the latest Red Hat rawhide rpms, i've used the latest
extras snapshot (and the libxaa from the extras tarball), plus the
r200-20021116 radeon driver. I've attached:

 - rpm -qa output
 - the XF86Config
 - strace -f output of the crash [not attached due to size.]
 - XFree86.0.log
 - kernel log messages

here's the md5 checksums of the extras and installed files, everything is 
in place i think:

 # md5sum XFree86 /usr/X11R6/bin/XFree86
 7c1cb25d8b5a42c9e1e6b30530b9d469  XFree86
 7c1cb25d8b5a42c9e1e6b30530b9d469  /usr/X11R6/bin/XFree86

 # md5sum libpcidata.a /usr/X11R6/lib/modules/libpcidata.a
 663b45e969a6454a7a7f4d28b30d0784  libpcidata.a
 663b45e969a6454a7a7f4d28b30d0784  /usr/X11R6/lib/modules/libpcidata.a

 # md5sum libxaa.a /usr/X11R6/lib/modules/libxaa.a
 ea563cb34e447cab7fe5800d29c6df1e  libxaa.a
 ea563cb34e447cab7fe5800d29c6df1e  /usr/X11R6/lib/modules/libxaa.a

X appears to crash when the first X client tries to connect to the server
- ie. if i start up X standalone it works, but when i start the first
xterm, it crashes.

if i use the orignal XFree86 binary (ie. XFree86-4.2.99.2-0.20021110.3)  
then DRI works, sortof. (simple things like glxgears work, but there are a
number of bad artifacts in Ultima1 techdemo 251 for example. Other things,
like tuxracer seems to work fine.)

Btw., if i use the libxaa in the extras package then a more complex app,
like the Ultima1 framerates decrases visibly - is this due to debugging
enabled in the extras libxaa, while the rawhide version not?

let me know if there's anything else i should/could provide, to get this
debugged.

Ingo



rpm-qa.out.bz2
Description: BZip2 compressed data


XF86Config.bz2
Description: BZip2 compressed data


XFree86.0.log.bz2
Description: BZip2 compressed data
agpgart: Maximum main memory to use for agp memory: 439M
agpgart: Detected Intel i845 chipset
agpgart: AGP aperture is 64M @ 0xf800
[drm] AGP 0.99 on Intel i845 @ 0xf800 64MB
[drm] Initialized radeon 1.7.0 20020828 on minor 0
[drm] Loading R200 Microcode




Re: [Dri-devel] R200 testing

2002-07-14 Thread Ingo Molnar


On Sun, 14 Jul 2002, David Bronaugh wrote:

 It seems that the kernel module was installed correctly, etc. He used a
 binary snapshot (20020708 or so).

0709 had some problems for me (tuxracer segfaulting occasionally), but
0712 works well so far.

 glxgears ran, gave about 2800 fps in a window of default size. However,
 when the window was enlarged to fullscreen, the framerate dropped to
 about 200fps. That doesn't sound quite normal.

here it goes to about 2500 fps default, 300 fps full-size. But this
happens with other 3d accelerated cards as well - the full size window is
at least 10 times larger, so a 1:10 performance drop does not look
abnormal.

 Next, tuxracer ran, but the menus were basically unusable. Sounds like
 he hit a whack of software fallbacks - frame rate on menus must have
 been less than 1fps. He didn't have the patience to get into the game.

this happens to me as well, but only if i restart the system after
installing the beta r200 driver. Re-running the install.sh script solves
the problem and the tuxracer menusystem is as fast as expected. I suspect
this is a problem of default dri kernel modules not being overwritten (?),
but the install reloads them properly. It's not a big issue though.

 Apparently there were also some artifacts in 2d, but those are of less
 concern.

havent seen any 2d artifacts so far.

Ingo



---
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel