Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-24 Thread Jiri Slaby
On 09/24/2007 09:37 AM, [EMAIL PROTECTED] wrote:
> (Interestingly, I can't find any of the 3 addresses listed in the 'list_add
> corruption' message anywhere *else* in the netconsole output, and the last thing
> we hear from before the kersplat is apparently an RCU callback in a softirq?)

Hmm, then somebody else must be changing it under our hands without going
through list_add. Any lru games in the mid-layer nvidia sources? Do slab and
slub behave the same?

[...]
> [   48.222000] 
> [   48.297000] POISONS (810001179148): 810006bbc000, 81000341bec0
> [   48.297000] 
> [   48.297000] Call Trace:
> [   48.297000]  <IRQ>  [] __list_add+0xd7/0x138
> [   48.297000]  [] list_add+0xc/0x11
> [   48.297000]  [] free_hot_cold_page+0xe8/0x16d
> [   48.297000]  [] free_hot_page+0xb/0xd
> [   48.297000]  [] __free_pages+0x18/0x21
> [   48.297000]  [] free_pages+0x2f/0x34
> [   48.297000]  [] kmem_freepages+0xc5/0xce
> [   48.297000]  [] slab_destroy+0x3c/0x53
> [   48.297000]  [] free_block+0xcd/0x110
> [   48.297000]  [] cache_flusharray+0x71/0xa7
> [   48.297000]  [] kmem_cache_free+0x99/0xb2
> [   48.297000]  [] __d_free+0x30/0x34
> [   48.297000]  [] d_callback+0xd/0xf
> [   48.297000]  [] __rcu_process_callbacks+0x143/0x1da
> [   48.297000]  [] rcu_process_callbacks+0x23/0x44
> [   48.297000]  [] tasklet_action+0x54/0x9e
> [   48.297000]  [] __do_softirq+0x57/0xc7
> [   48.297000]  [] ksoftirqd+0x0/0x148
> [   48.297000]  [] call_softirq+0x1c/0x28
> [   48.297000]  <EOI>  [] do_softirq+0x34/0x87
> [   48.297000]  [] ksoftirqd+0x73/0x148
> [   48.297000]  [] kthread+0x49/0x78
> [   48.297000]  [] child_rip+0xa/0x12
> [   48.297000]  [] kthread+0x0/0x78
> [   48.297000]  [] child_rip+0x0/0x12
> [   48.297000] 
> [   48.297000] list_add corruption. next->prev should be prev 
> (8067e050), but was 8100066d59c0. (next=81000119e560).
> [   48.297000] ------------[ cut here ]------------
> [   48.297000] kernel BUG at lib/list_debug.c:46!
> [   48.297000] invalid opcode:  [1] PREEMPT SMP 
> [   48.297000] last sysfs file: 
> /devices/pci:00/:00:1e.0/:03:01.4/resource
> [   48.297000] CPU 1 
> [   48.297000] Modules linked in: irnet(U) ppp_generic(U) slhc(U) 
> irtty_sir(U) sir_dev(U) ircomm_tty(U) ircomm(U) irda(U) crc_ccitt(U) 
> coretemp(U) nf_conntrack_ftp(U) xt_pkttype(U) ipt_REJECT(U) ipt_osf(U) 
> nf_conntrack_ipv4(U) xt_ipisforif(U) ipt_recent(U) ipt_LOG(U) xt_u32(U) 
> iptable_filter(U) ip_tables(U) xt_tcpudp(U) nf_conntrack_ipv6(U) xt_state(U) 
> nf_conntrack(U) nfnetlink(U) ip6t_LOG(U) xt_limit(U) ip6table_filter(U) 
> ip6_tables(U) x_tables(U) sha256(U) aes(U) fan(U) container(U) bay(U) 
> acpi_cpufreq(U) nvram(U) pcmcia(U) firmware_class(U) yenta_socket(U) 
> ohci1394(U) rsrc_nonstatic(U) iTCO_wdt(U) iTCO_vendor_support(U) 
> watchdog_core(U) nvidia(P)(U) thermal(U) ieee1394(U) pcmcia_core(U) 
> watchdog_dev(U) processor(U) snd_hda_intel(U) intel_agp(U) ac(U) button(U) 
> video(U) battery(U) power_supply(U) output(U) rtc(U)
> [   48.297000] Pid: 7, comm: ksoftirqd/1 Tainted: P 2.6.23-rc6-mm1 #8
> [   48.297000] RIP: 0010:[]  [] 
> __list_add+0xfb/0x138
> [   48.297000] RSP: :81000349fd38  EFLAGS: 00010002
> [   48.297000] RAX: 0088 RBX: 810001179148 RCX: 
> 8061dbbb
> [   48.297000] RDX: 0001 RSI: 0006 RDI: 
> 80672620
> [   48.297000] RBP: 81000349fd58 R08: 80672628 R09: 
> 
> [   48.297000] R10: e731ffa2 R11: 81000349fa98 R12: 
> 81000119e560
> [   48.297000] R13: 8067e050 R14: 8067df80 R15: 
> 81000346d128
> [   48.297000] FS:  () GS:8100034689c0() 
> knlGS:
> [   48.297000] CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
> [   48.297000] CR2: 0049b9b0 CR3: 04168000 CR4: 
> 06e0
> [   48.297000] DR0:  DR1:  DR2: 
> 
> [   48.297000] DR3:  DR6: 0ff0 DR7: 
> 0400
> [   48.297000] Process ksoftirqd/1 (pid: 7, threadinfo 810003494000, task 
> 810003463810)
> [   48.297000] last branch before last exception/interrupt
> [   48.297000]  from  [] printk+0xa3/0xa4
> [   48.297000]  to  [] __list_add+0xf5/0x138
> [   48.297000] Stack:  81000349fe60 810001179120 8067e040 
> 0002
> [   48.297000]  81000349fd68 80358117 81000349fd98 
> 80270865
> [   48.297000]  81000100  81000341bec0 
> 810006bbc000
> [   48.297000] Call Trace:
> [   48.297000][] list_add+0xc/0x11
> [   48.297000]  [] free_hot_cold_page+0xe8/0x16d
> [   48.297000]  [] free_hot_page+0xb/0xd
> [   48.297000]  [] __free_pages+0x18/0x21
> [   48.297000]  [] free_pages+0x2f/0x34
> [   48.297000]  [] kmem_freepages+0xc5/0xce
> [   48.297000]  [] slab_destroy+0x3c/0x53
> [   48.297000]  [] free_block+0xcd/0x110
> [   48.297000]  [] 
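
For readers following the BUG quoted above: a minimal userspace C sketch of the kind of consistency check that produces a "list_add corruption" report. It is only modelled on the idea of the check reported from lib/list_debug.c and is not the kernel source. The names checked_list_add and list_add_head are made up for illustration, and the function returns false where the kernel would print the message and BUG().

```c
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked list node, same shape as the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void init_list(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert entry between prev and next, but only if the two neighbours
 * still point at each other. Returns false (hypothetical stand-in for
 * the kernel's printk + BUG()) when the neighbours disagree.
 */
static bool checked_list_add(struct list_head *entry,
                             struct list_head *prev,
                             struct list_head *next)
{
    if (next->prev != prev)   /* "next->prev should be prev, but was ..." */
        return false;
    if (prev->next != next)   /* "prev->next should be next, but was ..." */
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}

/* Add entry right after head, like list_add(). */
static bool list_add_head(struct list_head *entry, struct list_head *head)
{
    return checked_list_add(entry, head, head->next);
}
```

If a node that is still linked on one list is added to a second list, its prev pointer now names the second list's head. The next add beside it on the first list then fails the next->prev test, which is the shape of the report quoted above.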

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-24 Thread Valdis . Kletnieks
On Mon, 24 Sep 2007 08:06:45 +0200, Jiri Slaby said:

> Heh :). The few last before the list corruption BUG (you have to have LIST_DEBUG
> enabled) -- but it seems you never reached that phase?

Seems to be somewhat racy - had one attempt that obviously got into some grand
Mongolian flustercluck, because I had a 2M printk buffer defined, and got more
than 2M worth of apparently looping output saying that the netconsole/printk
path was poisoning.  I raised the printk buffer to 4M, added initcall_debug,
and then it managed to die in a reasonable amount of time.  Here's the whole
thing from when it starts blurting out the POISONS messages until it
rolls over and dies (about 736 lines).

(Interestingly, I can't find any of the 3 addresses listed in the 'list_add
corruption' message anywhere *else* in the netconsole output, and the last thing
we hear from before the kersplat is apparently an RCU callback in a softirq?)

[   47.997000] POISONS (810003fb6ca8): 810003fb6ca8, 8100051600d8
[   47.997000] 
[   47.997000] Call Trace:
[   47.998000]  [] __list_add+0xd7/0x138
[   47.998000]  [] vma_prio_tree_add+0xc9/0xe0
[   47.998000]  [] vma_prio_tree_insert+0x34/0x39
[   47.998000]  [] vma_adjust+0x310/0x452
[   47.998000]  [] split_vma+0xdb/0xed
[   47.998000]  [] mprotect_fixup+0x13b/0x481
[   47.998000]  [] file_map_prot_check+0x7d/0x86
[   47.998000]  [] selinux_file_mprotect+0xe0/0xe9
[   47.998000]  [] sys_mprotect+0x1b2/0x22b
[   47.998000]  [] tracesys+0xdc/0xe1
[   47.998000] 
[   48.078000] POISONS (81000402d768): 810004727810, 810006221810
[   48.078000] 
[   48.078000] Call Trace:
[   48.078000]  [] __list_add+0xd7/0x138
[   48.078000]  [] list_add+0xc/0x11
[   48.078000]  [] vma_prio_tree_add+0xad/0xe0
[   48.078000]  [] copy_process+0xc63/0x1515
[   48.078000]  [] do_fork+0x75/0x20b
[   48.079000]  [] __up_write+0xf0/0x100
[   48.079000]  [] system_call+0x7e/0x83
[   48.079000]  [] sys_clone+0x23/0x25
[   48.079000]  [] ptregscall_common+0x67/0xb0
[   48.079000] 
[   48.080000] POISONS (810004096618): 810005266c00, 81000576a960
[   48.080000] 
[   48.080000] Call Trace:
[   48.080000]  [] __down_write_nested+0x3d/0xa1
[   48.080000]  [] __list_add+0xd7/0x138
[   48.080000]  [] vma_prio_tree_add+0xc9/0xe0
[   48.080000]  [] copy_process+0xc63/0x1515
[   48.080000]  [] do_fork+0x75/0x20b
[   48.080000]  [] __up_write+0xf0/0x100
[   48.080000]  [] system_call+0x7e/0x83
[   48.080000]  [] sys_clone+0x23/0x25
[   48.080000]  [] ptregscall_common+0x67/0xb0
[   48.080000] 
[   48.081000] POISONS (810004096768): 81000526e378, 81000526e378
[   48.081000] 
[   48.081000] Call Trace:
[   48.081000]  [] __list_add+0xd7/0x138
[   48.081000]  [] vma_prio_tree_add+0xc9/0xe0
[   48.081000]  [] copy_process+0xc63/0x1515
[   48.081000]  [] do_fork+0x75/0x20b
[   48.081000]  [] __up_write+0xf0/0x100
[   48.081000]  [] system_call+0x7e/0x83
[   48.081000]  [] sys_clone+0x23/0x25
[   48.081000]  [] ptregscall_common+0x67/0xb0
[   48.081000] 
[   48.081000] POISONS (8100040964c8): 81000576a960, 8100051c12d0
[   48.082000] 
[   48.082000] Call Trace:
[   48.087000]  [] __list_add+0xd7/0x138
[   48.087000]  [] vma_prio_tree_add+0xc9/0xe0
[   48.087000]  [] copy_process+0xc63/0x1515
[   48.087000]  [] do_fork+0x75/0x20b
[   48.087000]  [] __up_write+0xf0/0x100
[   48.087000]  [] system_call+0x7e/0x83
[   48.087000]  [] sys_clone+0x23/0x25
[   48.087000]  [] ptregscall_common+0x67/0xb0
[   48.087000] 
[   48.087000] POISONS (810004096d50): 81000412bab0, 810004536618
[   48.087000] 
[   48.087000] Call Trace:
[   48.087000]  [] __list_add+0xd7/0x138
[   48.088000]  [] list_add+0xc/0x11
[   48.088000]  [] vma_prio_tree_add+0xad/0xe0
[   48.088000]  [] copy_process+0xc63/0x1515
[   48.088000]  [] do_fork+0x75/0x20b
[   48.088000]  [] __up_write+0xf0/0x100
[   48.088000]  [] system_call+0x7e/0x83
[   48.088000]  [] sys_clone+0x23/0x25
[   48.088000]  [] ptregscall_common+0x67/0xb0
[   48.088000] 
[   48.088000] POISONS (810004096c00): 81000412b960, 8100043e7ca8
[   48.088000] 
[   48.088000] Call Trace:
[   48.088000]  [] __list_add+0xd7/0x138
[   48.088000]  [] list_add+0xc/0x11
[   48.089000]  [] vma_prio_tree_add+0xad/0xe0
[   48.089000]  [] copy_process+0xc63/0x1515
[   48.089000]  [] do_fork+0x75/0x20b
[   48.089000]  [] __up_write+0xf0/0x100
[   48.089000]  [] system_call+0x7e/0x83
[   48.089000]  [] sys_clone+0x23/0x25
[   48.089000]  [] ptregscall_common+0x67/0xb0
[   48.089000] 
[   48.089000] POISONS (810004096ca8): 81000526e960, 810003f0fb58
[   48.089000] 
[   48.089000] Call Trace:
[   48.089000]  [] __vm_enough_memory+0x1e/0x113
[   48.089000]  [] __list_add+0xd7/0x138
[   48.089000]  [] list_add+0xc/0x11
[   48.090000]  [] vma_prio_tree_add+0xad/0xe0
[   48.090000]  [] copy_process+0xc63/0x1515
[   48.090000]  [] do_fork+0x75/0x20b
[   48.090000]  [] __up_write+0xf0/0x100
[   48.090000]  [] system_call+0x7e/0x83
[   

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-24 Thread Jiri Slaby
On 09/24/2007 05:25 AM, [EMAIL PROTECTED] wrote:
> On Fri, 21 Sep 2007 21:43:20 +0200, Jiri Slaby said:
>> On 09/21/2007 09:38 PM, Jiri Slaby wrote:
>>> It is rather the other user who adds the page to some other list while being at
>>> deferred_pages list. Could you try my debug patch
>>> (http://lkml.org/lkml/2007/9/19/141)?
>> or the whitespace non-damaged version:
>> http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug
> 
> Gaak. Is that thing *supposed* to spew zillions of lines of output?

Oh, probably yes.

> Some of the hits we get (I'm wondering if anything after the first makes
> any sense, or if we're just slowly watching the corruption spread - the
> thing ended up near 23K lines long before I gave up and hit the poweroff
> button because there was no end in sight):

Yes, it's not perfect; most of them are false positives (that's OK).

> (If there's something specific you want me to find in the output,
> like "the first time we see XYZ", yell...)

Heh :). The few last before the list corruption BUG (you have to have LIST_DEBUG
enabled) -- but it seems you never reached that phase?

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-23 Thread Valdis . Kletnieks
On Fri, 21 Sep 2007 21:43:20 +0200, Jiri Slaby said:
> On 09/21/2007 09:38 PM, Jiri Slaby wrote:
> > It is rather the other user who adds the page to some other list while being at
> > deferred_pages list. Could you try my debug patch
> > (http://lkml.org/lkml/2007/9/19/141)?
> 
> or the whitespace non-damaged version:
> http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug

Gaak. Is that thing *supposed* to spew zillions of lines of output?

Some of the hits we get (I'm wondering if anything after the first makes
any sense, or if we're just slowly watching the corruption spread - the
thing ended up near 23K lines long before I gave up and hit the poweroff
button because there was no end in sight):

(If there's something specific you want me to find in the output,
like "the first time we see XYZ", yell...)

[  103.701000] POISONS (81000117dc88): 810006d14000, 8100034225c0
[  103.701000]
[  103.701000] Call Trace:
[  103.701000]  [] __list_add+0xd7/0x138
[  103.701000]  [] list_add+0xc/0x11
[  103.701000]  [] free_hot_cold_page+0xe8/0x16d
[  103.701000]  [] free_hot_page+0xb/0xd
[  103.701000]  [] __free_pages+0x18/0x21
[  103.701000]  [] free_pages+0x2f/0x34
[  103.701000]  [] kmem_freepages+0xc5/0xce
[  103.701000]  [] slab_destroy+0x3c/0x53
[  103.701000]  [] free_block+0xcd/0x110
[  103.701000]  [] drain_array+0x94/0xc9
[  103.701000]  [] cache_reap+0x0/0x105
[  103.701000]  [] cache_reap+0x85/0x105
[  103.701000]  [] run_workqueue+0x8e/0x125
[  103.701000]  [] worker_thread+0x0/0xe7
[  103.701000]  [] worker_thread+0xdc/0xe7
[  103.701000]  [] autoremove_wake_function+0x0/0x38
[  103.701000]  [] kthread+0x49/0x78
[  103.701000]  [] child_rip+0xa/0x12
[  103.701000]  [] kthread+0x0/0x78
[  103.701000]  [] child_rip+0x0/0x12
[  103.701000]

[  103.701000] POISONS (81000117eac0): 810006d55000, 8100034225c0
[  103.701000]
[  103.701000] Call Trace:
[  103.701000]  [] __list_add+0xd7/0x138
[  103.701000]  [] list_add+0xc/0x11
[  103.701000]  [] free_hot_cold_page+0xe8/0x16d
[  103.701000]  [] free_hot_page+0xb/0xd
[  103.701000]  [] __free_pages+0x18/0x21
[  103.701000]  [] free_pages+0x2f/0x34
[  103.701000]  [] kmem_freepages+0xc5/0xce
[  103.701000]  [] slab_destroy+0x3c/0x53
[  103.701000]  [] free_block+0xcd/0x110
[  103.701000]  [] drain_array+0x94/0xc9
[  103.701000]  [] cache_reap+0x0/0x105
[  103.701000]  [] cache_reap+0x85/0x105
[  103.701000]  [] run_workqueue+0x8e/0x125
[  103.701000]  [] worker_thread+0x0/0xe7
[  103.701000]  [] worker_thread+0xdc/0xe7
[  103.701000]  [] autoremove_wake_function+0x0/0x38
[  103.701000]  [] kthread+0x49/0x78
[  103.701000]  [] child_rip+0xa/0x12
[  103.701000]  [] kthread+0x0/0x78
[  103.701000]  [] child_rip+0x0/0x12
[  103.701000]
(That trace repeats 16 times, then we see:)
[  106.284000] POISONS (810004432810): 810005291378, 81000524e618
[  106.284000] 
[  106.284000] Call Trace:
[  106.284000]  [] __down_write_nested+0x3d/0xa1
[  106.284000]  [] __list_add+0xd7/0x138
[  106.284000]  [] vma_prio_tree_add+0xc9/0xe0
[  106.284000]  [] copy_process+0xc63/0x1515
[  106.284000]  [] do_fork+0x75/0x20b
[  106.284000]  [] __up_write+0xf0/0x100
[  106.284000]  [] system_call+0x7e/0x83
[  106.284000]  [] sys_clone+0x23/0x25
[  106.284000]  [] ptregscall_common+0x67/0xb0
[  106.284000] 
..
[  106.284000] POISONS (810004432768): 81000524e618, 81000524e618
[  106.284000]
[  106.284000] Call Trace:
[  106.284000]  [] __list_add+0xd7/0x138
[  106.284000]  [] vma_prio_tree_add+0xc9/0xe0
[  106.284000]  [] copy_process+0xc63/0x1515
[  106.284000]  [] do_fork+0x75/0x20b
[  106.284000]  [] __up_write+0xf0/0x100
[  106.284000]  [] system_call+0x7e/0x83
[  106.284000]  [] sys_clone+0x23/0x25
[  106.284000]  [] ptregscall_common+0x67/0xb0
[  106.284000]  
...
[  106.285000] POISONS (810003637b30): 810003637c18, 0246
[  106.285000]
[  106.285000] Call Trace:
[  106.285000]  [] __list_add+0xd7/0x138
[  106.285000]  [] list_add+0xc/0x11
[  106.285000]  [] add_wait_queue+0x2c/0x40
[  106.285000]  [] __pollwait+0xd6/0xdf
[  106.285000]  [] inotify_poll+0x29/0x5c
[  106.285000]  [] do_select+0x2fa/0x50d
[  106.285000]  [] __pollwait+0x0/0xdf
[  106.285000]  [] default_wake_function+0x0/0xf
[  106.285000]  [] __down_trylock+0x4d/0x5a
[  106.285000]  [] __down_failed_trylock+0x35/0x3a
[  106.285000]  [] __update_rq_clock+0x1a/0xe5
[  106.285000]  [] __alloc_pages+0x5c/0x2b5
[  106.285000]  [] core_sys_select+0x1f3/0x2a2
[  106.285000]  [] alloc_pid+0x2f8/0x34f
[  106.285000]  [] __up_read+0x7a/0x83
[  106.285000]  [] up_read+0x9/0xb
[  106.285000]  [] do_page_fault+0x405/0x7ac
[  106.285000]  [] sys_select+0xbf/0x17b
[  106.285000]  [] system_call+0x7e/0x83
[  106.285000] POISONS (810003637ba0): 8060ff48, 8051471d
[  106.285000]
[  106.285000] Call Trace:
[  106.285000]  [] __list_add+0xd7/0x138
[  106.285000]  [] list_add+0xc/0x11
[  106.285000]  [] add_wait_queue+0x2c/0x40
[  106.285000]  [] 

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Jiri Slaby
On 09/21/2007 09:38 PM, Jiri Slaby wrote:
> It is rather the other user who adds the page to some other list while being at
> deferred_pages list. Could you try my debug patch
> (http://lkml.org/lkml/2007/9/19/141)?

or the whitespace non-damaged version:
http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Jiri Slaby
On 09/21/2007 09:33 PM, [EMAIL PROTECTED] wrote:
> On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
>> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> 
>>> Hmm.. maybe I'm chasing a different bug manifested by the same patch.  For me,
>>> it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
>>> change matters.
>> This patch probably changes behaviour how the pages are queued on the list
>> somehow. Maybe it's insane to suggest everybody with similar problem to try
>> LIST_DEBUG, but just give it a try after having one of the patches applied 
>> ;).
>> (Or have you tried yet?)
> 
> OK, had a chance to test it, with Dave Airlie's AGP patch, and here's what it 
> hit:
> 
> [  198.925000] list_del corruption. next->prev should be 81000118f178, 
> but was 8067e050
> [  198.925000] [ cut here ]
> [  198.925000] kernel BUG at lib/list_debug.c:72!
> [  198.925000] invalid opcode:  [1] PREEMPT SMP 
> [  198.925000] last sysfs file: 
> /devices/pci:00/:00:01.0/:01:00.0/i2c-adapter/i2c-1/i2c-1/dev
> [  198.925000] CPU 1 
> [  198.925000] Modules linked in:
> 
> (Yes, I wish I got a backtrace, but that's as long as it lived.  Apparently,
> the netconsole stuff actually writing this stuff out was over on CPU0 which 
> then
> proceeded to croak).
> 
> Some odd SMP-related race? (x86_64 kernel on a Core2 Duo T7200, if it matters)

It is rather the other user who adds the page to some other list while being at
deferred_pages list. Could you try my debug patch
(http://lkml.org/lkml/2007/9/19/141)?

-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Valdis . Kletnieks
On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:

> > Hmm.. maybe I'm chasing a different bug manifested by the same patch.  For 
> > me,
> > it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
> > change matters.
> 
> This patch probably changes behaviour how the pages are queued on the list
> somehow. Maybe it's insane to suggest everybody with similar problem to try
> LIST_DEBUG, but just give it a try after having one of the patches applied ;).
> (Or have you tried yet?)

OK, had a chance to test it, with Dave Airlie's AGP patch, and here's what it hit:

[  198.925000] list_del corruption. next->prev should be 81000118f178, but was 8067e050
[  198.925000] [ cut here ]
[  198.925000] kernel BUG at lib/list_debug.c:72!
[  198.925000] invalid opcode:  [1] PREEMPT SMP 
[  198.925000] last sysfs file: /devices/pci:00/:00:01.0/:01:00.0/i2c-adapter/i2c-1/i2c-1/dev
[  198.925000] CPU 1 
[  198.925000] Modules linked in:

(Yes, I wish I got a backtrace, but that's as long as it lived.  Apparently,
the netconsole stuff actually writing this stuff out was over on CPU0 which then
proceeded to croak).

Some odd SMP-related race? (x86_64 kernel on a Core2 Duo T7200, if it matters)
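
For reference, the kind of check behind a "list_del corruption" message can be sketched in user space as below. This is only an illustration of neighbour checking in the style of list debugging, not the code in lib/list_debug.c; the names init_list_head(), list_add_unchecked() and list_del_checked() are made up for the example. It assumes the usual circular doubly linked list with a separate head.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head, with no sanity checks at all. */
static void list_add_unchecked(struct list_head *entry, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = entry;
	entry->next = next;
	entry->prev = head;
	head->next = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns false (and leaves the list alone) when they do not,
 * which is the situation a "list_del corruption" report describes.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->prev->next != entry)
		return false;	/* "prev->next should be entry" case */
	if (entry->next->prev != entry)
		return false;	/* "next->prev should be entry" case */

	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return true;
}
```

If an entry is added to a second list while it still sits on the first, deleting its old neighbour from the first list fails the second check, because the re-queued entry's prev now points at the other list's head.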




Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Valdis . Kletnieks
On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> > On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
> >> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> >>   -rc4-mm1: solid lock on X shutdown, random solid locks about
> >> once every four hours
> >>   -rc6-mm1: solid lock on X startup
> >>+your patch: screen goes black, turns off and on a few times during
> >> startup, can reboot with sysrq-b
> > 
> > Hmm.. maybe I'm chasing a different bug manifested by the same patch.  For 
> > me,
> > it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
> > change matters.
> 
> This patch probably changes behaviour how the pages are queued on the list
> somehow. Maybe it's insane to suggest everybody with similar problem to try
> LIST_DEBUG, but just give it a try after having one of the patches applied ;).
> (Or have you tried yet?)

Haven't tried LIST_DEBUG yet.  I'm spending most of the weekend camping, so
will likely be unable to test that until Monday-ish (unless I get lucky and can
get a test in during the next 2 hours)...




Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Jiri Slaby
On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
>> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>>   -rc4-mm1: solid lock on X shutdown, random solid locks about
>> once every four hours
>>   -rc6-mm1: solid lock on X startup
>>+your patch: screen goes black, turns off and on a few times during
>> startup, can reboot with sysrq-b
> 
> Hmm.. maybe I'm chasing a different bug manifested by the same patch.  For me,
> it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
> change matters.

This patch probably changes behaviour how the pages are queued on the list
somehow. Maybe it's insane to suggest everybody with similar problem to try
LIST_DEBUG, but just give it a try after having one of the patches applied ;).
(Or have you tried yet?)

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Valdis . Kletnieks
On Fri, 21 Sep 2007 01:44:41 +0200, Andi Kleen said:

> Full bisect needed then I guess. Ok as a short cut you could perhaps
> the cpa-* patches first (might need to drop some later depending 
> patches), then the drm and agp trees.

The later depending patches:

x86_64-mm-cpa-clflush.patch
x86_64-mm-cpa-cleanup.patch
x86_64-mm-cpa-einval.patch
x86_64-mm-cpa-arch-macro.patch
intel-iommu-clflush_cache_range-now-takes-size-param.patch






Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Valdis . Kletnieks
On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
> On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:

> > I've attached a more complicated patch that does a 2 stage effort to
> > unmapping and freeing pages. My kernel no longer hangs with this
> > patch...
> > 
> > Jiri can you confirm?
> 
> It's broken for me.
> 
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>   -rc4-mm1: solid lock on X shutdown, random solid locks about
> once every four hours
>   -rc6-mm1: solid lock on X startup
>+your patch: screen goes black, turns off and on a few times during
> startup, can reboot with sysrq-b

Hmm.. maybe I'm chasing a different bug manifested by the same patch.  For me,
it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
change matters.




Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-21 Thread Jiri Slaby
On 09/21/2007 01:31 AM, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
>>> It's broken for me.
>>>
>>> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>>>   -rc4-mm1: solid lock on X shutdown, random solid locks about
>>> once every four hours
>>>   -rc6-mm1: solid lock on X startup
>>>+your patch: screen goes black, turns off and on a few times during
>>> startup, can reboot with sysrq-b
>> Does it work with my simple dumb patch instead of Dave's ? 
> 
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).

Just an idea, if you enable LIST_DEBUG, it won't spit anything out after the one
of patches applied, right?

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Zhenyu Wang
On 2007.09.21 00:10:26 +, Jiri Slaby wrote:
> > Could you try current xf86-video-intel driver? just do
> > git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel
> 
> It works! 

yep, I also pushed a fix for G33 in xf86-video-intel when fixing the intel agp.
So G33 users should upgrade both for things to work correctly.

> 3d problem, but it has maybe nothing to do with kernel:
> $ glxinfo
> name of display: :0.0
> Unrecognized deviceID 29c2
> X Error of failed request:  GLXBadContext
> ...

It looks like you have an old version of mesa whose i915 dri driver doesn't know
your chipset. Try mesa-7.X.

I have also seen X exit broken with 2.6.23-rc6-mm1, will follow this thread
and try Dave's patch.

Thanks for testing!


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen
On Thu, Sep 20, 2007 at 06:31:14PM -0500, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > > It's broken for me.
> > > 
> > > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > >   -rc4-mm1: solid lock on X shutdown, random solid locks about
> > > once every four hours
> > >   -rc6-mm1: solid lock on X startup
> > >+your patch: screen goes black, turns off and on a few times during
> > > startup, can reboot with sysrq-b
> > 
> > Does it work with my simple dumb patch instead of Dave's ? 
> 
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).
> 
> I suspect I'm tripping two things and the flushing thing fixes one but
> not the other.

Full bisect needed then I guess. Ok as a short cut you could perhaps
the cpa-* patches first (might need to drop some later depending 
patches), then the drm and agp trees.

-Andi


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall
On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > It's broken for me.
> > 
> > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> >   -rc4-mm1: solid lock on X shutdown, random solid locks about
> > once every four hours
> >   -rc6-mm1: solid lock on X startup
> >+your patch: screen goes black, turns off and on a few times during
> > startup, can reboot with sysrq-b
> 
> Does it work with my simple dumb patch instead of Dave's ? 

Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior).

I suspect I'm tripping two things and the flushing thing fixes one but
not the other.

-- 
Mathematics is the supreme nostalgia of our time.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen
> It's broken for me.
> 
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>   -rc4-mm1: solid lock on X shutdown, random solid locks about
> once every four hours
>   -rc6-mm1: solid lock on X startup
>+your patch: screen goes black, turns off and on a few times during
> startup, can reboot with sysrq-b

Does it work with my simple dumb patch instead of Dave's ? 

-Andi


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Dave Airlie
> > But now I'm talking about another issue -- a regression since rc4-mm1, 
> > where X
> > server is unable to bind agp memory (those x logs above). The clflush issue 
> > has
> > solved andi in
> > http://lkml.org/lkml/2007/9/19/334
> > recently
>
> Tried that, my laptop still bricks the instant X starts up and the NVidia 
> driver
> tries to initialize.  Not even sysrq-foo works. Time to power-cycle.
>

I'd expect the binary to be doing something stupid with its flushing
and relying on the kernel to do something it no longer does.. so this
is most likely a case of not fixable..

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby
On 09/20/2007 11:24 AM, Zhenyu Wang wrote:
> On 2007.09.20 17:33:45 +, Dave Airlie wrote:
>>> Maybe you are rather interested in these dmesg lines:
>>> Linux agpgart interface v0.102
>>> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup 
>>> X.Org
>>> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
>>> agpgart: Detected an Intel G33 Chipset.
>>> agpgart: Detected 8192K stolen memory.
>>> agpgart: AGP aperture is 256M @ 0xd000
>>> [drm] Initialized drm 1.1.0 20060810
>>> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
>>> [drm] Initialized i915 1.6.0 20060119 on minor 0
>>> ...
>>> set status page addr 0x00033000
>>> agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
>>> agpgart: Trying to insert into local/stolen memory
>>>
>>> So the problem is, that X passes too low start.
>>>
>>> The X log:
>>> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
> 
> Could you try current xf86-video-intel driver? just do
> git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

It works! 3d problem, but it has maybe nothing to do with kernel:
$ glxinfo
name of display: :0.0
Unrecognized deviceID 29c2
X Error of failed request:  GLXBadContext
...

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall
On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > The code is broken anyways. If you free pages without flushing
> > them first some other innocent user allocating them will end up
> > with possible uncached pages for some time.
> >
> > Does this simple patch help?
> >
> 
> I've attached a more complicated patch that does a 2 stage effort to
> unmapping and freeing pages. My kernel no longer hangs with this
> patch...
> 
> Jiri can you confirm?

It's broken for me.

2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
  -rc4-mm1: solid lock on X shutdown, random solid locks about
once every four hours
  -rc6-mm1: solid lock on X startup
   +your patch: screen goes black, turns off and on a few times during
startup, can reboot with sysrq-b

Video is:

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250
[Mobility FireGL 9000] (rev 02)


-- 
Mathematics is the supreme nostalgia of our time.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Valdis . Kletnieks
On Wed, 19 Sep 2007 22:47:41 +0200, Jiri Slaby said:
> On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote:

> > That would probably have been me, saying that x86_64-mm-cpa-clflush.patch 
> > broke
> > the NVidia graphics driver in 23-rc3-mm1.  Is it breaking *other* X drivers 
> > as
> > well?
> 
> Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected
> in my case):
> http://lkml.org/lkml/2007/9/9/51
> 
> But now I'm talking about another issue -- a regression since rc4-mm1, where X
> server is unable to bind agp memory (those x logs above). The clflush issue 
> has
> solved andi in
> http://lkml.org/lkml/2007/9/19/334
> recently

Tried that, my laptop still bricks the instant X starts up and the NVidia driver
tries to initialize.  Not even sysrq-foo works. Time to power-cycle.




Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Zhenyu Wang
On 2007.09.20 17:33:45 +, Dave Airlie wrote:
> > Maybe you are rather interested in these dmesg lines:
> > Linux agpgart interface v0.102
> > agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup 
> > X.Org
> > on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> > agpgart: Detected an Intel G33 Chipset.
> > agpgart: Detected 8192K stolen memory.
> > agpgart: AGP aperture is 256M @ 0xd000
> > [drm] Initialized drm 1.1.0 20060810
> > ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> > [drm] Initialized i915 1.6.0 20060119 on minor 0
> > ...
> > set status page addr 0x00033000
> > agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
> > agpgart: Trying to insert into local/stolen memory
> >
> > So the problem is, that X passes too low start.
> >
> > The X log:
> > http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old

Could you try current xf86-video-intel driver? just do
git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

My G33 was just verified broken today... so I'll try to reproduce it on
another one tomorrow.

> 
> I've cc'd Zhenyu who might be able to shed some light on this? can you
> try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
> maybe I'm missing a patch in the drm..
> 

yep, should try 2.6.23-rc7 first, and it seems not drm related.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Dave Airlie
> >> Fatal server error:
> >> Couldn't bind memory for front buffer
> >>
> >> I thought I'd seen a thread about this issue, but I can't find it now. Is 
> >> it
> >> known or am I seeing ghosts yet, Andrew?
> >>
> >
> > Can you send me a complete Xorg log file?
>
> Maybe you are rather interested in these dmesg lines:
> Linux agpgart interface v0.102
> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup 
> X.Org
> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> agpgart: Detected an Intel G33 Chipset.
> agpgart: Detected 8192K stolen memory.
> agpgart: AGP aperture is 256M @ 0xd000
> [drm] Initialized drm 1.1.0 20060810
> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> [drm] Initialized i915 1.6.0 20060119 on minor 0
> ...
> set status page addr 0x00033000
> agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
> agpgart: Trying to insert into local/stolen memory
>
> So the problem is, that X passes too low start.
>
> The X log:
> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
>

I've cc'd Zhenyu who might be able to shed some light on this? can you
try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
maybe I'm missing a patch in the drm..

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby
On 09/20/2007 03:51 AM, Dave Airlie wrote:
> On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
>> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>>> What do you mean with not run?
>> (II) intel(0): Initializing HW Cursor
>> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
>> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>> at offset 0x5ff000 failed (Invalid argument)
>>
>> Fatal server error:
>> Couldn't bind memory for front buffer
>>
>> I thought I'd seen a thread about this issue, but I can't find it now. Is it
>> known or am I seeing ghosts yet, Andrew?
>>
> 
> Can you send me a complete Xorg log file?

Maybe you are rather interested in these dmesg lines:
Linux agpgart interface v0.102
agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
agpgart: Detected an Intel G33 Chipset.
agpgart: Detected 8192K stolen memory.
agpgart: AGP aperture is 256M @ 0xd000
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized i915 1.6.0 20060119 on minor 0
...
set status page addr 0x00033000
agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
agpgart: Trying to insert into local/stolen memory

So the problem is, that X passes too low start.

The X log:
http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
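
To spell out the arithmetic behind "too low start", here is a rough sketch. It assumes 4K pages and that gtt_entries counts the pages already covered by stolen memory; the helper names stolen_gtt_entries() and check_bind_offset() are invented for illustration and are not taken from intel-agp.c.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption for this sketch: one GTT entry maps one 4K page. */
#define SKETCH_PAGE_SIZE_KB 4

/* Number of GTT entries taken up by stolen memory of the given size. */
static size_t stolen_gtt_entries(size_t stolen_kb)
{
	return stolen_kb / SKETCH_PAGE_SIZE_KB;
}

/*
 * Reject a bind whose starting page offset falls inside the range
 * already backed by stolen memory; -EINVAL is what shows up in the
 * X log as "Invalid argument".
 */
static int check_bind_offset(size_t pg_start, size_t gtt_entries)
{
	if (pg_start < gtt_entries)
		return -EINVAL;
	return 0;
}
```

With 8192K stolen this gives 2048 (0x800) entries, so the bind at page offset 1535 (0x5ff) from the X log falls below that limit and is refused.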

> and lspci -vv?
# lspci -vvx
00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02)
Subsystem: Intel Corporation DRAM Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort-
[...]
Capabilities: [50] Subsystem: Intel Corporation 82801 PCI Bridge
00: 86 80 4e 24 06 01 10 00 92 01 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 01 20 f0 00 80 22
20: 60 ff 60 ff f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 02 00

00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02)
Subsystem: Intel Corporation LPC Interface Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
[...]

> my 945 works fine with my drm tree on top of Linus with clflush + my
> agp fix I just sent out ..

Should I try?

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby
On 09/20/2007 04:24 AM, Andrew Morton wrote:
> On Thu, 20 Sep 2007 11:42:29 +1000 "Dave Airlie" <[EMAIL PROTECTED]> wrote:
> 
>> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
>> From: Dave Airlie <[EMAIL PROTECTED]>
>> Date: Thu, 20 Sep 2007 11:30:41 +1000
>> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages
> 
> This fixes the hang-when-quitting-X on the Vaio.

Checked.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby
On 09/20/2007 03:51 AM, Dave Airlie wrote:
> On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
>> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>>> What do you mean with not run?
>> (II) intel(0): Initializing HW Cursor
>> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
>> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>> at offset 0x5ff000 failed (Invalid argument)
>>
>> Fatal server error:
>> Couldn't bind memory for front buffer
>>
>> I thought I'd seen a thread about this issue, but I can't find it now. Is it
>> known or am I seeing ghosts yet, Andrew?
>>
>
> Can you send me a complete Xorg log file?

Maybe you are rather interested in these dmesg lines:
Linux agpgart interface v0.102
agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
agpgart: Detected an Intel G33 Chipset.
agpgart: Detected 8192K stolen memory.
agpgart: AGP aperture is 256M @ 0xd0000000
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized i915 1.6.0 20060119 on minor 0
...
set status page addr 0x00033000
agpgart: pg_start == 0x000005ff,intel_private.gtt_entries == 0x00000800
agpgart: Trying to insert into local/stolen memory

So the problem is that X passes too low a start.

The X log:
http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old

> and lspci -vv?
# lspci -vvx
00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02)
Subsystem: Intel Corporation DRAM Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort+ >SERR- <PERR-
Latency: 0
Capabilities: [e0] Vendor Specific Information
00: 86 80 c0 29 06 00 90 20 02 00 00 06 00 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c0 29
30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00

00:02.0 VGA compatible controller: Intel Corporation Integrated Graphics
Controller (rev 02) (prog-if 00 [VGA])
Subsystem: Intel Corporation Integrated Graphics Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 16
Region 0: Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
Region 1: I/O ports at ec00 [size=8]
Region 2: Memory at d0000000 (32-bit, prefetchable) [size=256M]
Region 3: Memory at ff900000 (32-bit, non-prefetchable) [size=1M]
Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0
Enable-
Address: 00000000  Data: 0000
Capabilities: [d0] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00: 86 80 c2 29 07 00 90 00 02 00 00 03 00 00 00 00
10: 00 00 a8 ff 01 ec 00 00 08 00 00 d0 00 00 90 ff
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c2 29
30: 00 00 00 00 90 00 00 00 00 00 00 00 0a 01 00 00

00:03.0 Communication controller: Intel Corporation MEI Controller (rev 02)
Subsystem: Intel Corporation MEI Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin A routed to IRQ 10
Region 0: Memory at ffa7bc00 (64-bit, non-prefetchable) [size=16]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [8c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0
Enable-
Address: 0000000000000000  Data: 0000
00: 86 80 c4 29 06 00 10 00 02 00 80 07 00 00 80 00
10: 04 bc a7 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c4 29
30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 00 00

00:03.2 IDE interface: Intel Corporation PT IDER Controller (rev 02) (prog-if 85
[Master SecO PriO])
Subsystem: Intel Corporation PT IDER Controller
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR-
Latency: 0
Interrupt: pin C routed to IRQ 12
Region 0: I/O ports at e880 [size=8]
Region 1: I/O ports at e800 [size=4]
Region 2: I/O ports at e480 [size=8]
Region 3: I/O ports at e400 [size=4]

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Dave Airlie
> >> Fatal server error:
> >> Couldn't bind memory for front buffer
> >>
> >> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> >> known or am I seeing ghosts yet, Andrew?
> >>
> >
> > Can you send me a complete Xorg log file?
>
> Maybe you are rather interested in these dmesg lines:
> Linux agpgart interface v0.102
> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> agpgart: Detected an Intel G33 Chipset.
> agpgart: Detected 8192K stolen memory.
> agpgart: AGP aperture is 256M @ 0xd0000000
> [drm] Initialized drm 1.1.0 20060810
> ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> [drm] Initialized i915 1.6.0 20060119 on minor 0
> ...
> set status page addr 0x00033000
> agpgart: pg_start == 0x000005ff,intel_private.gtt_entries == 0x00000800
> agpgart: Trying to insert into local/stolen memory
>
> So the problem is, that X passes too low start.
>
> The X log:
> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old


I've cc'd Zhenyu, who might be able to shed some light on this. Can you
try 2.6.23-rc7? Maybe the G33 support still needs some work, or
maybe I'm missing a patch in the drm.

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Zhenyu Wang
On 2007.09.20 17:33:45 +0000, Dave Airlie wrote:
> > Maybe you are rather interested in these dmesg lines:
> > Linux agpgart interface v0.102
> > agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
> > on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> > agpgart: Detected an Intel G33 Chipset.
> > agpgart: Detected 8192K stolen memory.
> > agpgart: AGP aperture is 256M @ 0xd0000000
> > [drm] Initialized drm 1.1.0 20060810
> > ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> > [drm] Initialized i915 1.6.0 20060119 on minor 0
> > ...
> > set status page addr 0x00033000
> > agpgart: pg_start == 0x000005ff,intel_private.gtt_entries == 0x00000800
> > agpgart: Trying to insert into local/stolen memory
> >
> > So the problem is, that X passes too low start.
> >
> > The X log:
> > http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old

Could you try current xf86-video-intel driver? just do
git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

My G33 was just verified broken today... so I'll try to reproduce it on
another one tomorrow.

 
> I've cc'd Zhenyu who might be able to shed some light on this? can you
> try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
> maybe I'm missing a patch in the drm..
>

Yep, you should try 2.6.23-rc7 first; this does not seem to be drm related.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Valdis . Kletnieks
On Wed, 19 Sep 2007 22:47:41 +0200, Jiri Slaby said:
> On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote:
>
> > That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke
> > the NVidia graphics driver in 23-rc3-mm1.  Is it breaking *other* X drivers as
> > well?
>
> Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected
> in my case):
> http://lkml.org/lkml/2007/9/9/51
>
> But now I'm talking about another issue -- a regression since rc4-mm1, where X
> server is unable to bind agp memory (those x logs above). The clflush issue has
> solved andi in
> http://lkml.org/lkml/2007/9/19/334
> recently

Tried that, my laptop still bricks the instant X starts up and the NVidia driver
tries to initialize.  Not even sysrq-foo works. Time to power-cycle.




Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall
On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > The code is broken anyways. If you free pages without flushing
> > them first some other innocent user allocating them will end up
> > with possible uncached pages for some time.
> >
> > Does this simple patch help?
> >
>
> I've attached a more complicated patch that does a 2 stage effort to
> unmapping and freeing pages. My kernel no longer hangs with this
> patch...
>
> Jiri can you confirm?

It's broken for me.

2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
      -rc4-mm1: solid lock on X shutdown, random solid locks about
                once every four hours
      -rc6-mm1: solid lock on X startup
   +your patch: screen goes black, turns off and on a few times during
                startup, can reboot with sysrq-b

Video is:

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250
[Mobility FireGL 9000] (rev 02)


-- 
Mathematics is the supreme nostalgia of our time.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Jiri Slaby
On 09/20/2007 11:24 AM, Zhenyu Wang wrote:
> On 2007.09.20 17:33:45 +0000, Dave Airlie wrote:
>>> Maybe you are rather interested in these dmesg lines:
>>> Linux agpgart interface v0.102
>>> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
>>> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
>>> agpgart: Detected an Intel G33 Chipset.
>>> agpgart: Detected 8192K stolen memory.
>>> agpgart: AGP aperture is 256M @ 0xd0000000
>>> [drm] Initialized drm 1.1.0 20060810
>>> ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
>>> [drm] Initialized i915 1.6.0 20060119 on minor 0
>>> ...
>>> set status page addr 0x00033000
>>> agpgart: pg_start == 0x000005ff,intel_private.gtt_entries == 0x00000800
>>> agpgart: Trying to insert into local/stolen memory
>>>
>>> So the problem is, that X passes too low start.
>>>
>>> The X log:
>>> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
>
> Could you try current xf86-video-intel driver? just do
> git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

It works! There is a 3D problem, but it may have nothing to do with the kernel:
$ glxinfo
name of display: :0.0
Unrecognized deviceID 29c2
X Error of failed request:  GLXBadContext
...

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Dave Airlie
> > But now I'm talking about another issue -- a regression since rc4-mm1, where X
> > server is unable to bind agp memory (those x logs above). The clflush issue has
> > solved andi in
> > http://lkml.org/lkml/2007/9/19/334
> > recently
>
> Tried that, my laptop still bricks the instant X starts up and the NVidia driver
> tries to initialize.  Not even sysrq-foo works. Time to power-cycle.


I'd expect the binary to be doing something stupid with its flushing
and relying on the kernel to do something it no longer does, so this
is most likely not fixable.

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen
> It's broken for me.
>
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>       -rc4-mm1: solid lock on X shutdown, random solid locks about
>                 once every four hours
>       -rc6-mm1: solid lock on X startup
>    +your patch: screen goes black, turns off and on a few times during
>                 startup, can reboot with sysrq-b

Does it work with my simple dumb patch instead of Dave's ? 

-Andi


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Matt Mackall
On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > It's broken for me.
> >
> > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> >       -rc4-mm1: solid lock on X shutdown, random solid locks about
> >                 once every four hours
> >       -rc6-mm1: solid lock on X startup
> >    +your patch: screen goes black, turns off and on a few times during
> >                 startup, can reboot with sysrq-b
>
> Does it work with my simple dumb patch instead of Dave's ?

Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior).

I suspect I'm tripping two things and the flushing thing fixes one but
not the other.

-- 
Mathematics is the supreme nostalgia of our time.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Andi Kleen
On Thu, Sep 20, 2007 at 06:31:14PM -0500, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > > It's broken for me.
> > >
> > > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > >       -rc4-mm1: solid lock on X shutdown, random solid locks about
> > >                 once every four hours
> > >       -rc6-mm1: solid lock on X startup
> > >    +your patch: screen goes black, turns off and on a few times during
> > >                 startup, can reboot with sysrq-b
> >
> > Does it work with my simple dumb patch instead of Dave's ?
>
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).
>
> I suspect I'm tripping two things and the flushing thing fixes one but
> not the other.

Full bisect needed then I guess. OK, as a shortcut you could perhaps
revert the cpa-* patches first (you might need to drop some later patches
that depend on them), then the drm and agp trees.

-Andi


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-20 Thread Zhenyu Wang
On 2007.09.21 00:10:26 +0000, Jiri Slaby wrote:
> > Could you try current xf86-video-intel driver? just do
> > git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel
>
> It works!

Yep, I also pushed a fix for G33 in xf86-video-intel when fixing the intel agp.
So G33 users should upgrade both for things to work correctly.

> 3d problem, but it has maybe nothing to do with kernel:
> $ glxinfo
> name of display: :0.0
> Unrecognized deviceID 29c2
> X Error of failed request:  GLXBadContext
> ...

It looks like you have an old version of mesa whose i915 dri driver doesn't know
your chipset. Try mesa-7.X.
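
For reference, the 29c2 in "Unrecognized deviceID 29c2" is the PCI device ID of the graphics controller, and it can be read straight out of the lspci -vvx hex dump posted earlier in this thread (the 00:02.0 dump starts with "86 80 c2 29"). A small sketch of that decoding follows; pci_cfg_read16 and g33_vga_cfg are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * PCI config space is little-endian: bytes 0-1 hold the vendor ID,
 * bytes 2-3 the device ID.
 */
static uint16_t pci_cfg_read16(const uint8_t *cfg, unsigned int offset)
{
	return (uint16_t)(cfg[offset] | (cfg[offset + 1] << 8));
}

/* First four bytes of the 00:02.0 hex dump from the lspci output. */
static const uint8_t g33_vga_cfg[4] = { 0x86, 0x80, 0xc2, 0x29 };
```

Reading offset 0 gives vendor 0x8086 (Intel) and offset 2 gives device 0x29c2, the ID the older dri driver does not have in its table.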

I have also seen X exit broken with 2.6.23-rc6-mm1, will follow this thread
and try Dave's patch.

Thanks for testing!


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andrew Morton
On Thu, 20 Sep 2007 11:42:29 +1000 "Dave Airlie" <[EMAIL PROTECTED]> wrote:

> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
> From: Dave Airlie <[EMAIL PROTECTED]>
> Date: Thu, 20 Sep 2007 11:30:41 +1000
> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

This fixes the hang-when-quitting-X on the Vaio.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >
> > What do you mean with not run?
>
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> at offset 0x5ff000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for front buffer
>
> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?
>

Can you send me a complete Xorg log file?

and lspci -vv?

my 945 works fine with my drm tree on top of Linus with clflush + my
agp fix I just sent out ..

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
> The code is broken anyways. If you free pages without flushing
> them first some other innocent user allocating them will end up
> with possible uncached pages for some time.
>
> Does this simple patch help?
>

I've attached a more complicated patch that does a 2 stage effort to
unmapping and freeing pages. My kernel no longer hangs with this
patch...

Jiri can you confirm?

I'll look at the other issue separately..

Dave.
From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
From: Dave Airlie <[EMAIL PROTECTED]>
Date: Thu, 20 Sep 2007 11:30:41 +1000
Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

With Andi's clflush fixup, we were getting hangs on server exit; flushing
the mappings after freeing each page helped.

This showed up a race condition where pages, after being freed, could be
reused before the agp mappings had been flushed. Flushing after each single
page is a bad thing for future drm work, so make the page destroy a two-pass
operation: unmap all the pages, flush the mappings, and then destroy the pages.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>
---
 drivers/char/agp/agp.h   |7 +--
 drivers/char/agp/ali-agp.c   |   29 +
 drivers/char/agp/backend.c   |   12 
 drivers/char/agp/generic.c   |   20 ++--
 drivers/char/agp/i460-agp.c  |4 ++--
 drivers/char/agp/intel-agp.c |6 --
 6 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 8955e7f..b83824c 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -58,6 +58,9 @@ struct gatt_mask {
 	 * devices this will probably be ignored */
 };
 
+#define AGP_PAGE_DESTROY_UNMAP 1
+#define AGP_PAGE_DESTROY_FREE 2
+
 struct aper_size_info_8 {
 	int size;
 	int num_entries;
@@ -113,7 +116,7 @@ struct agp_bridge_driver {
 	struct agp_memory *(*alloc_by_type) (size_t, int);
 	void (*free_by_type)(struct agp_memory *);
 	void *(*agp_alloc_page)(struct agp_bridge_data *);
-	void (*agp_destroy_page)(void *);
+	void (*agp_destroy_page)(void *, int flags);
 int (*agp_type_to_mask_type) (struct agp_bridge_data *, int);
 };
 
@@ -267,7 +270,7 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type);
 struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type);
 void agp_generic_free_by_type(struct agp_memory *curr);
 void *agp_generic_alloc_page(struct agp_bridge_data *bridge);
-void agp_generic_destroy_page(void *addr);
+void agp_generic_destroy_page(void *addr, int flags);
 void agp_free_key(int key);
 int agp_num_entries(void);
 u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command);
diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c
index 4941ddb..2b65155 100644
--- a/drivers/char/agp/ali-agp.c
+++ b/drivers/char/agp/ali-agp.c
@@ -156,29 +156,34 @@ static void *m1541_alloc_page(struct agp_bridge_data *bridge)
 	return addr;
 }
 
-static void ali_destroy_page(void * addr)
+static void ali_destroy_page(void * addr, int flags)
 {
 	if (addr) {
-		global_cache_flush();	/* is this really needed?  --hch */
-		agp_generic_destroy_page(addr);
-		global_flush_tlb();
+		if (flags & AGP_PAGE_DESTROY_UNMAP) {
+			global_cache_flush();	/* is this really needed?  --hch */
+			agp_generic_destroy_page(addr, flags);
+			global_flush_tlb();
+		} else
+			agp_generic_destroy_page(addr, flags);
 	}
 }
 
-static void m1541_destroy_page(void * addr)
+static void m1541_destroy_page(void * addr, int flags)
 {
 	u32 temp;
 
 	if (addr == NULL)
 		return;
 
-	global_cache_flush();
-
-	pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
-	pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
-			(((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
-			  virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
-	agp_generic_destroy_page(addr);
+	if (flags & AGP_PAGE_DESTROY_UNMAP) {
+		global_cache_flush();
+	  
+		pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
+		pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
+   (((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
+	 virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
+	}
+	agp_generic_destroy_page(addr, flags);
 }
 
 
diff --git a/drivers/char/agp/backend.c b/drivers/char/agp/backend.c
index 1b47c89..832ded2 100644
--- a/drivers/char/agp/backend.c
+++ b/drivers/char/agp/backend.c
@@ -189,9 +189,11 @@ static int agp_backend_initialize(struct agp_bridge_data *bridge)
 
 err_out:
 	if (bridge->driver->needs_scratch_page) {
-		bridge->driver->agp_destroy_page(
-gart_to_virt(bridge->scratch_page_real));
+		bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+		 AGP_PAGE_DESTROY_UNMAP);
 		flush_agp_mappings();
+		bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+		 AGP_PAGE_DESTROY_FREE);
 	}
 	if (got_gatt)
 		bridge->driver->free_gatt_table(bridge);
@@ -215,9 +217,11 

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
On 9/20/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
> > On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> >
> > > -8<-8<-8<-8<-8<-8<
> > > That means
> > > void agp_generic_destroy_page(void *addr)
> > > {
> > > struct page *page;
> > >
> > > if (addr == NULL)
> > > return;
> > >
> > > page = virt_to_page(addr);
> > > (1) unmap_page_from_agp(page);
> > > put_page(page);
> > > (2) free_page((unsigned long)addr);
> > > atomic_dec(&agp_bridge->current_memory_agp);
> > > }
> > >
> > > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> > > __change_page_attr -> save_page -> list_add(&page->lru, &deferred_pages);
> > > (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> > > free_hot_cold_page -> list_add(&page->lru, &pcp->list);
> >
> > that'll hurt.
> >
> > > any ideas how to fix this?
> >
> > We should hold a single reference on the page for its membership in
> > deferred_pages.
>
> The code is broken anyways. If you free pages without flushing
> them first some other innocent user allocating them will end up
> with possible uncached pages for some time.
>
> Does this simple patch help?
>
> -Andi
>
>
> Flush uncached AGP pages before freeing

In theory this should be handled by the caller, so as to avoid the
overhead of continuous flushing. However, I can see a potential race
condition here if the pages are put back into the kernel before the
caller flushes the mappings.

Do we need some sort of two-step approach here? Flushing after each
page would be a major overhead for dynamic agp stuff in the new memory
manager.
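
A rough userspace model of such a two-step approach, as a sketch only: every name here (fake_page, fake_flush_mappings, destroy_pages_two_pass, the DESTROY_* flags) is made up for illustration and none of it is the agp code. The point is simply that all pages are unmapped first, the mappings are flushed once, and only then are pages handed back.

```c
#include <stddef.h>

#define DESTROY_UNMAP 1	/* pass 1: drop the uncached mapping */
#define DESTROY_FREE  2	/* pass 2: hand the page back */

/* Stand-in for a page; tracks only what the sketch needs. */
struct fake_page {
	int mapped;	/* still has an uncached AGP mapping */
	int freed;	/* returned to the allocator */
};

static unsigned int flush_count;	/* number of global flushes paid for */

static void fake_flush_mappings(void)
{
	flush_count++;
}

static void fake_destroy_page(struct fake_page *p, int flags)
{
	if (flags & DESTROY_UNMAP)
		p->mapped = 0;
	if (flags & DESTROY_FREE)
		p->freed = 1;
}

/* Unmap everything, flush once, and only then free. */
static void destroy_pages_two_pass(struct fake_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		fake_destroy_page(&pages[i], DESTROY_UNMAP);
	fake_flush_mappings();
	for (i = 0; i < n; i++)
		fake_destroy_page(&pages[i], DESTROY_FREE);
}
```

In this model no page is freed while a stale mapping could still exist, and the flush cost is paid once per batch instead of once per page.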

Dave.

>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> Index: linux/drivers/char/agp/generic.c
> ===
> --- linux.orig/drivers/char/agp/generic.c
> +++ linux/drivers/char/agp/generic.c
> @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
>
> page = virt_to_page(addr);
> unmap_page_from_agp(page);
> +   flush_agp_mappings();
> put_page(page);
> free_page((unsigned long)addr);
> atomic_dec(&agp_bridge->current_memory_agp);


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andrew Morton
On Wed, 19 Sep 2007 22:01:59 +0200
Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 09/19/2007 09:57 PM, Jiri Slaby wrote:
> > On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >>> Yeah. (But X doesn't run -- this is maybe the known issue in this 
> >>> release).
> >> What do you mean with not run? 
> > 
> > (II) intel(0): Initializing HW Cursor
> > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> > at offset 0x5ff000 failed (Invalid argument)
> > 
> > Fatal server error:
> > Couldn't bind memory for front buffer
> 
> Further info:
> 4690  write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
> 4690  write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
> 4690  ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
> 4690  write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115
> 4690  write(2, "\nFatal server error:\n", 21) = 21
> 

This might be a Dave thing and not an Andi thing.

In my usual -mm-testing I only test X on the one machine (the Vaio,
natch).  Check that it runs glxgears, check suspend/resume to mem and disk.
It has intel graphics and I'm not seeing any such problems.

Have you time to bisect it? 
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
describes how.

As a quick test, perhaps build a tree with just
2.6.23-rc6+origin.patch+git-drm.patch?  Fortunately git-agpgart.patch is
presently empty.



Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote:
> On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said:
>> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>>> What do you mean with not run? 
>> (II) intel(0): Initializing HW Cursor
>> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
>> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>> at offset 0x5ff000 failed (Invalid argument)
>>
>> Fatal server error:
>> Couldn't bind memory for front buffer
>>
>> I thought I'd seen a thread about this issue, but I can't find it now. Is it
>> known or am I seeing ghosts yet, Andrew?
> 
> That would probably have been me, saying that x86_64-mm-cpa-clflush.patch 
> broke
> the NVidia graphics driver in 23-rc3-mm1.  Is it breaking *other* X drivers as
> well?

Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected
in my case):
http://lkml.org/lkml/2007/9/9/51

But now I'm talking about another issue -- a regression since rc4-mm1, where the X
server is unable to bind agp memory (those X logs above). Andi recently solved
the clflush issue in
http://lkml.org/lkml/2007/9/19/334

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Valdis . Kletnieks
On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> > 
> > What do you mean with not run? 
> 
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> at offset 0x5ff000 failed (Invalid argument)
> 
> Fatal server error:
> Couldn't bind memory for front buffer
> 
> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?

That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke
the NVidia graphics driver in 23-rc3-mm1.  Is it breaking *other* X drivers as
well?





Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 09:57 PM, Jiri Slaby wrote:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>> What do you mean with not run? 
> 
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> at offset 0x5ff000 failed (Invalid argument)
> 
> Fatal server error:
> Couldn't bind memory for front buffer

Further info:
4690  write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
4690  write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
4690  ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
4690  write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115
4690  write(2, "\nFatal server error:\n", 21) = 21

> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?

-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 09:54 PM, Andi Kleen wrote:
>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> 
> What do you mean with not run? 

(II) intel(0): Initializing HW Cursor
(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
(WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
at offset 0x5ff000 failed (Invalid argument)

Fatal server error:
Couldn't bind memory for front buffer

I thought I'd seen a thread about this issue, but I can't find it now. Is it
known or am I seeing ghosts yet, Andrew?

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andi Kleen
> Yeah. (But X doesn't run -- this is maybe the known issue in this release).

What do you mean with not run? 

-Andi



Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 09:24 PM, Andi Kleen wrote:
> On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
>> On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
>>
>>> -8<-8<-8<-8<-8<-8<
>>> That means
>>> void agp_generic_destroy_page(void *addr)
>>> {
>>>         struct page *page;
>>>
>>>         if (addr == NULL)
>>>                 return;
>>>
>>>         page = virt_to_page(addr);
>>> (1)     unmap_page_from_agp(page);
>>>         put_page(page);
>>> (2)     free_page((unsigned long)addr);
>>>         atomic_dec(&agp_bridge->current_memory_agp);
>>> }
>>>
>>> (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
>>> __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
>>> (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
>>> free_hot_cold_page -> list_add(&page->lru, &pcp->list);
>> that'll hurt.
>>
>>> any ideas how to fix this?
>> We should hold a single reference on the page for its membership in
>> deferred_pages.
> 
> The code is broken anyways. If you free pages without flushing
> them first some other innocent user allocating them will end up
> with possible uncached pages for some time.
> 
> Does this simple patch help? 

Yeah. (But X doesn't run -- this is maybe the known issue in this release).

> Flush uncached AGP pages before freeing
> 
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Tested-by: Jiri Slaby <[EMAIL PROTECTED]>

> 
> Index: linux/drivers/char/agp/generic.c
> ===
> --- linux.orig/drivers/char/agp/generic.c
> +++ linux/drivers/char/agp/generic.c
> @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
>  
>   page = virt_to_page(addr);
>   unmap_page_from_agp(page);
> + flush_agp_mappings();
>   put_page(page);
>   free_page((unsigned long)addr);
>   atomic_dec(&agp_bridge->current_memory_agp);
> 

thanks,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andi Kleen
On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
> On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> 
> > -8<-8<-8<-8<-8<-8<
> > That means
> > void agp_generic_destroy_page(void *addr)
> > {
> >         struct page *page;
> >
> >         if (addr == NULL)
> >                 return;
> >
> >         page = virt_to_page(addr);
> > (1)     unmap_page_from_agp(page);
> >         put_page(page);
> > (2)     free_page((unsigned long)addr);
> >         atomic_dec(&agp_bridge->current_memory_agp);
> > }
> >
> > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> > __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> > (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> > free_hot_cold_page -> list_add(&page->lru, &pcp->list);
> 
> that'll hurt.
> 
> > any ideas how to fix this?
> 
> We should hold a single reference on the page for its membership in
> deferred_pages.

The code is broken anyways. If you free pages without flushing
them first some other innocent user allocating them will end up
with possible uncached pages for some time.

Does this simple patch help? 

-Andi


Flush uncached AGP pages before freeing

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/drivers/char/agp/generic.c
===
--- linux.orig/drivers/char/agp/generic.c
+++ linux/drivers/char/agp/generic.c
@@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
 
        page = virt_to_page(addr);
        unmap_page_from_agp(page);
+       flush_agp_mappings();
        put_page(page);
        free_page((unsigned long)addr);
        atomic_dec(&agp_bridge->current_memory_agp);


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andrew Morton
On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:

> -8<-8<-8<-8<-8<-8<
> That means
> void agp_generic_destroy_page(void *addr)
> {
>         struct page *page;
>
>         if (addr == NULL)
>                 return;
>
>         page = virt_to_page(addr);
> (1)     unmap_page_from_agp(page);
>         put_page(page);
> (2)     free_page((unsigned long)addr);
>         atomic_dec(&agp_bridge->current_memory_agp);
> }
>
> (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> free_hot_cold_page -> list_add(&page->lru, &pcp->list);

that'll hurt.

> any ideas how to fix this?

We should hold a single reference on the page for its membership in
deferred_pages.
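
[Editor's sketch] The two call paths quoted above both link the same page->lru node into a list without unlinking it in between. The following is a minimal userspace illustration of why that hurts, not kernel code: struct node, node_init, node_add and list_is_consistent are hypothetical names that only mimic struct list_head, list_add() and the next->prev check in lib/list_debug.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Make an empty circular list: the head points at itself. */
static void node_init(struct node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head, as list_add() does, with no sanity checks. */
static void node_add(struct node *entry, struct node *head)
{
    struct node *next = head->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
}

/*
 * Walk forward from head and check that every link is mirrored, i.e. that
 * n->next->prev == n. The walk is bounded so that a broken list cannot
 * make it loop forever.
 */
static bool list_is_consistent(const struct node *head)
{
    const struct node *n = head;
    int steps = 0;

    do {
        if (n->next->prev != n)
            return false;
        n = n->next;
    } while (n != head && ++steps < 1000);

    return n == head;
}
```

If one node is added to a first list and then, without being deleted, to a second list, the second list stays consistent while the first is broken: its head still points at the node, but the node's prev now points into the second list. That is the kind of mismatch the "next->prev should be prev" message reports on a later list_add.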


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 01:53 PM, Jiri Slaby wrote:
> On 09/19/2007 01:43 PM, Jiri Slaby wrote:
>> On 09/18/2007 10:18 AM, Andrew Morton wrote:
>>> - The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
>>>   I didn't drop that patch because the iommu patch series depends on it.
>> No matter whether slub/slab is selected someone gets a page and moves/adds its
> 
> Oh, only adds, if it moves, it won't break the list. Going to check for
> POISON1/2 in __list_add, will get results (if any) in few moments...

Huh, it took longer than I expected :). Changed this:
diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 836c218..cd8499c 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -112,7 +112,14 @@ static inline void save_page(struct page *fpage, int data)
return;
SetPageFlush(fpage);
}
+   printk("ADD (%s): E=%p, H=%p, H->N=%p, N->P=%p, N->N=%p; PREV0=%p, "
+   "NEXT0=%p, ",
+   current->comm, &fpage->lru,
+   &deferred_pages, deferred_pages.next,
+   deferred_pages.next->prev, deferred_pages.next->next,
+   fpage->lru.prev, fpage->lru.next);
list_add(&fpage->lru, &deferred_pages);
+   printk("PREV1=%p, NEXT1=%p\n", fpage->lru.prev, fpage->lru.next);
 }

 /*
@@ -274,6 +281,7 @@ void global_flush_tlb(void)
down_read(&init_mm.mmap_sem);
arg.full_flush = full_flush;
full_flush = 0;
+   printk("FLUSH\n");
list_replace_init(&deferred_pages, &arg.l);
up_read(&init_mm.mmap_sem);

diff --git a/include/linux/list.h b/include/linux/list.h
index f29fc9c..1add963 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -265,6 +265,8 @@ static inline void list_del_init(struct list_head *entry)
 static inline void list_move(struct list_head *list, struct list_head *head)
 {
__list_del(list->prev, list->next);
+   list->next = LIST_POISON1;
+   list->prev = LIST_POISON2;
list_add(list, head);
 }

@@ -277,6 +279,8 @@ static inline void list_move_tail(struct list_head *list,
  struct list_head *head)
 {
__list_del(list->prev, list->next);
+   list->next = LIST_POISON1;
+   list->prev = LIST_POISON2;
list_add_tail(list, head);
 }

diff --git a/lib/list_debug.c b/lib/list_debug.c
index 4350ba9..57573d5 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -15,15 +15,34 @@
  * This is only for internal list manipulation where we know
  * the prev/next entries already!
  */
-
+#include <linux/sched.h>
 void __list_add(struct list_head *new,
  struct list_head *prev,
  struct list_head *next)
 {
+   static unsigned int a, b;
+   unsigned long off;
+
+   if (unlikely(!a && current && current->comm[0] == 'X'))
+   a++;
+
+   if (unlikely(a && !b && (void *)new >= (void *)mem_map &&
+   (void *)new < (void *)(mem_map + 1048576) &&
+   (new->prev != LIST_POISON2 || new->next != LIST_POISON1) &&
+   (new->prev != NULL || new->next != NULL) &&
+   (new->prev != new || new->next != new))) {
+   off = ((void *)new - (void *)mem_map) % sizeof(struct page);
+   if (off == offsetof(struct page, lru)) {
+   printk(KERN_DEBUG "POISONS (%p): "
+   "%p, %p\n", new, new->prev, new->next);
+   dump_stack();
+   }
+   }
if (unlikely(next->prev != prev)) {
printk(KERN_ERR "list_add corruption. next->prev should be "
"prev (%p), but was %p. (next=%p).\n",
prev, next->prev, next);
+   b++;
BUG();
}
if (unlikely(prev->next != next)) {

-8<-8<-8<-8<-8<-8<
and got this:
ADD (X): E=81000115b9e0, H=80673aa0, H->N=8100011573e0,
N->P=80673aa0, N->N=81000115ba18; PREV0=00200200,
NEXT0=00100100, PREV1=80673aa0, NEXT1=8100011573e0
/--\
this () was output from unmap path, see (1) below
and here () follows output from free_page path, see (2)
\--/
POISONS (81000115b9e0): 80673aa0, 8100011573e0

Call Trace:
 [] __list_add+0xf6/0x190
 [] list_add+0xc/0x10
 [] free_hot_cold_page+0xfd/0x170
 [] free_hot_page+0xb/0x10
 [] __free_pages+0x25/0x30
 [] free_pages+0x2b/0x40
 [] agp_generic_destroy_page+0x56/0x70
 [] agp_free_memory+0x65/0xd0
 [] agp_free_memory_wrap+0x39/0x60
 [] agp_release+0xdb/0x1c0
 [] __fput+0xc0/0x1b0
 [] fput+0x16/0x20
 [] filp_close+0x56/0x90
 [] put_files_struct+0xc2/0xd0
 [] do_exit+0x1e5/0x990
 [] do_group_exit+0x37/0x90
 [] sys_exit_group+0x12/0x20
 [] system_call+0x7e/0x83
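
[Editor's sketch] The debugging patch above flags any node handed to __list_add whose pointers are neither poisoned, nor both NULL, nor self-pointing, on the theory that such a node is probably still linked into another list. A minimal userspace version of that heuristic might look like the following. All names (struct lnode, lnode_looks_linked and so on) are hypothetical; the poison constants are chosen to match the PREV0/NEXT0 values printed in the log, and are assumed here rather than taken from a kernel header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct lnode {
    struct lnode *next;
    struct lnode *prev;
};

/* Same numeric values as NEXT0/PREV0 in the log; never dereferenced. */
#define LNODE_POISON_NEXT ((struct lnode *)0x00100100)
#define LNODE_POISON_PREV ((struct lnode *)0x00200200)

/* Make an empty circular list: the head points at itself. */
static void lnode_init(struct lnode *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
static void lnode_add(struct lnode *entry, struct lnode *head)
{
    struct lnode *next = head->next;

    next->prev = entry;
    entry->next = next;
    entry->prev = head;
    head->next = entry;
}

/* Unlink entry and poison its pointers so that later misuse is visible. */
static void lnode_del(struct lnode *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = LNODE_POISON_NEXT;
    entry->prev = LNODE_POISON_PREV;
}

/*
 * Heuristic from the debugging patch: a node about to be added is suspicious
 * if it is not poisoned (deleted), not zeroed (fresh) and not self-pointing
 * (initialised but empty).
 */
static bool lnode_looks_linked(const struct lnode *entry)
{
    bool poisoned = entry->prev == LNODE_POISON_PREV &&
                    entry->next == LNODE_POISON_NEXT;
    bool zeroed = entry->prev == NULL && entry->next == NULL;
    bool self = entry->prev == entry && entry->next == entry;

    return !poisoned && !zeroed && !self;
}
```

In the log above, the node reaches the free_page path with prev and next still pointing into deferred_pages, which is the "looks linked" case.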


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Jiri Slaby
On 09/19/2007 01:43 PM, Jiri Slaby wrote:
> On 09/18/2007 10:18 AM, Andrew Morton wrote:
>> - The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
>>   I didn't drop that patch because the iommu patch series depends on it.
> 
> No matter whether slub/slab is selected someone gets a page and moves/adds its

Oh, only adds, if it moves, it won't break the list. Going to check for
POISON1/2 in __list_add, will get results (if any) in few moments...

regards,
-- 
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andrew Morton
On Wed, 19 Sep 2007 22:01:59 +0200
Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 09/19/2007 09:57 PM, Jiri Slaby wrote:
> > On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >> What do you mean with not run?
> >
> > (II) intel(0): Initializing HW Cursor
> > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> > at offset 0x5ff000 failed (Invalid argument)
> >
> > Fatal server error:
> > Couldn't bind memory for front buffer
>
> Further info:
> 4690  write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
> 4690  write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
> 4690  ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
> 4690  write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115
> 4690  write(2, "\nFatal server error:\n", 21) = 21
 

This might be a Dave thing and not an Andi thing.

In my usual -mm-testing I only test X on the one machine (the Vaio,
natch).  Check that it runs glxgears, check suspend/resume to mem and disk.
It has intel graphics and I'm not seeing any such problems.

Have you time to bisect it? 
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
describes how.

As a quick test, perhaps build a tree with just
2.6.23-rc6+origin.patch+git-drm.patch?  Fortunately git-agpgart.patch is
presently empty.



Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
On 9/20/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
> > On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> >
> > > -8<-8<-8<-8<-8<-8<
> > > That means
> > > void agp_generic_destroy_page(void *addr)
> > > {
> > >         struct page *page;
> > >
> > >         if (addr == NULL)
> > >                 return;
> > >
> > >         page = virt_to_page(addr);
> > > (1)     unmap_page_from_agp(page);
> > >         put_page(page);
> > > (2)     free_page((unsigned long)addr);
> > >         atomic_dec(&agp_bridge->current_memory_agp);
> > > }
> > >
> > > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> > > __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> > > (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> > > free_hot_cold_page -> list_add(&page->lru, &pcp->list);
> >
> > that'll hurt.
> >
> > > any ideas how to fix this?
> >
> > We should hold a single reference on the page for its membership in
> > deferred_pages.
>
> The code is broken anyways. If you free pages without flushing
> them first some other innocent user allocating them will end up
> with possible uncached pages for some time.
>
> Does this simple patch help?
>
> -Andi
>
>
> Flush uncached AGP pages before freeing

In theory this should be handled by the caller, so as to avoid the
overhead of continuous flushing. However, I can see a potential race
condition here if the pages are put back into the kernel before the
caller flushes the mappings.

Do we need some sort of two-step approach here? Flushing after each
page would be a major overhead for dynamic AGP stuff in the new memory
manager.

Dave.
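
[Editor's sketch] The trade-off described above can be modelled in userspace. This is a toy simulation under stated assumptions, not the AGP code: struct sim_page, struct sim_stats and the three destroy_* functions are hypothetical, "unmap" only marks a page as needing a flush, and "flush" clears that mark for every page. It compares freeing before the caller's flush, flushing after every page, and a two-step unmap-all, flush-once, free-all order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page; not the kernel's struct page. */
struct sim_page {
    bool flush_pending; /* attribute changed, flush not yet done */
    bool freed;         /* handed back to the allocator */
};

struct sim_stats {
    int flushes;         /* global flushes issued */
    int freed_unflushed; /* pages freed while a flush was still pending */
};

static void sim_unmap(struct sim_page *p)
{
    p->flush_pending = true;
}

static void sim_flush(struct sim_page *pages, size_t n, struct sim_stats *st)
{
    for (size_t i = 0; i < n; i++)
        pages[i].flush_pending = false;
    st->flushes++;
}

static void sim_free(struct sim_page *p, struct sim_stats *st)
{
    if (p->flush_pending)
        st->freed_unflushed++;
    p->freed = true;
}

/* Unmap and free each page; the caller's single flush comes too late. */
static void destroy_flush_late(struct sim_page *pages, size_t n,
                               struct sim_stats *st)
{
    for (size_t i = 0; i < n; i++) {
        sim_unmap(&pages[i]);
        sim_free(&pages[i], st);
    }
    sim_flush(pages, n, st);
}

/* Flush after every unmap: nothing is freed unflushed, one flush per page. */
static void destroy_flush_each(struct sim_page *pages, size_t n,
                               struct sim_stats *st)
{
    for (size_t i = 0; i < n; i++) {
        sim_unmap(&pages[i]);
        sim_flush(pages, n, st);
        sim_free(&pages[i], st);
    }
}

/* Two steps: unmap everything, flush once, then free everything. */
static void destroy_two_pass(struct sim_page *pages, size_t n,
                             struct sim_stats *st)
{
    for (size_t i = 0; i < n; i++)
        sim_unmap(&pages[i]);
    sim_flush(pages, n, st);
    for (size_t i = 0; i < n; i++)
        sim_free(&pages[i], st);
}
```

In this model, with four pages, the late-flush order frees all four pages while a flush is still pending, the per-page order needs four flushes, and the two-pass order needs one flush and frees nothing unflushed.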


> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> Index: linux/drivers/char/agp/generic.c
> ===
> --- linux.orig/drivers/char/agp/generic.c
> +++ linux/drivers/char/agp/generic.c
> @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
>
>         page = virt_to_page(addr);
>         unmap_page_from_agp(page);
> +       flush_agp_mappings();
>         put_page(page);
>         free_page((unsigned long)addr);
>         atomic_dec(&agp_bridge->current_memory_agp);

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
> The code is broken anyways. If you free pages without flushing
> them first some other innocent user allocating them will end up
> with possible uncached pages for some time.
>
> Does this simple patch help?


I've attached a more complicated patch that unmaps and frees the pages
in two stages. My kernel no longer hangs with this patch.

Jiri, can you confirm?

I'll look at the other issue separately.

Dave.
From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
From: Dave Airlie [EMAIL PROTECTED]
Date: Thu, 20 Sep 2007 11:30:41 +1000
Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

With Andi's clflush fixup, we were getting hangs on server exit; flushing
the mappings after freeing each page helped.

This showed up a race condition where the pages, after being freed, could be
reused before the agp mappings had been flushed. Flushing after each single
page is a bad thing for future drm work, so make the page destroy a two-pass
operation: unmap all the pages, flush the mappings, and then destroy the pages.

Signed-off-by: Dave Airlie [EMAIL PROTECTED]
---
 drivers/char/agp/agp.h   |7 +--
 drivers/char/agp/ali-agp.c   |   29 +
 drivers/char/agp/backend.c   |   12 
 drivers/char/agp/generic.c   |   20 ++--
 drivers/char/agp/i460-agp.c  |4 ++--
 drivers/char/agp/intel-agp.c |6 --
 6 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 8955e7f..b83824c 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -58,6 +58,9 @@ struct gatt_mask {
 	 * devices this will probably be ignored */
 };
 
+#define AGP_PAGE_DESTROY_UNMAP 1
+#define AGP_PAGE_DESTROY_FREE 2
+
 struct aper_size_info_8 {
 	int size;
 	int num_entries;
@@ -113,7 +116,7 @@ struct agp_bridge_driver {
 	struct agp_memory *(*alloc_by_type) (size_t, int);
 	void (*free_by_type)(struct agp_memory *);
 	void *(*agp_alloc_page)(struct agp_bridge_data *);
-	void (*agp_destroy_page)(void *);
+	void (*agp_destroy_page)(void *, int flags);
 	int (*agp_type_to_mask_type) (struct agp_bridge_data *, int);
 };
 
@@ -267,7 +270,7 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type);
 struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type);
 void agp_generic_free_by_type(struct agp_memory *curr);
 void *agp_generic_alloc_page(struct agp_bridge_data *bridge);
-void agp_generic_destroy_page(void *addr);
+void agp_generic_destroy_page(void *addr, int flags);
 void agp_free_key(int key);
 int agp_num_entries(void);
 u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command);
diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c
index 4941ddb..2b65155 100644
--- a/drivers/char/agp/ali-agp.c
+++ b/drivers/char/agp/ali-agp.c
@@ -156,29 +156,34 @@ static void *m1541_alloc_page(struct agp_bridge_data *bridge)
 	return addr;
 }
 
-static void ali_destroy_page(void * addr)
+static void ali_destroy_page(void * addr, int flags)
 {
 	if (addr) {
-		global_cache_flush();	/* is this really needed?  --hch */
-		agp_generic_destroy_page(addr);
-		global_flush_tlb();
+		if (flags & AGP_PAGE_DESTROY_UNMAP) {
+			global_cache_flush();	/* is this really needed?  --hch */
+			agp_generic_destroy_page(addr, flags);
+			global_flush_tlb();
+		} else
+			agp_generic_destroy_page(addr, flags);
 	}
 }
 
-static void m1541_destroy_page(void * addr)
+static void m1541_destroy_page(void * addr, int flags)
 {
 	u32 temp;
 
 	if (addr == NULL)
 		return;
 
-	global_cache_flush();
-
-	pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
-	pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
-			(((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
-			  virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
-	agp_generic_destroy_page(addr);
+	if (flags & AGP_PAGE_DESTROY_UNMAP) {
+		global_cache_flush();
+	  
+		pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
+		pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
+				       (((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
+					 virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
+	}
+	agp_generic_destroy_page(addr, flags);
 }
 
 
diff --git a/drivers/char/agp/backend.c b/drivers/char/agp/backend.c
index 1b47c89..832ded2 100644
--- a/drivers/char/agp/backend.c
+++ b/drivers/char/agp/backend.c
@@ -189,9 +189,11 @@ static int agp_backend_initialize(struct agp_bridge_data *bridge)
 
 err_out:
 	if (bridge->driver->needs_scratch_page) {
-		bridge->driver->agp_destroy_page(
-				gart_to_virt(bridge->scratch_page_real));
+		bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+						 AGP_PAGE_DESTROY_UNMAP);
 		flush_agp_mappings();
+		bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+						 AGP_PAGE_DESTROY_FREE);
 	}
 	if (got_gatt)
 		bridge->driver->free_gatt_table(bridge);
@@ -215,9 +217,11 @@ static void 

Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Dave Airlie
On 9/20/07, Jiri Slaby [EMAIL PROTECTED] wrote:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
> > Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >
> > What do you mean with not run?
>
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> at offset 0x5ff000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for front buffer
>
> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?


Can you send me a complete Xorg log file, and the output of lspci -vv?

My 945 works fine with my drm tree on top of Linus's tree, with clflush
plus the agp fix I just sent out.

Dave.


Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]

2007-09-19 Thread Andrew Morton
On Thu, 20 Sep 2007 11:42:29 +1000 Dave Airlie [EMAIL PROTECTED] wrote:

> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
> From: Dave Airlie [EMAIL PROTECTED]
> Date: Thu, 20 Sep 2007 11:30:41 +1000
> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

This fixes the hang-when-quitting-X on the Vaio.