Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/24/2007 09:37 AM, [EMAIL PROTECTED] wrote:
> (Interestingly, I can't find any of the 3 addresses listed in the 'list_add
> corruption' message anywhere *else* in the netconsole output, and the last
> thing we hear from before the kersplat is apparently an RCU callback in a
> softirq?)

Hmm, there must be somebody else who changes it under hands without list_add.
Any lru skin game in the mid-layer nvidia sources? Do slab and slub behave the
same?

[...]
> [ 48.222000]
> [ 48.297000] POISONS (810001179148): 810006bbc000, 81000341bec0
> [ 48.297000]
> [ 48.297000] Call Trace:
> [ 48.297000] <IRQ> [803580aa] __list_add+0xd7/0x138
> [ 48.297000] [80358117] list_add+0xc/0x11
> [ 48.297000] [80270865] free_hot_cold_page+0xe8/0x16d
> [ 48.297000] [8027093e] free_hot_page+0xb/0xd
> [ 48.297000] [80270958] __free_pages+0x18/0x21
> [ 48.297000] [80270990] free_pages+0x2f/0x34
> [ 48.297000] [8028922d] kmem_freepages+0xc5/0xce
> [ 48.297000] [8028957f] slab_destroy+0x3c/0x53
> [ 48.297000] [80289663] free_block+0xcd/0x110
> [ 48.297000] [80289345] cache_flusharray+0x71/0xa7
> [ 48.297000] [802894c2] kmem_cache_free+0x99/0xb2
> [ 48.297000] [8029d8b2] __d_free+0x30/0x34
> [ 48.297000] [8029dcb6] d_callback+0xd/0xf
> [ 48.297000] [802458e0] __rcu_process_callbacks+0x143/0x1da
> [ 48.297000] [8024599a] rcu_process_callbacks+0x23/0x44
> [ 48.297000] [802396d2] tasklet_action+0x54/0x9e
> [ 48.297000] [802395ad] __do_softirq+0x57/0xc7
> [ 48.297000] [802398e3] ksoftirqd+0x0/0x148
> [ 48.297000] [8020d32c] call_softirq+0x1c/0x28
> [ 48.297000] <EOI> [8020e916] do_softirq+0x34/0x87
> [ 48.297000] [80239956] ksoftirqd+0x73/0x148
> [ 48.297000] [80247ddd] kthread+0x49/0x78
> [ 48.297000] [8020cfc8] child_rip+0xa/0x12
> [ 48.297000] [80247d94] kthread+0x0/0x78
> [ 48.297000] [8020cfbe] child_rip+0x0/0x12
> [ 48.297000]
> [ 48.297000] list_add corruption. next->prev should be prev
> (8067e050), but was 8100066d59c0. (next=81000119e560).
> [ 48.297000] [ cut here ]
> [ 48.297000] kernel BUG at lib/list_debug.c:46!
> [ 48.297000] invalid opcode: [1] PREEMPT SMP
> [ 48.297000] last sysfs file:
> /devices/pci:00/:00:1e.0/:03:01.4/resource
> [ 48.297000] CPU 1
> [ 48.297000] Modules linked in: irnet(U) ppp_generic(U) slhc(U)
> irtty_sir(U) sir_dev(U) ircomm_tty(U) ircomm(U) irda(U) crc_ccitt(U)
> coretemp(U) nf_conntrack_ftp(U) xt_pkttype(U) ipt_REJECT(U) ipt_osf(U)
> nf_conntrack_ipv4(U) xt_ipisforif(U) ipt_recent(U) ipt_LOG(U) xt_u32(U)
> iptable_filter(U) ip_tables(U) xt_tcpudp(U) nf_conntrack_ipv6(U) xt_state(U)
> nf_conntrack(U) nfnetlink(U) ip6t_LOG(U) xt_limit(U) ip6table_filter(U)
> ip6_tables(U) x_tables(U) sha256(U) aes(U) fan(U) container(U) bay(U)
> acpi_cpufreq(U) nvram(U) pcmcia(U) firmware_class(U) yenta_socket(U)
> ohci1394(U) rsrc_nonstatic(U) iTCO_wdt(U) iTCO_vendor_support(U)
> watchdog_core(U) nvidia(P)(U) thermal(U) ieee1394(U) pcmcia_core(U)
> watchdog_dev(U) processor(U) snd_hda_intel(U) intel_agp(U) ac(U) button(U)
> video(U) battery(U) power_supply(U) output(U) rtc(U)
> [ 48.297000] Pid: 7, comm: ksoftirqd/1 Tainted: P 2.6.23-rc6-mm1 #8
> [ 48.297000] RIP: 0010:[803580ce] [803580ce] __list_add+0xfb/0x138
> [ 48.297000] RSP: :81000349fd38 EFLAGS: 00010002
> [ 48.297000] RAX: 0088 RBX: 810001179148 RCX: 8061dbbb
> [ 48.297000] RDX: 0001 RSI: 0006 RDI: 80672620
> [ 48.297000] RBP: 81000349fd58 R08: 80672628 R09:
> [ 48.297000] R10: e731ffa2 R11: 81000349fa98 R12: 81000119e560
> [ 48.297000] R13: 8067e050 R14: 8067df80 R15: 81000346d128
> [ 48.297000] FS: () GS:8100034689c0() knlGS:
> [ 48.297000] CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> [ 48.297000] CR2: 0049b9b0 CR3: 04168000 CR4: 06e0
> [ 48.297000] DR0: DR1: DR2:
> [ 48.297000] DR3: DR6: 0ff0 DR7: 0400
> [ 48.297000] Process ksoftirqd/1 (pid: 7, threadinfo 810003494000, task
> 810003463810)
> [ 48.297000] last branch before last exception/interrupt
> [ 48.297000] from [80234989] printk+0xa3/0xa4
> [ 48.297000] to [803580c8] __list_add+0xf5/0x138
> [ 48.297000] Stack: 81000349fe60 810001179120 8067e040 0002
> [ 48.297000] 81000349fd68 80358117 81000349fd98 80270865
> [ 48.297000] 81000100 81000341bec0 810006bbc000
> [ 48.297000] Call Trace:
> [ 48.297000] [80358117] list_add+0xc/0x11
> [ 48.297000] [80270865] free_hot_cold_page+0xe8/0x16d
> [ 48.297000] [8027093e] free_hot_page+0xb/0xd
> [ 48.297000] [80270958] __free_pages+0x18/0x21
> [ 48.297000] [80270990] free_pages+0x2f/0x34
> [ 48.297000] [8028922d] kmem_freepages+0xc5/0xce
> [ 48.297000] [8028957f] slab_destroy+0x3c/0x53
> [ 48.297000] [80289663] free_block+0xcd/0x110
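For readers following the trace: a "list_add corruption. next->prev should be prev" report is the kind of message a checked list insertion prints when the two neighbours it is asked to link between no longer point at each other. The sketch below is a minimal userspace illustration of that check, not the kernel's lib/list_debug.c; `checked_list_add` and `demo_stray_write` are hypothetical names, and the stray pointer write is simulated by hand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Hypothetical checked insert: link new_node between prev and next only if
 * the two neighbours still reference each other. Returns false on mismatch.
 */
static bool checked_list_add(struct list_head *new_node,
                             struct list_head *prev,
                             struct list_head *next)
{
    assert(new_node && prev && next);

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        return false;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        return false;
    }

    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;
    return true;
}

/* Returns 1 if the simulated stray write is caught, 0 otherwise. */
int demo_stray_write(void)
{
    struct list_head head, first, second, elsewhere;

    list_init(&head);
    list_init(&elsewhere);
    if (!checked_list_add(&first, &head, head.next))
        return 0;

    /* Someone else rewrites first.prev without going through the list API. */
    first.prev = &elsewhere;

    /* Adding at the head now sees next->prev != prev and refuses. */
    return checked_list_add(&second, &head, head.next) ? 0 : 1;
}
```

Calling `demo_stray_write()` from a small `main` prints one corruption line to stderr and returns 1. The check only notices the damage at the next insertion next to the damaged node, which would fit with the reported addresses not showing up anywhere earlier in the output.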
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Mon, 24 Sep 2007 08:06:45 +0200, Jiri Slaby said:
> Heh :). The few last before the list corruption BUG (you have to have
> LIST_DEBUG enabled) -- but it seems you never reached that phase?

Seems to be somewhat racy - had one attempt that obviously got into some grand
Mongolian flustercluck, because I had a 2M printk buffer defined, and more than
2M worth of apparently looping output saying that the netconsole/printk path
was poisoning.

I defined the printk buffer to 4M, added a initcall_debug, and then it managed
to die in a reasonable amount of time. Here's the whole thing from when it
starts blurbing out the POISONS messages until it rolls over and dies (about
736 lines).

(Interestingly, I can't find any of the 3 addresses listed in the 'list_add
corruption' message anywhere *else* in the netconsole output, and the last
thing we hear from before the kersplat is apparently an RCU callback in a
softirq?)

[ 47.997000] POISONS (810003fb6ca8): 810003fb6ca8, 8100051600d8
[ 47.997000]
[ 47.997000] Call Trace:
[ 47.998000] [803580aa] __list_add+0xd7/0x138
[ 47.998000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 47.998000] [80276649] vma_prio_tree_insert+0x34/0x39
[ 47.998000] [8027da6c] vma_adjust+0x310/0x452
[ 47.998000] [8027dc89] split_vma+0xdb/0xed
[ 47.998000] [8027f30b] mprotect_fixup+0x13b/0x481
[ 47.998000] [80323da3] file_map_prot_check+0x7d/0x86
[ 47.998000] [80326d93] selinux_file_mprotect+0xe0/0xe9
[ 47.998000] [8027f803] sys_mprotect+0x1b2/0x22b
[ 47.998000] [8020c2fc] tracesys+0xdc/0xe1
[ 47.998000]
[ 48.078000] POISONS (81000402d768): 810004727810, 810006221810
[ 48.078000]
[ 48.078000] Call Trace:
[ 48.078000] [803580aa] __list_add+0xd7/0x138
[ 48.078000] [80358117] list_add+0xc/0x11
[ 48.078000] [802765e2] vma_prio_tree_add+0xad/0xe0
[ 48.078000] [802324f0] copy_process+0xc63/0x1515
[ 48.078000] [80232eff] do_fork+0x75/0x20b
[ 48.079000] [80353d54] __up_write+0xf0/0x100
[ 48.079000] [8020c17e] system_call+0x7e/0x83
[ 48.079000] [8020a64f] sys_clone+0x23/0x25
[ 48.079000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.079000]
[ 48.080000] POISONS (810004096618): 810005266c00, 81000576a960
[ 48.080000]
[ 48.080000] Call Trace:
[ 48.080000] [80517086] __down_write_nested+0x3d/0xa1
[ 48.080000] [803580aa] __list_add+0xd7/0x138
[ 48.080000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 48.080000] [802324f0] copy_process+0xc63/0x1515
[ 48.080000] [80232eff] do_fork+0x75/0x20b
[ 48.080000] [80353d54] __up_write+0xf0/0x100
[ 48.080000] [8020c17e] system_call+0x7e/0x83
[ 48.080000] [8020a64f] sys_clone+0x23/0x25
[ 48.080000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.080000]
[ 48.081000] POISONS (810004096768): 81000526e378, 81000526e378
[ 48.081000]
[ 48.081000] Call Trace:
[ 48.081000] [803580aa] __list_add+0xd7/0x138
[ 48.081000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 48.081000] [802324f0] copy_process+0xc63/0x1515
[ 48.081000] [80232eff] do_fork+0x75/0x20b
[ 48.081000] [80353d54] __up_write+0xf0/0x100
[ 48.081000] [8020c17e] system_call+0x7e/0x83
[ 48.081000] [8020a64f] sys_clone+0x23/0x25
[ 48.081000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.081000]
[ 48.081000] POISONS (8100040964c8): 81000576a960, 8100051c12d0
[ 48.082000]
[ 48.082000] Call Trace:
[ 48.087000] [803580aa] __list_add+0xd7/0x138
[ 48.087000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 48.087000] [802324f0] copy_process+0xc63/0x1515
[ 48.087000] [80232eff] do_fork+0x75/0x20b
[ 48.087000] [80353d54] __up_write+0xf0/0x100
[ 48.087000] [8020c17e] system_call+0x7e/0x83
[ 48.087000] [8020a64f] sys_clone+0x23/0x25
[ 48.087000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.087000]
[ 48.087000] POISONS (810004096d50): 81000412bab0, 810004536618
[ 48.087000]
[ 48.087000] Call Trace:
[ 48.087000] [803580aa] __list_add+0xd7/0x138
[ 48.088000] [80358117] list_add+0xc/0x11
[ 48.088000] [802765e2] vma_prio_tree_add+0xad/0xe0
[ 48.088000] [802324f0] copy_process+0xc63/0x1515
[ 48.088000] [80232eff] do_fork+0x75/0x20b
[ 48.088000] [80353d54] __up_write+0xf0/0x100
[ 48.088000] [8020c17e] system_call+0x7e/0x83
[ 48.088000] [8020a64f] sys_clone+0x23/0x25
[ 48.088000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.088000]
[ 48.088000] POISONS (810004096c00): 81000412b960, 8100043e7ca8
[ 48.088000]
[ 48.088000] Call Trace:
[ 48.088000] [803580aa] __list_add+0xd7/0x138
[ 48.088000] [80358117] list_add+0xc/0x11
[ 48.089000] [802765e2] vma_prio_tree_add+0xad/0xe0
[ 48.089000] [802324f0] copy_process+0xc63/0x1515
[ 48.089000] [80232eff] do_fork+0x75/0x20b
[ 48.089000] [80353d54] __up_write+0xf0/0x100
[ 48.089000] [8020c17e] system_call+0x7e/0x83
[ 48.089000] [8020a64f] sys_clone+0x23/0x25
[ 48.089000] [8020c497] ptregscall_common+0x67/0xb0
[ 48.089000]
[ 48.089000] POISONS (810004096ca8): 81000526e960, 810003f0fb58
[ 48.089000]
[ 48.089000] Call Trace:
[ 48.089000] [] __vm_enough_memory+0x1e/0x113
[ 48.089000] [803580aa] __list_add+0xd7/0x138
[ 48.089000] [80358117] list_add+0xc/0x11
[ 48.090000] [802765e2] vma_prio_tree_add+0xad/0xe0
[ 48.090000] [802324f0] copy_process+0xc63/0x1515
[ 48.090000] [80232eff] do_fork+0x75/0x20b
[ 48.090000] [80353d54] __up_write+0xf0/0x100
[ 48.090000] [8020c17e] system_call+0x7e/0x83
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/24/2007 05:25 AM, [EMAIL PROTECTED] wrote:
> On Fri, 21 Sep 2007 21:43:20 +0200, Jiri Slaby said:
>> On 09/21/2007 09:38 PM, Jiri Slaby wrote:
>>> It is rather the other user who adds the page to some other list while
>>> being at deferred_pages list. Could you try my debug patch
>>> (http://lkml.org/lkml/2007/9/19/141)?
>> or the whitespace non-damaged version:
>> http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug
>
> Gaak. Is that thing *supposed* to spew zillions of lines of output?

Oh, probably yes.

> Some of the hits we get (I'm wondering if anything after the first makes
> any sense, or if we're just slowly watching the corruption spread - the
> thing ended up near 23K lines long before I gave up and hit the poweroff
> button because there was no end in sight):

Yes. it's not perfect, most of them are false positives (It's OK).

> (If there's something specific you want me to find in the output,
> like "the first time we see XYZ", yell...)

Heh :). The few last before the list corruption BUG (you have to have
LIST_DEBUG enabled) -- but it seems you never reached that phase?

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, 21 Sep 2007 21:43:20 +0200, Jiri Slaby said:
> On 09/21/2007 09:38 PM, Jiri Slaby wrote:
> > It is rather the other user who adds the page to some other list while
> > being at deferred_pages list. Could you try my debug patch
> > (http://lkml.org/lkml/2007/9/19/141)?
>
> or the whitespace non-damaged version:
> http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug

Gaak. Is that thing *supposed* to spew zillions of lines of output?

Some of the hits we get (I'm wondering if anything after the first makes
any sense, or if we're just slowly watching the corruption spread - the
thing ended up near 23K lines long before I gave up and hit the poweroff
button because there was no end in sight):

(If there's something specific you want me to find in the output,
like "the first time we see XYZ", yell...)

[ 103.701000] POISONS (81000117dc88): 810006d14000, 8100034225c0
[ 103.701000]
[ 103.701000] Call Trace:
[ 103.701000] [803580aa] __list_add+0xd7/0x138
[ 103.701000] [80358117] list_add+0xc/0x11
[ 103.701000] [80270865] free_hot_cold_page+0xe8/0x16d
[ 103.701000] [8027093e] free_hot_page+0xb/0xd
[ 103.701000] [80270958] __free_pages+0x18/0x21
[ 103.701000] [80270990] free_pages+0x2f/0x34
[ 103.701000] [8028922d] kmem_freepages+0xc5/0xce
[ 103.701000] [8028957f] slab_destroy+0x3c/0x53
[ 103.701000] [80289663] free_block+0xcd/0x110
[ 103.701000] [8028973a] drain_array+0x94/0xc9
[ 103.701000] [8028a8c3] cache_reap+0x0/0x105
[ 103.701000] [8028a948] cache_reap+0x85/0x105
[ 103.701000] [80243d5d] run_workqueue+0x8e/0x125
[ 103.701000] [8024478d] worker_thread+0x0/0xe7
[ 103.701000] [80244869] worker_thread+0xdc/0xe7
[ 103.701000] [80247f13] autoremove_wake_function+0x0/0x38
[ 103.701000] [80247ddd] kthread+0x49/0x78
[ 103.701000] [8020cfc8] child_rip+0xa/0x12
[ 103.701000] [80247d94] kthread+0x0/0x78
[ 103.701000] [8020cfbe] child_rip+0x0/0x12
[ 103.701000]
[ 103.701000] POISONS (81000117eac0): 810006d55000, 8100034225c0
[ 103.701000]
[ 103.701000] Call Trace:
[ 103.701000] [803580aa] __list_add+0xd7/0x138
[ 103.701000] [80358117] list_add+0xc/0x11
[ 103.701000] [80270865] free_hot_cold_page+0xe8/0x16d
[ 103.701000] [8027093e] free_hot_page+0xb/0xd
[ 103.701000] [80270958] __free_pages+0x18/0x21
[ 103.701000] [80270990] free_pages+0x2f/0x34
[ 103.701000] [8028922d] kmem_freepages+0xc5/0xce
[ 103.701000] [8028957f] slab_destroy+0x3c/0x53
[ 103.701000] [80289663] free_block+0xcd/0x110
[ 103.701000] [8028973a] drain_array+0x94/0xc9
[ 103.701000] [8028a8c3] cache_reap+0x0/0x105
[ 103.701000] [8028a948] cache_reap+0x85/0x105
[ 103.701000] [80243d5d] run_workqueue+0x8e/0x125
[ 103.701000] [8024478d] worker_thread+0x0/0xe7
[ 103.701000] [80244869] worker_thread+0xdc/0xe7
[ 103.701000] [80247f13] autoremove_wake_function+0x0/0x38
[ 103.701000] [80247ddd] kthread+0x49/0x78
[ 103.701000] [8020cfc8] child_rip+0xa/0x12
[ 103.701000] [80247d94] kthread+0x0/0x78
[ 103.701000] [8020cfbe] child_rip+0x0/0x12
[ 103.701000]

(That trace repeats 16 times, then we see:)

[ 106.284000] POISONS (810004432810): 810005291378, 81000524e618
[ 106.284000]
[ 106.284000] Call Trace:
[ 106.284000] [80517086] __down_write_nested+0x3d/0xa1
[ 106.284000] [803580aa] __list_add+0xd7/0x138
[ 106.284000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 106.284000] [802324f0] copy_process+0xc63/0x1515
[ 106.284000] [80232eff] do_fork+0x75/0x20b
[ 106.284000] [80353d54] __up_write+0xf0/0x100
[ 106.284000] [8020c17e] system_call+0x7e/0x83
[ 106.284000] [8020a64f] sys_clone+0x23/0x25
[ 106.284000] [8020c497] ptregscall_common+0x67/0xb0
[ 106.284000]
..
[ 106.284000] POISONS (810004432768): 81000524e618, 81000524e618
[ 106.284000]
[ 106.284000] Call Trace:
[ 106.284000] [803580aa] __list_add+0xd7/0x138
[ 106.284000] [802765fe] vma_prio_tree_add+0xc9/0xe0
[ 106.284000] [802324f0] copy_process+0xc63/0x1515
[ 106.284000] [80232eff] do_fork+0x75/0x20b
[ 106.284000] [80353d54] __up_write+0xf0/0x100
[ 106.284000] [8020c17e] system_call+0x7e/0x83
[ 106.284000] [8020a64f] sys_clone+0x23/0x25
[ 106.284000] [8020c497] ptregscall_common+0x67/0xb0
[ 106.284000]
...
[ 106.285000] POISONS (810003637b30): 810003637c18, 0246
[ 106.285000]
[ 106.285000] Call Trace:
[ 106.285000] [803580aa] __list_add+0xd7/0x138
[ 106.285000] [80358117] list_add+0xc/0x11
[ 106.285000] [80248110] add_wait_queue+0x2c/0x40
[ 106.285000] [] __pollwait+0xd6/0xdf
[ 106.285000] [] inotify_poll+0x29/0x5c
[ 106.285000] [] do_select+0x2fa/0x50d
[ 106.285000] [] __pollwait+0x0/0xdf
[ 106.285000] [] default_wake_function+0x0/0xf
[ 106.285000] [] __down_trylock+0x4d/0x5a
[ 106.285000] [] __down_failed_trylock+0x35/0x3a
[ 106.285000] [] __update_rq_clock+0x1a/0xe5
[ 106.285000] [] __alloc_pages+0x5c/0x2b5
[ 106.285000] [] core_sys_select+0x1f3/0x2a2
[ 106.285000] [] alloc_pid+0x2f8/0x34f
[ 106.285000] [] __up_read+0x7a/0x83
[ 106.285000] [] up_read+0x9/0xb
[ 106.285000] [] do_page_fault+0x405/0x7ac
[ 106.285000] [] sys_select+0xbf/0x17b
[ 106.285000] [8020c17e] system_call+0x7e/0x83
[ 106.285000] POISONS (810003637ba0): 8060ff48, 8051471d
[ 106.285000]
[ 106.285000] Call Trace:
[ 106.285000] [803580aa] __list_add+0xd7/0x138
[ 106.285000] [80358117] list_add+0xc/0x11
[ 106.285000] [80248110] add_wait_queue+0x2c/0x40
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/21/2007 09:38 PM, Jiri Slaby wrote:
> It is rather the other user who adds the page to some other list while being
> at deferred_pages list. Could you try my debug patch
> (http://lkml.org/lkml/2007/9/19/141)?

or the whitespace non-damaged version:
http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/21/2007 09:33 PM, [EMAIL PROTECTED] wrote:
> On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
>> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
>
>>> Hmm.. maybe I'm chasing a different bug manifested by the same patch. For
>>> me, it's been a solid lockup at X startup since -rc3-mm1, and this patch
>>> doesn't change matters.
>> This patch probably changes behaviour how the pages are queued on the list
>> somehow. Maybe it's insane to suggest everybody with similar problem to try
>> LIST_DEBUG, but just give it a try after having one of the patches applied
>> ;). (Or have you tried yet?)
>
> OK, had a chance to test it, with Dave Airlie's AGP patch, and here's what it
> hit:
>
> [ 198.925000] list_del corruption. next->prev should be 81000118f178,
> but was 8067e050
> [ 198.925000] [ cut here ]
> [ 198.925000] kernel BUG at lib/list_debug.c:72!
> [ 198.925000] invalid opcode: [1] PREEMPT SMP
> [ 198.925000] last sysfs file:
> /devices/pci:00/:00:01.0/:01:00.0/i2c-adapter/i2c-1/i2c-1/dev
> [ 198.925000] CPU 1
> [ 198.925000] Modules linked in:
>
> (Yes, I wish I got a backtrace, but that's as long as it lived. Apparently,
> the netconsole stuff actually writing this stuff out was over on CPU0 which
> then proceeded to croak).
>
> Some odd SMP-related race? (x86_64 kernel on a Core2 Duo T7200, if it matters)

It is rather the other user who adds the page to some other list while being
at deferred_pages list. Could you try my debug patch
(http://lkml.org/lkml/2007/9/19/141)?

--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
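The hypothesis in this message (an entry gets added to a second list while it is still on the first) can be tried out in userspace. The sketch below is an illustration under stated assumptions only: the list names, `checked_list_del` and `demo_double_listing` are hypothetical, and it models neither the real deferred_pages handling nor the kernel's list_debug code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list node, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Unchecked insert right after head, as a plain list add would do. */
static void list_add_head(struct list_head *node, struct list_head *head)
{
    struct list_head *first = head->next;

    first->prev = node;
    node->next = first;
    node->prev = head;
    head->next = node;
}

/* Hypothetical checked delete: unlink entry only if both neighbours agree. */
static bool checked_list_del(struct list_head *entry)
{
    assert(entry && entry->prev && entry->next);

    if (entry->prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return false;
    }
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    return true;
}

/* Returns 1 if deleting a neighbour of the doubly listed entry is caught. */
int demo_double_listing(void)
{
    struct list_head deferred, other, page_a, page_b;

    list_init(&deferred);
    list_init(&other);
    list_add_head(&page_b, &deferred);  /* deferred: page_b */
    list_add_head(&page_a, &deferred);  /* deferred: page_a, page_b */

    /* A second user queues page_b elsewhere without unlinking it first. */
    list_add_head(&page_b, &other);     /* page_b.prev now points at other */

    /* page_a.next is still page_b, but page_b.prev no longer points back. */
    return checked_list_del(&page_a) ? 0 : 1;
}
```

In this model the "but was" value printed for `page_a` is the address of the other list's head, so a report of this shape would be consistent with the entry having been queued somewhere else in the meantime.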
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> > Hmm.. maybe I'm chasing a different bug manifested by the same patch. For
> > me, it's been a solid lockup at X startup since -rc3-mm1, and this patch
> > doesn't change matters.
>
> This patch probably changes behaviour how the pages are queued on the list
> somehow. Maybe it's insane to suggest everybody with similar problem to try
> LIST_DEBUG, but just give it a try after having one of the patches applied ;).
> (Or have you tried yet?)

OK, had a chance to test it, with Dave Airlie's AGP patch, and here's what it
hit:

[ 198.925000] list_del corruption. next->prev should be 81000118f178, but was 8067e050
[ 198.925000] [ cut here ]
[ 198.925000] kernel BUG at lib/list_debug.c:72!
[ 198.925000] invalid opcode: [1] PREEMPT SMP
[ 198.925000] last sysfs file: /devices/pci:00/:00:01.0/:01:00.0/i2c-adapter/i2c-1/i2c-1/dev
[ 198.925000] CPU 1
[ 198.925000] Modules linked in:

(Yes, I wish I got a backtrace, but that's as long as it lived. Apparently,
the netconsole stuff actually writing this stuff out was over on CPU0 which
then proceeded to croak).

Some odd SMP-related race? (x86_64 kernel on a Core2 Duo T7200, if it matters)
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, 21 Sep 2007 19:30:04 +0200, Jiri Slaby said:
> On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> > On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
> >> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> >> -rc4-mm1: solid lock on X shutdown, random solid locks about
> >>           once every four hours
> >> -rc6-mm1: solid lock on X startup
> >> +your patch: screen goes black, turns off and on a few times during
> >>           startup, can reboot with sysrq-b
> >
> > Hmm.. maybe I'm chasing a different bug manifested by the same patch. For me,
> > it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
> > change matters.
>
> This patch probably changes behaviour how the pages are queued on the list
> somehow. Maybe it's insane to suggest everybody with similar problem to try
> LIST_DEBUG, but just give it a try after having one of the patches applied ;).
> (Or have you tried yet?)

Haven't tried LIST_DEBUG yet. I'm spending most of the weekend camping, so
will likely be unable to test that until Monday-ish (unless I get lucky and can
get a test in during the next 2 hours)...
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/21/2007 07:16 PM, [EMAIL PROTECTED] wrote:
> On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
>> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>> -rc4-mm1: solid lock on X shutdown, random solid locks about
>>           once every four hours
>> -rc6-mm1: solid lock on X startup
>> +your patch: screen goes black, turns off and on a few times during
>>           startup, can reboot with sysrq-b
>
> Hmm.. maybe I'm chasing a different bug manifested by the same patch. For me,
> it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
> change matters.

This patch probably changes behaviour how the pages are queued on the list
somehow. Maybe it's insane to suggest everybody with similar problem to try
LIST_DEBUG, but just give it a try after having one of the patches applied ;).
(Or have you tried yet?)

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, 21 Sep 2007 01:44:41 +0200, Andi Kleen said:

> Full bisect needed then I guess. Ok as a short cut you could perhaps
> the cpa-* patches first (might need to drop some later depending
> patches), then the drm and agp trees.

The later depending patches:

x86_64-mm-cpa-clflush.patch
x86_64-mm-cpa-cleanup.patch
x86_64-mm-cpa-einval.patch
x86_64-mm-cpa-arch-macro.patch
intel-iommu-clflush_cache_range-now-takes-size-param.patch
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, 20 Sep 2007 17:06:05 CDT, Matt Mackall said:
> On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > I've attached a more complicated patch that does a 2 stage effort to
> > unmapping and freeing pages. My kernel no longer hangs with this
> > patch...
> >
> > Jiri can you confirm?
>
> It's broken for me.
>
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> -rc4-mm1: solid lock on X shutdown, random solid locks about
>           once every four hours
> -rc6-mm1: solid lock on X startup
> +your patch: screen goes black, turns off and on a few times during
>           startup, can reboot with sysrq-b

Hmm.. maybe I'm chasing a different bug manifested by the same patch. For me,
it's been a solid lockup at X startup since -rc3-mm1, and this patch doesn't
change matters.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/21/2007 01:31 AM, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
>>> It's broken for me.
>>>
>>> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
>>> -rc4-mm1: solid lock on X shutdown, random solid locks about
>>>           once every four hours
>>> -rc6-mm1: solid lock on X startup
>>> +your patch: screen goes black, turns off and on a few times during
>>>           startup, can reboot with sysrq-b
>> Does it work with my simple dumb patch instead of Dave's ?
>
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).

Just an idea: if you enable LIST_DEBUG, it doesn't spit anything out once one
of the patches is applied, right?

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/21/2007 09:38 PM, Jiri Slaby wrote:
> It is rather the other user who adds the page to some other list while being at
> deferred_pages list.
>
> Could you try my debug patch (http://lkml.org/lkml/2007/9/19/141)?

or the whitespace non-damaged version:
http://www.fi.muni.cz/~xslaby/sklad/pageattr_debug
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 2007.09.21 00:10:26 +, Jiri Slaby wrote:
> > Could you try current xf86-video-intel driver? just do
> > git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel
>
> It works!

Yep, I also pushed a fix for G33 to xf86-video-intel when fixing intel-agp.
So G33 users should upgrade both for things to work correctly.

> 3d problem, but it has maybe nothing to do with kernel:
> $ glxinfo
> name of display: :0.0
> Unrecognized deviceID 29c2
> X Error of failed request: GLXBadContext
> ...

It looks like you have an old version of mesa whose i915 dri driver doesn't
know your chipset. Try mesa-7.X.

I have also seen X exit broken with 2.6.23-rc6-mm1; I will follow this thread
and try Dave's patch.

Thanks for testing!
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, Sep 20, 2007 at 06:31:14PM -0500, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > > It's broken for me.
> > >
> > > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > > -rc4-mm1: solid lock on X shutdown, random solid locks about
> > >           once every four hours
> > > -rc6-mm1: solid lock on X startup
> > > +your patch: screen goes black, turns off and on a few times during
> > >           startup, can reboot with sysrq-b
> >
> > Does it work with my simple dumb patch instead of Dave's ?
>
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior).
>
> I suspect I'm tripping two things and the flushing thing fixes one but
> not the other.

Full bisect needed then I guess. Ok as a short cut you could perhaps
the cpa-* patches first (might need to drop some later depending
patches), then the drm and agp trees.

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > It's broken for me.
> >
> > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > -rc4-mm1: solid lock on X shutdown, random solid locks about
> >           once every four hours
> > -rc6-mm1: solid lock on X startup
> > +your patch: screen goes black, turns off and on a few times during
> >           startup, can reboot with sysrq-b
>
> Does it work with my simple dumb patch instead of Dave's ?

Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior).

I suspect I'm tripping two things and the flushing thing fixes one but
not the other.

--
Mathematics is the supreme nostalgia of our time.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> It's broken for me.
>
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> -rc4-mm1: solid lock on X shutdown, random solid locks about
>           once every four hours
> -rc6-mm1: solid lock on X startup
> +your patch: screen goes black, turns off and on a few times during
>           startup, can reboot with sysrq-b

Does it work with my simple dumb patch instead of Dave's ?

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> > But now I'm talking about another issue -- a regression since rc4-mm1, where X
> > server is unable to bind agp memory (those x logs above). The clflush issue has
> > solved andi in
> > http://lkml.org/lkml/2007/9/19/334
> > recently
>
> Tried that, my laptop still bricks the instant X starts up and the NVidia driver
> tries to initialize. Not even sysrq-foo works. Time to power-cycle.
>

I'd expect the binary driver to be doing something stupid with its flushing and
relying on the kernel to do something it no longer does.. so this is most
likely a case of not fixable..

Dave.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/20/2007 11:24 AM, Zhenyu Wang wrote:
> On 2007.09.20 17:33:45 +, Dave Airlie wrote:
>>> Maybe you are rather interested in these dmesg lines:
>>> Linux agpgart interface v0.102
>>> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
>>> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
>>> agpgart: Detected an Intel G33 Chipset.
>>> agpgart: Detected 8192K stolen memory.
>>> agpgart: AGP aperture is 256M @ 0xd000
>>> [drm] Initialized drm 1.1.0 20060810
>>> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
>>> [drm] Initialized i915 1.6.0 20060119 on minor 0
>>> ...
>>> set status page addr 0x00033000
>>> agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
>>> agpgart: Trying to insert into local/stolen memory
>>>
>>> So the problem is, that X passes too low start.
>>>
>>> The X log:
>>> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
>
> Could you try current xf86-video-intel driver? just do
> git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

It works!

3d problem, but it has maybe nothing to do with kernel:
$ glxinfo
name of display: :0.0
Unrecognized deviceID 29c2
X Error of failed request: GLXBadContext
...

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > The code is broken anyways. If you free pages without flushing
> > them first some other innocent user allocating them will end up
> > with possible uncached pages for some time.
> >
> > Does this simple patch help?
> >
>
> I've attached a more complicated patch that does a 2 stage effort to
> unmapping and freeing pages. My kernel no longer hangs with this
> patch...
>
> Jiri can you confirm?

It's broken for me.

2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
-rc4-mm1: solid lock on X shutdown, random solid locks about
          once every four hours
-rc6-mm1: solid lock on X startup
+your patch: screen goes black, turns off and on a few times during
          startup, can reboot with sysrq-b

Video is:

01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250
[Mobility FireGL 9000] (rev 02)

--
Mathematics is the supreme nostalgia of our time.
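The "flush before free" ordering discussed in the quoted text above can be illustrated with a toy userspace model. This is only a sketch of the ordering argument, not Dave's patch and not kernel code: struct page_model, struct batch and every function name below are invented for the example, and "flushing" is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_MAX 8

/* Toy stand-in for a page that was mapped uncached for the GART. */
struct page_model {
    bool uncached;   /* page still carries the uncached attribute */
    bool freed;      /* page has been handed back to the allocator */
};

/* Pages queued for release, so they can be flushed together. */
struct batch {
    struct page_model *pages[BATCH_MAX];
    size_t count;
};

/* Stage 1: only queue the page, do not free it yet. */
static bool batch_queue(struct batch *b, struct page_model *p)
{
    if (b->count == BATCH_MAX)
        return false;
    b->pages[b->count++] = p;
    return true;
}

/* Stand-in for restoring the cacheable attribute and flushing. */
static void batch_flush(struct batch *b)
{
    for (size_t i = 0; i < b->count; i++)
        b->pages[i]->uncached = false;
}

/* Stage 2: free everything; count pages that went back still uncached. */
static size_t batch_free(struct batch *b)
{
    size_t stale = 0;

    for (size_t i = 0; i < b->count; i++) {
        if (b->pages[i]->uncached)
            stale++;
        b->pages[i]->freed = true;
    }
    b->count = 0;
    return stale;
}

/* Release four model pages, optionally flushing between the two stages. */
static size_t stale_after_release(bool flush_first)
{
    struct page_model pages[4];
    struct batch b = { .count = 0 };

    for (size_t i = 0; i < 4; i++) {
        pages[i].uncached = true;
        pages[i].freed = false;
        bool queued = batch_queue(&b, &pages[i]);
        assert(queued);
        (void)queued;
    }
    if (flush_first)
        batch_flush(&b);
    return batch_free(&b);
}
```

In the model, stale_after_release(false) counts every page as going back to the allocator while still marked uncached, which is the situation the quoted objection describes; with the flush in between, none do.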
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 22:47:41 +0200, Jiri Slaby said:
> On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote:
> > That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke
> > the NVidia graphics driver in 23-rc3-mm1. Is it breaking *other* X drivers as
> > well?
>
> Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected
> in my case):
> http://lkml.org/lkml/2007/9/9/51
>
> But now I'm talking about another issue -- a regression since rc4-mm1, where X
> server is unable to bind agp memory (those x logs above). The clflush issue has
> solved andi in
> http://lkml.org/lkml/2007/9/19/334
> recently

Tried that, my laptop still bricks the instant X starts up and the NVidia driver
tries to initialize. Not even sysrq-foo works. Time to power-cycle.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 2007.09.20 17:33:45 +, Dave Airlie wrote:
> > Maybe you are rather interested in these dmesg lines:
> > Linux agpgart interface v0.102
> > agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
> > on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> > agpgart: Detected an Intel G33 Chipset.
> > agpgart: Detected 8192K stolen memory.
> > agpgart: AGP aperture is 256M @ 0xd000
> > [drm] Initialized drm 1.1.0 20060810
> > ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> > [drm] Initialized i915 1.6.0 20060119 on minor 0
> > ...
> > set status page addr 0x00033000
> > agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
> > agpgart: Trying to insert into local/stolen memory
> >
> > So the problem is, that X passes too low start.
> >
> > The X log:
> > http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old

Could you try current xf86-video-intel driver? just do
git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel

My G33 was just verified broken today... so I'll try to reproduce it on
another one tomorrow.

>
> I've cc'd Zhenyu who might be able to shed some light on this? can you
> try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
> maybe I'm missing a patch in the drm..
>

Yep, try 2.6.23-rc7 first; it does not seem to be drm related.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> >> Fatal server error:
> >> Couldn't bind memory for front buffer
> >>
> >> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> >> known or am I seeing ghosts yet, Andrew?
> >>
> >
> > Can you send me a complete Xorg log file?
>
> Maybe you are rather interested in these dmesg lines:
> Linux agpgart interface v0.102
> agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
> on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
> agpgart: Detected an Intel G33 Chipset.
> agpgart: Detected 8192K stolen memory.
> agpgart: AGP aperture is 256M @ 0xd000
> [drm] Initialized drm 1.1.0 20060810
> ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
> [drm] Initialized i915 1.6.0 20060119 on minor 0
> ...
> set status page addr 0x00033000
> agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
> agpgart: Trying to insert into local/stolen memory
>
> So the problem is, that X passes too low start.
>
> The X log:
> http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old
>

I've cc'd Zhenyu who might be able to shed some light on this? can you
try 2.6.23-rc7 as maybe the G33 support still needs some work.. or
maybe I'm missing a patch in the drm..

Dave.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/20/2007 03:51 AM, Dave Airlie wrote:
> On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
>> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>>> What do you mean with not run?
>> (II) intel(0): Initializing HW Cursor
>> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
>> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>> at offset 0x5ff000 failed (Invalid argument)
>>
>> Fatal server error:
>> Couldn't bind memory for front buffer
>>
>> I thought I'd seen a thread about this issue, but I can't find it now. Is it
>> known or am I seeing ghosts yet, Andrew?
>>
>
> Can you send me a complete Xorg log file?

Maybe you are rather interested in these dmesg lines:
Linux agpgart interface v0.102
agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org
on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c)
agpgart: Detected an Intel G33 Chipset.
agpgart: Detected 8192K stolen memory.
agpgart: AGP aperture is 256M @ 0xd000
[drm] Initialized drm 1.1.0 20060810
ACPI: PCI Interrupt :00:02.0[A] -> GSI 16 (level, low) -> IRQ 16
[drm] Initialized i915 1.6.0 20060119 on minor 0
...
set status page addr 0x00033000
agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800
agpgart: Trying to insert into local/stolen memory

So the problem is, that X passes too low start.

The X log:
http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old

> and lspci -vv?
# lspci -vvx 00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02) Subsystem: Intel Corporation DRAM Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- Capabilities: [50] Subsystem: Intel Corporation 82801 PCI Bridge 00: 86 80 4e 24 06 01 10 00 92 01 04 06 00 00 01 00 10: 00 00 00 00 00 00 00 00 00 01 01 20 f0 00 80 22 20: 60 ff 60 ff f1 ff 01 00 00 00 00 00 00 00 00 00 30: 00 00 00 00 50 00 00 00 00 00 00 00 ff 00 02 00 00:1f.0 ISA bridge: Intel Corporation LPC Interface Controller (rev 02) Subsystem: Intel Corporation LPC Interface Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- my 945 works fine with my drm tree on top of Linus with clflush + my > agp fix I just sent out .. Should I try? regards, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
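The numbers in the dmesg and X log lines from this message can be checked by hand. The sketch below assumes 4 KiB pages and models the comparison implied by the "Trying to insert into local/stolen memory" line; the actual check in intel-agp.c is not quoted in this thread, so check_bind_start and the other helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MODEL_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/* Byte offset in the aperture -> page index (the X log's "pgoffset"). */
static unsigned long offset_to_page(unsigned long offset)
{
    return offset >> MODEL_PAGE_SHIFT;
}

/* Stolen memory in KiB -> number of page-sized GTT entries it covers. */
static unsigned long stolen_kib_to_entries(unsigned long kib)
{
    return (kib * 1024UL) >> MODEL_PAGE_SHIFT;
}

/*
 * Hypothetical model of the bind check: a start page that falls inside
 * the stolen range is refused with -EINVAL ("Invalid argument").
 */
static int check_bind_start(unsigned long pg_start, unsigned long gtt_entries)
{
    if (pg_start < gtt_entries) {
        fprintf(stderr,
                "pg_start == 0x%08lx, gtt_entries == 0x%08lx: start is in stolen memory\n",
                pg_start, gtt_entries);
        return -EINVAL;
    }
    return 0;
}
```

With the values from the logs, offset 0x5ff000 is page 1535 (0x5ff) and 8192K of stolen memory covers 2048 (0x800) entries, so 1535 < 2048 and the bind is refused, which matches the "Invalid argument" in the X log and the "too low start" reading.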
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/20/2007 04:24 AM, Andrew Morton wrote:
> On Thu, 20 Sep 2007 11:42:29 +1000 "Dave Airlie" <[EMAIL PROTECTED]> wrote:
>
>> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
>> From: Dave Airlie <[EMAIL PROTECTED]>
>> Date: Thu, 20 Sep 2007 11:30:41 +1000
>> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages
>
> This fixes the hang-when-quitting-X on the Vaio.

Checked.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/20/2007 03:51 AM, Dave Airlie wrote: On 9/20/07, Jiri Slaby [EMAIL PROTECTED] wrote: On 09/19/2007 09:54 PM, Andi Kleen wrote: Yeah. (But X doesn't run -- this is maybe the known issue in this release). What do you mean with not run? (II) intel(0): Initializing HW Cursor (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 at offset 0x5ff000 failed (Invalid argument) Fatal server error: Couldn't bind memory for front buffer I thought I'd seen a thread about this issue, but I can't find it now. Is it known or am I seeing ghosts yet, Andrew? Can you send me a complete Xorg log file? Maybe you are rather interested in these dmesg lines: Linux agpgart interface v0.102 agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c) agpgart: Detected an Intel G33 Chipset. agpgart: Detected 8192K stolen memory. agpgart: AGP aperture is 256M @ 0xd000 [drm] Initialized drm 1.1.0 20060810 ACPI: PCI Interrupt :00:02.0[A] - GSI 16 (level, low) - IRQ 16 [drm] Initialized i915 1.6.0 20060119 on minor 0 ... set status page addr 0x00033000 agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800 agpgart: Trying to insert into local/stolen memory So the problem is, that X passes too low start. The X log: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old and lspci -vv? 
# lspci -vvx 00:00.0 Host bridge: Intel Corporation DRAM Controller (rev 02) Subsystem: Intel Corporation DRAM Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort+ SERR- PERR- Latency: 0 Capabilities: [e0] Vendor Specific Information 00: 86 80 c0 29 06 00 90 20 02 00 00 06 00 00 00 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c0 29 30: 00 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00:02.0 VGA compatible controller: Intel Corporation Integrated Graphics Controller (rev 02) (prog-if 00 [VGA]) Subsystem: Intel Corporation Integrated Graphics Controller Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- Latency: 0 Interrupt: pin A routed to IRQ 16 Region 0: Memory at ffa8 (32-bit, non-prefetchable) [size=512K] Region 1: I/O ports at ec00 [size=8] Region 2: Memory at d000 (32-bit, prefetchable) [size=256M] Region 3: Memory at ff90 (32-bit, non-prefetchable) [size=1M] Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Queue=0/0 Enable- Address: Data: Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00: 86 80 c2 29 07 00 90 00 02 00 00 03 00 00 00 00 10: 00 00 a8 ff 01 ec 00 00 08 00 00 d0 00 00 90 ff 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c2 29 30: 00 00 00 00 90 00 00 00 00 00 00 00 0a 01 00 00 00:03.0 Communication controller: Intel Corporation MEI Controller (rev 02) Subsystem: Intel Corporation MEI Controller Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- Latency: 0 Interrupt: pin A routed 
to IRQ 10 Region 0: Memory at ffa7bc00 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [8c] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable- Address: Data: 00: 86 80 c4 29 06 00 10 00 02 00 80 07 00 00 80 00 10: 04 bc a7 ff 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 c4 29 30: 00 00 00 00 50 00 00 00 00 00 00 00 0a 01 00 00 00:03.2 IDE interface: Intel Corporation PT IDER Controller (rev 02) (prog-if 85 [Master SecO PriO]) Subsystem: Intel Corporation PT IDER Controller Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- Latency: 0 Interrupt: pin C routed to IRQ 12 Region 0: I/O ports at e880 [size=8] Region 1: I/O ports at e800 [size=4] Region 2: I/O ports at e480 [size=8] Region 3: I/O ports at e400 [size=4]
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
Fatal server error: Couldn't bind memory for front buffer I thought I'd seen a thread about this issue, but I can't find it now. Is it known or am I seeing ghosts yet, Andrew? Can you send me a complete Xorg log file? Maybe you are rather interested in these dmesg lines: Linux agpgart interface v0.102 agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c) agpgart: Detected an Intel G33 Chipset. agpgart: Detected 8192K stolen memory. agpgart: AGP aperture is 256M @ 0xd000 [drm] Initialized drm 1.1.0 20060810 ACPI: PCI Interrupt :00:02.0[A] - GSI 16 (level, low) - IRQ 16 [drm] Initialized i915 1.6.0 20060119 on minor 0 ... set status page addr 0x00033000 agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800 agpgart: Trying to insert into local/stolen memory So the problem is, that X passes too low start. The X log: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old I've cc'd Zhenyu who might be able to shed some light on this? can you try 2.6.23-rc7 as maybe the G33 support still needs some work.. or maybe I'm missing a patch in the drm.. Dave. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
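[Editorial note] The numbers in the dmesg lines above fit together if one assumes 4 KiB pages: 8192K of stolen memory covers 8192 / 4 = 2048 = 0x800 GTT entries, and the requested pg_start of 0x5ff = 1535 lies below that. The sketch below only reproduces this arithmetic; the helper names are hypothetical and this is not the code from intel-agp.c, merely the shape of check that the "Trying to insert into local/stolen memory" message suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 4 KiB pages, so one GTT entry maps 4 KiB. */
#define PAGE_SIZE_KIB 4UL

/* Hypothetical helper: GTT entries occupied by stolen memory of this size. */
unsigned long stolen_entries(unsigned long stolen_kib)
{
	return stolen_kib / PAGE_SIZE_KIB;
}

/*
 * Hypothetical helper: true when a bind request starting at pg_start
 * would land inside the entries reserved for stolen memory.
 */
bool bind_hits_stolen(unsigned long pg_start, unsigned long gtt_entries)
{
	return pg_start < gtt_entries;
}
```

With the values from the log, stolen_entries(8192) gives 0x800 and bind_hits_stolen(0x5ff, 0x800) is true, which is consistent with "X passes too low start".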
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 2007.09.20 17:33:45 +, Dave Airlie wrote: Maybe you are rather interested in these dmesg lines: Linux agpgart interface v0.102 agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c) agpgart: Detected an Intel G33 Chipset. agpgart: Detected 8192K stolen memory. agpgart: AGP aperture is 256M @ 0xd000 [drm] Initialized drm 1.1.0 20060810 ACPI: PCI Interrupt :00:02.0[A] - GSI 16 (level, low) - IRQ 16 [drm] Initialized i915 1.6.0 20060119 on minor 0 ... set status page addr 0x00033000 agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800 agpgart: Trying to insert into local/stolen memory So the problem is, that X passes too low start. The X log: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old Could you try current xf86-video-intel driver? just do git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel My G33 was just verified broken today... so I'll try to reproduce it on another one tomorrow. I've cc'd Zhenyu who might be able to shed some light on this? can you try 2.6.23-rc7 as maybe the G33 support still needs some work.. or maybe I'm missing a patch in the drm.. yep, should try 2.6.23-rc7 first, and it seems not drm relate. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 22:47:41 +0200, Jiri Slaby said: On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote: That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke the NVidia graphics driver in 23-rc3-mm1. Is it breaking *other* X drivers as well? Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected in my case): http://lkml.org/lkml/2007/9/9/51 But now I'm talking about another issue -- a regression since rc4-mm1, where X server is unable to bind agp memory (those x logs above). The clflush issue has solved andi in http://lkml.org/lkml/2007/9/19/334 recently Tried that, my laptop still bricks the instant X starts up and the NVidia driver tries to initialize. Not even sysrq-foo works. Time to power-cycle. pgpPjjghAuo6l.pgp Description: PGP signature
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, Sep 20, 2007 at 11:42:29AM +1000, Dave Airlie wrote:
> > The code is broken anyways. If you free pages without flushing
> > them first some other innocent user allocating them will end up
> > with possible uncached pages for some time.
> >
> > Does this simple patch help?
>
> I've attached a more complicated patch that does a 2 stage effort to
> unmapping and freeing pages.
>
> My kernel no longer hangs with this patch... Jiri can you confirm?

It's broken for me.

2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
-rc4-mm1: solid lock on X shutdown, random solid locks about once every four hours
-rc6-mm1: solid lock on X startup
+your patch: screen goes black, turns off and on a few times during startup, can reboot with sysrq-b

Video is:
01:00.0 VGA compatible controller: ATI Technologies Inc Radeon R250 [Mobility FireGL 9000] (rev 02)

-- 
Mathematics is the supreme nostalgia of our time.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/20/2007 11:24 AM, Zhenyu Wang wrote: On 2007.09.20 17:33:45 +, Dave Airlie wrote: Maybe you are rather interested in these dmesg lines: Linux agpgart interface v0.102 agpgart: suspend/resume problematic: resume with 3D/DRI active may lockup X.Org on some chipset/BIOS combos (see DEBUG_AGP_PM in intel-agp.c) agpgart: Detected an Intel G33 Chipset. agpgart: Detected 8192K stolen memory. agpgart: AGP aperture is 256M @ 0xd000 [drm] Initialized drm 1.1.0 20060810 ACPI: PCI Interrupt :00:02.0[A] - GSI 16 (level, low) - IRQ 16 [drm] Initialized i915 1.6.0 20060119 on minor 0 ... set status page addr 0x00033000 agpgart: pg_start == 0x05ff,intel_private.gtt_entries == 0x0800 agpgart: Trying to insert into local/stolen memory So the problem is, that X passes too low start. The X log: http://www.fi.muni.cz/~xslaby/sklad/Xorg.0.log.old Could you try current xf86-video-intel driver? just do git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel It works! 3d problem, but it has maybe nothing to do with kernel: $ glxinfo name of display: :0.0 Unrecognized deviceID 29c2 X Error of failed request: GLXBadContext ... regards, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> > But now I'm talking about another issue -- a regression since rc4-mm1, where
> > X server is unable to bind agp memory (those x logs above). The clflush issue
> > has solved andi in http://lkml.org/lkml/2007/9/19/334 recently
>
> Tried that, my laptop still bricks the instant X starts up and the NVidia driver
> tries to initialize. Not even sysrq-foo works. Time to power-cycle.

I'd expect the binary to be doing something stupid with its flushing and
relying on the kernel to do something it no longer does.. so this is most
likely a case of not fixable..

Dave.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> It's broken for me.
>
> 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> -rc4-mm1: solid lock on X shutdown, random solid locks about once every four hours
> -rc6-mm1: solid lock on X startup
> +your patch: screen goes black, turns off and on a few times during startup, can reboot with sysrq-b

Does it work with my simple dumb patch instead of Dave's ?

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > It's broken for me.
> >
> > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > -rc4-mm1: solid lock on X shutdown, random solid locks about once every four hours
> > -rc6-mm1: solid lock on X startup
> > +your patch: screen goes black, turns off and on a few times during startup, can reboot with sysrq-b
>
> Does it work with my simple dumb patch instead of Dave's ?

Sorry, forgot to mention: your one-liner flush also doesn't work (same
behavior). I suspect I'm tripping two things and the flushing thing
fixes one but not the other.

-- 
Mathematics is the supreme nostalgia of our time.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, Sep 20, 2007 at 06:31:14PM -0500, Matt Mackall wrote:
> On Fri, Sep 21, 2007 at 01:03:04AM +0200, Andi Kleen wrote:
> > > It's broken for me.
> > >
> > > 2.6.23-rc3-mm1: solid lock on X shutdown (noticed when upgrading)
> > > -rc4-mm1: solid lock on X shutdown, random solid locks about once every four hours
> > > -rc6-mm1: solid lock on X startup
> > > +your patch: screen goes black, turns off and on a few times during startup, can reboot with sysrq-b
> >
> > Does it work with my simple dumb patch instead of Dave's ?
>
> Sorry, forgot to mention: your one-liner flush also doesn't work (same
> behavior). I suspect I'm tripping two things and the flushing thing
> fixes one but not the other.

Full bisect needed then I guess.

Ok, as a shortcut you could perhaps revert the cpa-* patches first (you might
need to drop some later patches that depend on them), then the drm and agp trees.

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 2007.09.21 00:10:26 +, Jiri Slaby wrote:
> > Could you try current xf86-video-intel driver? just do
> > git clone git://anongit.freedesktop.org/git/xorg/driver/xf86-video-intel
>
> It works!

Yep, I also pushed a fix for G33 in xf86-video-intel when fixing the intel agp.
So G33 users should upgrade both for things to work correctly.

> 3d problem, but it has maybe nothing to do with kernel:
> $ glxinfo
> name of display: :0.0
> Unrecognized deviceID 29c2
> X Error of failed request: GLXBadContext
> ...

It looks like you have an old version of mesa whose i915 dri driver doesn't
know your chipset. Try mesa-7.X.

I have also seen X exit broken with 2.6.23-rc6-mm1; I will follow this thread
and try Dave's patch.

Thanks for testing!
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, 20 Sep 2007 11:42:29 +1000 "Dave Airlie" <[EMAIL PROTECTED]> wrote:

> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
> From: Dave Airlie <[EMAIL PROTECTED]>
> Date: Thu, 20 Sep 2007 11:30:41 +1000
> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

This fixes the hang-when-quitting-X on the Vaio.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote: > On 09/19/2007 09:54 PM, Andi Kleen wrote: > >> Yeah. (But X doesn't run -- this is maybe the known issue in this release). > > > > What do you mean with not run? > > (II) intel(0): Initializing HW Cursor > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 > at offset 0x5ff000 failed (Invalid argument) > > Fatal server error: > Couldn't bind memory for front buffer > > I thought I'd seen a thread about this issue, but I can't find it now. Is it > known or am I seeing ghosts yet, Andrew? > Can you send me a complete Xorg log file? and lspci -vv? my 945 works fine with my drm tree on top of Linus with clflush + my agp fix I just sent out .. Dave. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> The code is broken anyways. If you free pages without flushing > them first some other innocent user allocating them will end up > with possible uncached pages for some time. > > Does this simple patch help? > I've attached a more complicated patch that does a 2 stage effort to unmapping and freeing pages. My kernel no longer hangs with this patch... Jiri can you confirm? I'll look at the other issue separately.. Dave. From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001 From: Dave Airlie <[EMAIL PROTECTED]> Date: Thu, 20 Sep 2007 11:30:41 +1000 Subject: [PATCH] agp: fix race condition between unmapping and freeing pages With Andi's clflush fixup, we were getting hangs on server exit, flushing the mappings after freeing each page helped. This showed up a race condition where the pages after being freed could be reused before the agp mappings had been flushed. Flushing after each single page is a bad thing for future drm work, so make the page destroy a two pass unmapping all the pages, flushing the mappings, and then destroying the pages. 
Signed-off-by: Dave Airlie <[EMAIL PROTECTED]> --- drivers/char/agp/agp.h |7 +-- drivers/char/agp/ali-agp.c | 29 + drivers/char/agp/backend.c | 12 drivers/char/agp/generic.c | 20 ++-- drivers/char/agp/i460-agp.c |4 ++-- drivers/char/agp/intel-agp.c |6 -- 6 files changed, 50 insertions(+), 28 deletions(-) diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h index 8955e7f..b83824c 100644 --- a/drivers/char/agp/agp.h +++ b/drivers/char/agp/agp.h @@ -58,6 +58,9 @@ struct gatt_mask { * devices this will probably be ignored */ }; +#define AGP_PAGE_DESTROY_UNMAP 1 +#define AGP_PAGE_DESTROY_FREE 2 + struct aper_size_info_8 { int size; int num_entries; @@ -113,7 +116,7 @@ struct agp_bridge_driver { struct agp_memory *(*alloc_by_type) (size_t, int); void (*free_by_type)(struct agp_memory *); void *(*agp_alloc_page)(struct agp_bridge_data *); - void (*agp_destroy_page)(void *); + void (*agp_destroy_page)(void *, int flags); int (*agp_type_to_mask_type) (struct agp_bridge_data *, int); }; @@ -267,7 +270,7 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type); struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type); void agp_generic_free_by_type(struct agp_memory *curr); void *agp_generic_alloc_page(struct agp_bridge_data *bridge); -void agp_generic_destroy_page(void *addr); +void agp_generic_destroy_page(void *addr, int flags); void agp_free_key(int key); int agp_num_entries(void); u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command); diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c index 4941ddb..2b65155 100644 --- a/drivers/char/agp/ali-agp.c +++ b/drivers/char/agp/ali-agp.c @@ -156,29 +156,34 @@ static void *m1541_alloc_page(struct agp_bridge_data *bridge) return addr; } -static void ali_destroy_page(void * addr) +static void ali_destroy_page(void * addr, int flags) { if (addr) { - global_cache_flush(); /* is this really needed? 
--hch */ - agp_generic_destroy_page(addr); - global_flush_tlb(); + if (flags & AGP_PAGE_DESTROY_UNMAP) { + global_cache_flush(); /* is this really needed? --hch */ + agp_generic_destroy_page(addr, flags); + global_flush_tlb(); + } else + agp_generic_destroy_page(addr, flags); } } -static void m1541_destroy_page(void * addr) +static void m1541_destroy_page(void * addr, int flags) { u32 temp; if (addr == NULL) return; - global_cache_flush(); - - pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, ); - pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, - (((temp & ALI_CACHE_FLUSH_ADDR_MASK) | - virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN)); - agp_generic_destroy_page(addr); + if (flags & AGP_PAGE_DESTROY_UNMAP) { + global_cache_flush(); + + pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, ); + pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, + (((temp & ALI_CACHE_FLUSH_ADDR_MASK) | + virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN)); + } + agp_generic_destroy_page(addr, flags); } diff --git a/drivers/char/agp/backend.c b/drivers/char/agp/backend.c index 1b47c89..832ded2 100644 --- a/drivers/char/agp/backend.c +++ b/drivers/char/agp/backend.c @@ -189,9 +189,11 @@ static int agp_backend_initialize(struct agp_bridge_data *bridge) err_out: if (bridge->driver->needs_scratch_page) { - bridge->driver->agp_destroy_page( -gart_to_virt(bridge->scratch_page_real)); + bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real), + AGP_PAGE_DESTROY_UNMAP); flush_agp_mappings(); + bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real), + AGP_PAGE_DESTROY_FREE); } if (got_gatt) bridge->driver->free_gatt_table(bridge); @@ -215,9 +217,11
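[Editorial note] The two-pass scheme described in the commit message above (unmap every page, flush the mappings once, then free) can be modelled in user space. The sketch below is a toy simulation under stated assumptions, not the kernel code: the sim_* names are hypothetical, and only the two flag values mirror AGP_PAGE_DESTROY_UNMAP and AGP_PAGE_DESTROY_FREE from the patch. It records whether any page was freed while unflushed mappings were still outstanding, which is the race the patch is meant to close.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DESTROY_UNMAP 1	/* mirrors AGP_PAGE_DESTROY_UNMAP from the patch */
#define DESTROY_FREE  2	/* mirrors AGP_PAGE_DESTROY_FREE from the patch */

/* Hypothetical stand-in for an AGP page. */
struct sim_page {
	bool unmapped;
	bool freed;
};

/* Hypothetical bookkeeping for the simulation. */
struct sim_state {
	int flushes;			/* number of mapping flushes issued */
	int unflushed;			/* pages unmapped since the last flush */
	bool freed_before_flush;	/* set if the race was hit */
};

/* One destroy step; the flags select which pass is being run. */
void sim_destroy_page(struct sim_state *s, struct sim_page *p, int flags)
{
	if (flags & DESTROY_UNMAP) {
		p->unmapped = true;
		s->unflushed++;
	}
	if (flags & DESTROY_FREE) {
		if (s->unflushed > 0)
			s->freed_before_flush = true;
		p->freed = true;
	}
}

/* Stand-in for flush_agp_mappings(): everything unmapped so far is flushed. */
void sim_flush(struct sim_state *s)
{
	s->flushes++;
	s->unflushed = 0;
}

/* Two passes with a single flush in between. */
void sim_destroy_all(struct sim_state *s, struct sim_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		sim_destroy_page(s, &pages[i], DESTROY_UNMAP);
	sim_flush(s);
	for (i = 0; i < n; i++)
		sim_destroy_page(s, &pages[i], DESTROY_FREE);
}
```

Calling sim_destroy_all() on a zero-initialised state and page array leaves flushes at 1 and freed_before_flush false, whereas passing DESTROY_UNMAP | DESTROY_FREE per page without a flush in between would set freed_before_flush. Flushing after every single page would also avoid the race, but at one flush per page, which is the overhead the commit message wants to avoid.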
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 9/20/07, Andi Kleen <[EMAIL PROTECTED]> wrote: > On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote: > > On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote: > > > > > -8<-8<-8<-8<-8<-8< > > > That means > > > void agp_generic_destroy_page(void *addr) > > > { > > > struct page *page; > > > > > > if (addr == NULL) > > > return; > > > > > > page = virt_to_page(addr); > > > (1) unmap_page_from_agp(page); > > > put_page(page); > > > (2) free_page((unsigned long)addr); > > > atomic_dec(_bridge->current_memory_agp); > > > } > > > > > > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr -> > > > __change_page_attr -> save_page -> list_add(>lru, _pages); > > > (2) free_page -> free_pages -> __free_pages -> free_hot_page -> > > > free_hot_cold_page -> list_add(>lru, >list); > > > > that'll hurt. > > > > > any ideas how to fix this? > > > > We should hold a single reference on the page for its membership in > > deferred_pages. > > The code is broken anyways. If you free pages without flushing > them first some other innocent user allocating them will end up > with possible uncached pages for some time. > > Does this simple patch help? > > -Andi > > > Flush uncached AGP pages before freeing In theory this should be handled by the caller, so as to avoid the overhead of continuous flushing however I can see a potential race condition here if the pages are put back into the kernel before the caller flushes the mappings.. Do we need some sort of two step approach here? as flushing after each page would be a major overhead for dynamic agp stuff in the new memory manager.. Dave. 
> > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> > > Index: linux/drivers/char/agp/generic.c > === > --- linux.orig/drivers/char/agp/generic.c > +++ linux/drivers/char/agp/generic.c > @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr > > page = virt_to_page(addr); > unmap_page_from_agp(page); > + flush_agp_mappings(); > put_page(page); > free_page((unsigned long)addr); > atomic_dec(_bridge->current_memory_agp); > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to [EMAIL PROTECTED] > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 22:01:59 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote: > On 09/19/2007 09:57 PM, Jiri Slaby wrote: > > On 09/19/2007 09:54 PM, Andi Kleen wrote: > >>> Yeah. (But X doesn't run -- this is maybe the known issue in this > >>> release). > >> What do you mean with not run? > > > > (II) intel(0): Initializing HW Cursor > > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) > > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 > > at offset 0x5ff000 failed (Invalid argument) > > > > Fatal server error: > > Couldn't bind memory for front buffer > > Further info: > 4690 write(0, "(II) intel(0): Initializing HW C"..., 38) = 38 > 4690 write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76 > 4690 ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument) > 4690 write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115 > 4690 write(2, "\nFatal server error:\n", 21) = 21 > This might be a Dave thing and not an Andi thing. In my usual -mm-testing I only test X on the one machine (the Vaio, natch). Check that it runs glxgears, check suspend/resume to mem and disk. It has intel graphics and I'm not seeing any such problems. Have you time to bisect it? http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt describes how. As a quick test, perhaps build a tree with just 2.6.23-rc6+origin.patch+git-drm.patch? Fortunately git-agpgart.patch is presently empty. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote: > On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said: >> On 09/19/2007 09:54 PM, Andi Kleen wrote: Yeah. (But X doesn't run -- this is maybe the known issue in this release) > . >>> What do you mean with not run? >> (II) intel(0): Initializing HW Cursor >> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) >> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 >> at offset 0x5ff000 failed (Invalid argument) >> >> Fatal server error: >> Couldn't bind memory for front buffer >> >> I thought I'd seen a thread about this issue, but I can't find it now. Is it >> known or am I seeing ghosts yet, Andrew? > > That would probably have been me, saying that x86_64-mm-cpa-clflush.patch > broke > the NVidia graphics driver in 23-rc3-mm1. Is it breaking *other* X drivers as > well? Yes, the issue is there from rc3-mm1, see (intel and radeon cards are affected in my case): http://lkml.org/lkml/2007/9/9/51 But now I'm talking about another issue -- a regression since rc4-mm1, where X server is unable to bind agp memory (those x logs above). The clflush issue has solved andi in http://lkml.org/lkml/2007/9/19/334 recently regards, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said: > On 09/19/2007 09:54 PM, Andi Kleen wrote: > >> Yeah. (But X doesn't run -- this is maybe the known issue in this release) . > > > > What do you mean with not run? > > (II) intel(0): Initializing HW Cursor > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 > at offset 0x5ff000 failed (Invalid argument) > > Fatal server error: > Couldn't bind memory for front buffer > > I thought I'd seen a thread about this issue, but I can't find it now. Is it > known or am I seeing ghosts yet, Andrew? That would probably have been me, saying that x86_64-mm-cpa-clflush.patch broke the NVidia graphics driver in 23-rc3-mm1. Is it breaking *other* X drivers as well? pgpsPLYEmEu99.pgp Description: PGP signature
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:57 PM, Jiri Slaby wrote: > On 09/19/2007 09:54 PM, Andi Kleen wrote: >>> Yeah. (But X doesn't run -- this is maybe the known issue in this release). >> What do you mean with not run? > > (II) intel(0): Initializing HW Cursor > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 > at offset 0x5ff000 failed (Invalid argument) > > Fatal server error: > Couldn't bind memory for front buffer Further info: 4690 write(0, "(II) intel(0): Initializing HW C"..., 38) = 38 4690 write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76 4690 ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument) 4690 write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115 4690 write(2, "\nFatal server error:\n", 21) = 21 > I thought I'd seen a thread about this issue, but I can't find it now. Is it > known or am I seeing ghosts yet, Andrew? -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:54 PM, Andi Kleen wrote: >> Yeah. (But X doesn't run -- this is maybe the known issue in this release). > > What do you mean with not run? (II) intel(0): Initializing HW Cursor (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535) (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0 at offset 0x5ff000 failed (Invalid argument) Fatal server error: Couldn't bind memory for front buffer I thought I'd seen a thread about this issue, but I can't find it now. Is it known or am I seeing ghosts yet, Andrew? regards, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> Yeah. (But X doesn't run -- this is maybe the known issue in this release).

What do you mean with not run?

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:24 PM, Andi Kleen wrote: > On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote: >> On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote: >> >>> -8<-8<-8<-8<-8<-8< >>> That means >>> void agp_generic_destroy_page(void *addr) >>> { >>> struct page *page; >>> >>> if (addr == NULL) >>> return; >>> >>> page = virt_to_page(addr); >>> (1) unmap_page_from_agp(page); >>> put_page(page); >>> (2) free_page((unsigned long)addr); >>> atomic_dec(_bridge->current_memory_agp); >>> } >>> >>> (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr -> >>> __change_page_attr -> save_page -> list_add(>lru, _pages); >>> (2) free_page -> free_pages -> __free_pages -> free_hot_page -> >>> free_hot_cold_page -> list_add(>lru, >list); >> that'll hurt. >> >>> any ideas how to fix this? >> We should hold a single reference on the page for its membership in >> deferred_pages. > > The code is broken anyways. If you free pages without flushing > them first some other innocent user allocating them will end up > with possible uncached pages for some time. > > Does this simple patch help? Yeah. (But X doesn't run -- this is maybe the known issue in this release). 
> Flush uncached AGP pages before freeing > > Signed-off-by: Andi Kleen <[EMAIL PROTECTED]> Tested-by: Jiri Slaby <[EMAIL PROTECTED]> > > Index: linux/drivers/char/agp/generic.c > === > --- linux.orig/drivers/char/agp/generic.c > +++ linux/drivers/char/agp/generic.c > @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr > > page = virt_to_page(addr); > unmap_page_from_agp(page); > + flush_agp_mappings(); > put_page(page); > free_page((unsigned long)addr); > atomic_dec(_bridge->current_memory_agp); > thanks, -- Jiri Slaby ([EMAIL PROTECTED]) Faculty of Informatics, Masaryk University - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
> On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
>
> > -8<-8<-8<-8<-8<-8<
> > That means
> > void agp_generic_destroy_page(void *addr)
> > {
> >         struct page *page;
> >
> >         if (addr == NULL)
> >                 return;
> >
> >         page = virt_to_page(addr);
> > (1)     unmap_page_from_agp(page);
> >         put_page(page);
> > (2)     free_page((unsigned long)addr);
> >         atomic_dec(&agp_bridge->current_memory_agp);
> > }
> >
> > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> > __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> > (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> > free_hot_cold_page -> list_add(&page->lru, &pcp->list);
>
> that'll hurt.
>
> > any ideas how to fix this?
>
> We should hold a single reference on the page for its membership in
> deferred_pages.

The code is broken anyways. If you free pages without flushing
them first some other innocent user allocating them will end up
with possible uncached pages for some time.

Does this simple patch help?

-Andi

Flush uncached AGP pages before freeing

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

Index: linux/drivers/char/agp/generic.c
===================================================================
--- linux.orig/drivers/char/agp/generic.c
+++ linux/drivers/char/agp/generic.c
@@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
 
 	page = virt_to_page(addr);
 	unmap_page_from_agp(page);
+	flush_agp_mappings();
 	put_page(page);
 	free_page((unsigned long)addr);
 	atomic_dec(&agp_bridge->current_memory_agp);
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:

> -8<-8<-8<-8<-8<-8<
> That means
> void agp_generic_destroy_page(void *addr)
> {
>         struct page *page;
>
>         if (addr == NULL)
>                 return;
>
>         page = virt_to_page(addr);
> (1)     unmap_page_from_agp(page);
>         put_page(page);
> (2)     free_page((unsigned long)addr);
>         atomic_dec(&agp_bridge->current_memory_agp);
> }
>
> (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
>     __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
>     free_hot_cold_page -> list_add(&page->lru, &pcp->list);

that'll hurt.

> any ideas how to fix this?

We should hold a single reference on the page for its membership in
deferred_pages.
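For readers following the two call chains, here is a minimal user-space sketch of why putting the same page->lru node on two lists without a list_del in between trips the list debugging check. The struct and helper names are stand-ins written for this illustration (assumptions, not the kernel's code); only the shape of the neighbour check follows lib/list_debug.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/*
 * Insert new right after head. Like the check in lib/list_debug.c, refuse
 * (return false) when head and its successor no longer point at each other.
 */
static bool list_add_checked(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (next->prev != prev || prev->next != next)
		return false;

	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
	return true;
}

/*
 * Hypothetical replay of the bug: returns the 1-based number of the first
 * insert that fails the check, or 0 if none fails.
 */
static int first_failing_add(void)
{
	struct list_head deferred, pcp, lru, other;

	init_list_head(&deferred);
	init_list_head(&pcp);

	if (!list_add_checked(&lru, &deferred))   /* (1) save_page() */
		return 1;
	if (!list_add_checked(&lru, &pcp))        /* (2) free_hot_cold_page() */
		return 2;
	if (!list_add_checked(&other, &deferred)) /* any later user of the list */
		return 3;
	return 0;
}
```

In this model the second insert goes through silently, and only the next insert onto the first list fails, because lru.prev now points into the other list. That would be consistent with the BUG firing in a path unrelated to AGP.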
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 01:53 PM, Jiri Slaby wrote:
> On 09/19/2007 01:43 PM, Jiri Slaby wrote:
>> On 09/18/2007 10:18 AM, Andrew Morton wrote:
>>> - The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
>>>   I didn't drop that patch because the iommu patch series depends on it.
>> No matter whether slub/slab is selected someone gets a page and moves/adds
>> its
>
> Oh, only adds, if it moves, it won't break the list. Going to check for
> POISON1/2 in __list_add, will get results (if any) in few moments...

Huh, it took longer than I expect :). Changed this:

diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 836c218..cd8499c 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -112,7 +112,14 @@ static inline void save_page(struct page *fpage, int data)
                         return;
                 SetPageFlush(fpage);
         }
+        printk("ADD (%s): E=%p, H=%p, H->N=%p, N->P=%p, N->N=%p; PREV0=%p, "
+                        "NEXT0=%p, ",
+                        current->comm, &fpage->lru,
+                        &deferred_pages, deferred_pages.next,
+                        deferred_pages.next->prev, deferred_pages.next->next,
+                        fpage->lru.prev, fpage->lru.next);
         list_add(&fpage->lru, &deferred_pages);
+        printk("PREV1=%p, NEXT1=%p\n", fpage->lru.prev, fpage->lru.next);
 }

 /*
@@ -274,6 +281,7 @@ void global_flush_tlb(void)
         down_read(&init_mm.mmap_sem);
         arg.full_flush = full_flush;
         full_flush = 0;
+        printk("FLUSH\n");
         list_replace_init(&deferred_pages, &arg.l);
         up_read(&init_mm.mmap_sem);

diff --git a/include/linux/list.h b/include/linux/list.h
index f29fc9c..1add963 100644
--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -265,6 +265,8 @@ static inline void list_del_init(struct list_head *entry)
 static inline void list_move(struct list_head *list, struct list_head *head)
 {
         __list_del(list->prev, list->next);
+        list->next = LIST_POISON1;
+        list->prev = LIST_POISON2;
         list_add(list, head);
 }

@@ -277,6 +279,8 @@ static inline void list_move_tail(struct list_head *list,
                                   struct list_head *head)
 {
         __list_del(list->prev, list->next);
+        list->next = LIST_POISON1;
+        list->prev = LIST_POISON2;
         list_add_tail(list, head);
 }

diff --git a/lib/list_debug.c b/lib/list_debug.c
index 4350ba9..57573d5 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -15,15 +15,34 @@
  * This is only for internal list manipulation where we know
  * the prev/next entries already!
  */
-
+#include <linux/sched.h>
 void __list_add(struct list_head *new,
                 struct list_head *prev,
                 struct list_head *next)
 {
+        static unsigned int a, b;
+        unsigned long off;
+
+        if (unlikely(!a && current && current->comm[0] == 'X'))
+                a++;
+
+        if (unlikely(a && !b && (void *)new >= (void *)mem_map &&
+                        (void *)new < (void *)(mem_map + 1048576) &&
+                        (new->prev != LIST_POISON2 || new->next != LIST_POISON1) &&
+                        (new->prev != NULL || new->next != NULL) &&
+                        (new->prev != new || new->next != new))) {
+                off = ((void *)new - (void *)mem_map) % sizeof(struct page);
+                if (off == offsetof(struct page, lru)) {
+                        printk(KERN_DEBUG "POISONS (%p): "
+                                        "%p, %p\n", new, new->prev, new->next);
+                        dump_stack();
+                }
+        }
         if (unlikely(next->prev != prev)) {
                 printk(KERN_ERR "list_add corruption. next->prev should be "
                         "prev (%p), but was %p. (next=%p).\n",
                         prev, next->prev, next);
+                b++;
                 BUG();
         }
         if (unlikely(prev->next != next)) {
-8<-8<-8<-8<-8<-8<

and got this:

ADD (X): E=81000115b9e0, H=80673aa0, H->N=8100011573e0, N->P=80673aa0,
N->N=81000115ba18; PREV0=00200200, NEXT0=00100100, PREV1=80673aa0,
NEXT1=8100011573e0
/--\
this () was output from unmap path, see (1) below
and here () follows output from free_page path, see (2)
\--/
POISONS (81000115b9e0): 80673aa0, 8100011573e0

Call Trace:
 [80328c06] __list_add+0xf6/0x190
 [80328cac] list_add+0xc/0x10
 [8026e46d] free_hot_cold_page+0xfd/0x170
 [8026e52b] free_hot_page+0xb/0x10
 [8026e555] __free_pages+0x25/0x30
 [8026e58b] free_pages+0x2b/0x40
 [8037cf16] agp_generic_destroy_page+0x56/0x70
 [8037d9d5] agp_free_memory+0x65/0xd0
 [8037c0e9] agp_free_memory_wrap+0x39/0x60
 [8037c79b] agp_release+0xdb/0x1c0
 [80291440] __fput+0xc0/0x1b0
 [802915b6] fput+0x16/0x20
 [] filp_close+0x56/0x90
 [] put_files_struct+0xc2/0xd0
 [] do_exit+0x1e5/0x990
 [] do_group_exit+0x37/0x90
 [] sys_exit_group+0x12/0x20
 [] system_call+0x7e/0x83
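The debug patch above flags a node as suspicious when, at the moment it is being added, its pointers are neither poisoned, nor NULL, nor self-pointing, i.e. it still looks like a member of some list. A small user-space sketch of that test follows. The poison values match the PREV0/NEXT0 output in the log; the helper names and the tiny list implementation are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same bit patterns as seen in the log (NEXT0=00100100, PREV0=00200200). */
#define LIST_POISON1 ((struct list_head *)0x00100100)
#define LIST_POISON2 ((struct list_head *)0x00200200)

static void list_add_head(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next = next;
	new->prev = head;
	head->next = new;
}

/* Unlink entry and poison its pointers, as list_del() does. */
static void list_del_poison(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * The condition from the debug patch: true when the node is not poisoned,
 * not zeroed and not an empty list head, so it is probably still linked.
 */
static bool looks_still_linked(const struct list_head *n)
{
	return (n->prev != LIST_POISON2 || n->next != LIST_POISON1) &&
	       (n->prev != NULL || n->next != NULL) &&
	       (n->prev != n || n->next != n);
}

/* A node sitting on a list is reported as suspicious. */
static bool suspicious_after_add(void)
{
	struct list_head head = { &head, &head };
	struct list_head node = { NULL, NULL };

	list_add_head(&node, &head);
	return looks_still_linked(&node);
}

/* A node that was properly deleted is not. */
static bool suspicious_after_del(void)
{
	struct list_head head = { &head, &head };
	struct list_head node = { NULL, NULL };

	list_add_head(&node, &head);
	list_del_poison(&node);
	return looks_still_linked(&node);
}
```

In the log above the page reaches free_hot_cold_page() with prev/next still pointing into deferred_pages, which is the case this predicate is meant to catch.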
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 01:43 PM, Jiri Slaby wrote:
> On 09/18/2007 10:18 AM, Andrew Morton wrote:
>> - The Vaio hangs when quitting X due to x86_64-mm-cpa-clflush.patch, but
>>   I didn't drop that patch because the iommu patch series depends on it.
>
> No matter whether slub/slab is selected someone gets a page and moves/adds its

Oh, only adds, if it moves, it won't break the list. Going to check for
POISON1/2 in __list_add, will get results (if any) in few moments...

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:24 PM, Andi Kleen wrote:
> On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
>> On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
>>
>>> -8<-8<-8<-8<-8<-8<
>>> That means
>>> void agp_generic_destroy_page(void *addr)
>>> {
>>>         struct page *page;
>>>
>>>         if (addr == NULL)
>>>                 return;
>>>
>>>         page = virt_to_page(addr);
>>> (1)     unmap_page_from_agp(page);
>>>         put_page(page);
>>> (2)     free_page((unsigned long)addr);
>>>         atomic_dec(&agp_bridge->current_memory_agp);
>>> }
>>>
>>> (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
>>>     __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
>>> (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
>>>     free_hot_cold_page -> list_add(&page->lru, &pcp->list);
>> that'll hurt.
>>
>>> any ideas how to fix this?
>> We should hold a single reference on the page for its membership in
>> deferred_pages.
>
> The code is broken anyways. If you free pages without flushing them first
> some other innocent user allocating them will end up with possible uncached
> pages for some time.
>
> Does this simple patch help?

Yeah. (But X doesn't run -- this is maybe the known issue in this release).

> Flush uncached AGP pages before freeing
>
> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
Tested-by: Jiri Slaby <[EMAIL PROTECTED]>

>
> Index: linux/drivers/char/agp/generic.c
> ===
> --- linux.orig/drivers/char/agp/generic.c
> +++ linux/drivers/char/agp/generic.c
> @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
>
>          page = virt_to_page(addr);
>          unmap_page_from_agp(page);
> +        flush_agp_mappings();
>          put_page(page);
>          free_page((unsigned long)addr);
>          atomic_dec(&agp_bridge->current_memory_agp);
>

thanks,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> Yeah. (But X doesn't run -- this is maybe the known issue in this release).

What do you mean with not run?

-Andi
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:54 PM, Andi Kleen wrote:
>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>
> What do you mean with not run?

(II) intel(0): Initializing HW Cursor
(II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
(WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
        at offset 0x5ff000 failed (Invalid argument)

Fatal server error:
Couldn't bind memory for front buffer

I thought I'd seen a thread about this issue, but I can't find it now. Is it
known or am I seeing ghosts yet, Andrew?

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 09:57 PM, Jiri Slaby wrote:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>> What do you mean with not run?
>
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>         at offset 0x5ff000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for front buffer

Further info:

4690 write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
4690 write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
4690 ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
4690 write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115
4690 write(2, "\nFatal server error:\n", 21) = 21

> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?

--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >
> > What do you mean with not run?
>
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>         at offset 0x5ff000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for front buffer
>
> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?

That would probably have been me, saying that x86_64-mm-cpa-clflush.patch
broke the NVidia graphics driver in 23-rc3-mm1.

Is it breaking *other* X drivers as well?
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 09/19/2007 10:32 PM, [EMAIL PROTECTED] wrote:
> On Wed, 19 Sep 2007 21:57:27 +0200, Jiri Slaby said:
>> On 09/19/2007 09:54 PM, Andi Kleen wrote:
>>>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
>>> What do you mean with not run?
>> (II) intel(0): Initializing HW Cursor
>> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
>> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>>         at offset 0x5ff000 failed (Invalid argument)
>>
>> Fatal server error:
>> Couldn't bind memory for front buffer
>>
>> I thought I'd seen a thread about this issue, but I can't find it now. Is it
>> known or am I seeing ghosts yet, Andrew?
>
> That would probably have been me, saying that x86_64-mm-cpa-clflush.patch
> broke the NVidia graphics driver in 23-rc3-mm1.
>
> Is it breaking *other* X drivers as well?

Yes, the issue has been there since rc3-mm1 (intel and radeon cards are
affected in my case), see:
http://lkml.org/lkml/2007/9/9/51

But now I'm talking about another issue -- a regression since rc4-mm1, where
the X server is unable to bind AGP memory (see the X logs above).

Andi recently solved the clflush issue in
http://lkml.org/lkml/2007/9/19/334

regards,
--
Jiri Slaby ([EMAIL PROTECTED])
Faculty of Informatics, Masaryk University
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Wed, 19 Sep 2007 22:01:59 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:

> On 09/19/2007 09:57 PM, Jiri Slaby wrote:
> > On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >>> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >> What do you mean with not run?
> >
> > (II) intel(0): Initializing HW Cursor
> > (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> > (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
> >         at offset 0x5ff000 failed (Invalid argument)
> >
> > Fatal server error:
> > Couldn't bind memory for front buffer
>
> Further info:
>
> 4690 write(0, "(II) intel(0): Initializing HW C"..., 38) = 38
> 4690 write(0, "(II) intel(0): xf86BindGARTMemor"..., 76) = 76
> 4690 ioctl(9, AGPIOC_BIND, 0x7fffbe1cb850) = -1 EINVAL (Invalid argument)
> 4690 write(0, "(WW) intel(0): xf86BindGARTMemor"..., 115) = 115
> 4690 write(2, "\nFatal server error:\n", 21) = 21

This might be a Dave thing and not an Andi thing.

In my usual -mm-testing I only test X on the one machine (the Vaio, natch).
Check that it runs glxgears, check suspend/resume to mem and disk. It has
intel graphics and I'm not seeing any such problems.

Have you time to bisect it?
http://www.zip.com.au/~akpm/linux/patches/stuff/bisecting-mm-trees.txt
describes how.

As a quick test, perhaps build a tree with just
2.6.23-rc6+origin.patch+git-drm.patch? Fortunately git-agpgart.patch is
presently empty.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 9/20/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wed, Sep 19, 2007 at 12:10:17PM -0700, Andrew Morton wrote:
> > On Wed, 19 Sep 2007 16:59:04 +0200 Jiri Slaby <[EMAIL PROTECTED]> wrote:
> >
> > > -8<-8<-8<-8<-8<-8<
> > > That means
> > > void agp_generic_destroy_page(void *addr)
> > > {
> > >         struct page *page;
> > >
> > >         if (addr == NULL)
> > >                 return;
> > >
> > >         page = virt_to_page(addr);
> > > (1)     unmap_page_from_agp(page);
> > >         put_page(page);
> > > (2)     free_page((unsigned long)addr);
> > >         atomic_dec(&agp_bridge->current_memory_agp);
> > > }
> > >
> > > (1) unmap_page_from_agp -> change_page_attr -> change_page_attr_addr ->
> > >     __change_page_attr -> save_page -> list_add(&fpage->lru, &deferred_pages);
> > > (2) free_page -> free_pages -> __free_pages -> free_hot_page ->
> > >     free_hot_cold_page -> list_add(&page->lru, &pcp->list);
> >
> > that'll hurt.
> >
> > > any ideas how to fix this?
> >
> > We should hold a single reference on the page for its membership in
> > deferred_pages.
>
> The code is broken anyways. If you free pages without flushing them first
> some other innocent user allocating them will end up with possible uncached
> pages for some time.
>
> Does this simple patch help?
>
> -Andi
>
> Flush uncached AGP pages before freeing

In theory this should be handled by the caller, so as to avoid the
overhead of continuous flushing however I can see a potential race
condition here if the pages are put back into the kernel before the
caller flushes the mappings..

Do we need some sort of two step approach here? as flushing after each
page would be a major overhead for dynamic agp stuff in the new memory
manager..

Dave.

> Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>
>
> Index: linux/drivers/char/agp/generic.c
> ===
> --- linux.orig/drivers/char/agp/generic.c
> +++ linux/drivers/char/agp/generic.c
> @@ -1185,6 +1185,7 @@ void agp_generic_destroy_page(void *addr
>
>          page = virt_to_page(addr);
>          unmap_page_from_agp(page);
> +        flush_agp_mappings();
>          put_page(page);
>          free_page((unsigned long)addr);
>          atomic_dec(&agp_bridge->current_memory_agp);
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
> The code is broken anyways. If you free pages without flushing them first
> some other innocent user allocating them will end up with possible uncached
> pages for some time.
>
> Does this simple patch help?

I've attached a more complicated patch that does a 2 stage effort to
unmapping and freeing pages.

My kernel no longer hangs with this patch... Jiri can you confirm?
I'll look at the other issue separately..

Dave.

From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
From: Dave Airlie <[EMAIL PROTECTED]>
Date: Thu, 20 Sep 2007 11:30:41 +1000
Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

With Andi's clflush fixup, we were getting hangs on server exit, flushing the
mappings after freeing each page helped. This showed up a race condition where
the pages after being freed could be reused before the agp mappings had been
flushed. Flushing after each single page is a bad thing for future drm work,
so make the page destroy a two pass unmapping all the pages, flushing the
mappings, and then destroying the pages.

Signed-off-by: Dave Airlie <[EMAIL PROTECTED]>
---
 drivers/char/agp/agp.h       |7 +--
 drivers/char/agp/ali-agp.c   | 29 +
 drivers/char/agp/backend.c   | 12
 drivers/char/agp/generic.c   | 20 ++--
 drivers/char/agp/i460-agp.c  |4 ++--
 drivers/char/agp/intel-agp.c |6 --
 6 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/drivers/char/agp/agp.h b/drivers/char/agp/agp.h
index 8955e7f..b83824c 100644
--- a/drivers/char/agp/agp.h
+++ b/drivers/char/agp/agp.h
@@ -58,6 +58,9 @@ struct gatt_mask {
                          * devices this will probably be ignored */
 };

+#define AGP_PAGE_DESTROY_UNMAP 1
+#define AGP_PAGE_DESTROY_FREE 2
+
 struct aper_size_info_8 {
         int size;
         int num_entries;
@@ -113,7 +116,7 @@ struct agp_bridge_driver {
         struct agp_memory *(*alloc_by_type) (size_t, int);
         void (*free_by_type)(struct agp_memory *);
         void *(*agp_alloc_page)(struct agp_bridge_data *);
-        void (*agp_destroy_page)(void *);
+        void (*agp_destroy_page)(void *, int flags);
         int (*agp_type_to_mask_type) (struct agp_bridge_data *, int);
 };

@@ -267,7 +270,7 @@ int agp_generic_remove_memory(struct agp_memory *mem, off_t pg_start, int type);
 struct agp_memory *agp_generic_alloc_by_type(size_t page_count, int type);
 void agp_generic_free_by_type(struct agp_memory *curr);
 void *agp_generic_alloc_page(struct agp_bridge_data *bridge);
-void agp_generic_destroy_page(void *addr);
+void agp_generic_destroy_page(void *addr, int flags);
 void agp_free_key(int key);
 int agp_num_entries(void);
 u32 agp_collect_device_status(struct agp_bridge_data *bridge, u32 mode, u32 command);
diff --git a/drivers/char/agp/ali-agp.c b/drivers/char/agp/ali-agp.c
index 4941ddb..2b65155 100644
--- a/drivers/char/agp/ali-agp.c
+++ b/drivers/char/agp/ali-agp.c
@@ -156,29 +156,34 @@ static void *m1541_alloc_page(struct agp_bridge_data *bridge)
         return addr;
 }

-static void ali_destroy_page(void * addr)
+static void ali_destroy_page(void * addr, int flags)
 {
         if (addr) {
-                global_cache_flush();   /* is this really needed?  --hch */
-                agp_generic_destroy_page(addr);
-                global_flush_tlb();
+                if (flags & AGP_PAGE_DESTROY_UNMAP) {
+                        global_cache_flush();   /* is this really needed?  --hch */
+                        agp_generic_destroy_page(addr, flags);
+                        global_flush_tlb();
+                } else
+                        agp_generic_destroy_page(addr, flags);
         }
 }

-static void m1541_destroy_page(void * addr)
+static void m1541_destroy_page(void * addr, int flags)
 {
         u32 temp;

         if (addr == NULL)
                 return;

-        global_cache_flush();
-
-        pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
-        pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
-                (((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
-                virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
-        agp_generic_destroy_page(addr);
+        if (flags & AGP_PAGE_DESTROY_UNMAP) {
+                global_cache_flush();
+
+                pci_read_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL, &temp);
+                pci_write_config_dword(agp_bridge->dev, ALI_CACHE_FLUSH_CTRL,
+                        (((temp & ALI_CACHE_FLUSH_ADDR_MASK) |
+                        virt_to_gart(addr)) | ALI_CACHE_FLUSH_EN));
+        }
+        agp_generic_destroy_page(addr, flags);
 }


diff --git a/drivers/char/agp/backend.c b/drivers/char/agp/backend.c
index 1b47c89..832ded2 100644
--- a/drivers/char/agp/backend.c
+++ b/drivers/char/agp/backend.c
@@ -189,9 +189,11 @@ static int agp_backend_initialize(struct agp_bridge_data *bridge)

 err_out:
         if (bridge->driver->needs_scratch_page) {
-                bridge->driver->agp_destroy_page(
-                                gart_to_virt(bridge->scratch_page_real));
+                bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+                                                 AGP_PAGE_DESTROY_UNMAP);
                 flush_agp_mappings();
+                bridge->driver->agp_destroy_page(gart_to_virt(bridge->scratch_page_real),
+                                                 AGP_PAGE_DESTROY_FREE);
         }
         if (got_gatt)
                 bridge->driver->free_gatt_table(bridge);
@@ -215,9 +217,11 @@ static void
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On 9/20/07, Jiri Slaby <[EMAIL PROTECTED]> wrote:
> On 09/19/2007 09:54 PM, Andi Kleen wrote:
> >> Yeah. (But X doesn't run -- this is maybe the known issue in this release).
> >
> > What do you mean with not run?
>
> (II) intel(0): Initializing HW Cursor
> (II) intel(0): xf86BindGARTMemory: bind key 0 at 0x005ff000 (pgoffset 1535)
> (WW) intel(0): xf86BindGARTMemory: binding of gart memory with key 0
>         at offset 0x5ff000 failed (Invalid argument)
>
> Fatal server error:
> Couldn't bind memory for front buffer
>
> I thought I'd seen a thread about this issue, but I can't find it now. Is it
> known or am I seeing ghosts yet, Andrew?

Can you send me a complete Xorg log file? and lspci -vv?

my 945 works fine with my drm tree on top of Linus with clflush + my
agp fix I just sent out ..

Dave.
Re: X-freeze after clflush changes [Was: 2.6.23-rc6-mm1]
On Thu, 20 Sep 2007 11:42:29 +1000 Dave Airlie <[EMAIL PROTECTED]> wrote:

> From 225696d75e7ec0bafbb47b935bd700e3fbeefbde Mon Sep 17 00:00:00 2001
> From: Dave Airlie <[EMAIL PROTECTED]>
> Date: Thu, 20 Sep 2007 11:30:41 +1000
> Subject: [PATCH] agp: fix race condition between unmapping and freeing pages

This fixes the hang-when-quitting-X on the Vaio.
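As a rough illustration of the two-pass teardown described in Dave's patch (unmap every page, flush once, then free every page), here is a small user-space model. Only the two flag names and their values come from the patch; the page struct, the counters and the helper names are hypothetical, and this is a sketch of the ordering, not the generic.c change itself, which is not shown in the thread excerpt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in the patch; everything else here is a user-space model. */
#define AGP_PAGE_DESTROY_UNMAP 1
#define AGP_PAGE_DESTROY_FREE  2

#define MAX_PAGES 64

struct model_page {
	bool flush_pending; /* attributes changed, caches/TLB not flushed yet */
	bool freed;         /* handed back to the allocator */
};

static unsigned int flushes;         /* number of global flushes issued */
static unsigned int freed_unflushed; /* pages freed while a flush was pending */

/* Hypothetical stand-in for a driver's agp_destroy_page(addr, flags). */
static void model_destroy_page(struct model_page *p, int flags)
{
	if (flags & AGP_PAGE_DESTROY_UNMAP)
		p->flush_pending = true;
	if (flags & AGP_PAGE_DESTROY_FREE) {
		if (p->flush_pending)
			freed_unflushed++;
		p->freed = true;
	}
}

/* Hypothetical stand-in for flush_agp_mappings(): one flush covers all pages. */
static void model_flush_mappings(struct model_page *pages, size_t n)
{
	flushes++;
	for (size_t i = 0; i < n; i++)
		pages[i].flush_pending = false;
}

/*
 * Two-pass teardown: unmap everything, flush once, then free everything.
 * Returns the number of flushes used, or -1 if n is too large or a page
 * was freed with a flush still pending.
 */
static int two_pass_destroy(size_t n)
{
	struct model_page pages[MAX_PAGES] = { { false, false } };

	if (n > MAX_PAGES)
		return -1;
	flushes = 0;
	freed_unflushed = 0;

	for (size_t i = 0; i < n; i++)
		model_destroy_page(&pages[i], AGP_PAGE_DESTROY_UNMAP);
	model_flush_mappings(pages, n);
	for (size_t i = 0; i < n; i++)
		model_destroy_page(&pages[i], AGP_PAGE_DESTROY_FREE);

	for (size_t i = 0; i < n; i++)
		if (!pages[i].freed)
			return -1;
	if (freed_unflushed != 0)
		return -1;
	return (int)flushes;
}
```

The point of the split is that the number of flushes stays at one regardless of the page count, while no page reaches the allocator with a flush still outstanding. Flushing inside the per-page destroy would also be safe but costs one flush per page.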