Re: [PATCH v3] kconfig: nconf: stop endless search loops
* On 4/16/21 7:40 AM, Masahiro Yamada wrote:
> Applied to linux-kbuild. Thanks.

Thank you for your review and input. :)

Mihai
[PATCH v3] kconfig: nconf: stop endless search loops
If the user selects the very first entry in a page and performs a search-up operation, or selects the very last entry in a page and performs a search-down operation that will not succeed (e.g., via [/]asdfzzz[Up Arrow]), nconf will never terminate searching the page.

The reason is that in this case, the starting point will be set to -1 or n, which is then translated into (n - 1) (i.e., the last entry of the page) or 0 (i.e., the first entry of the page) and finally the search begins. This continues to work fine until the index reaches 0 or (n - 1), at which point it will be decremented to -1 or incremented to n, but not checked against the starting point right away. Instead, it's wrapped around to the bottom or top again, after which the starting point check occurs... and naturally fails.

My original implementation added another check for -1 before wrapping the running index variable around, but Masahiro Yamada pointed out that the actual issue is that the comparison point (starting point) exceeds bounds (i.e., the [0,n-1] interval) in the first place and that, instead, the starting point should be fixed.

This has the welcome side-effect of also fixing the case where the starting point was n while searching down, which also led to an infinite loop.

OTOH, this code is now essentially all his work.

Amazingly, nobody seems to have been hit by this for 11 years - or at the very least nobody bothered to debug and fix this.
Signed-off-by: Mihai Moldovan
---
v2: swap constant in comparison to right side, as requested by
    Randy Dunlap
v3: reimplement as suggested by Masahiro Yamada, which has the
    side-effect of also fixing endless looping in the symmetric
    down-direction

 scripts/kconfig/nconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..af814b39b876 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -504,8 +504,8 @@ static int get_mext_match(const char *match_str, match_f flag)
 	else if (flag == FIND_NEXT_MATCH_UP)
 		--match_start;
 
+	match_start = (match_start + items_num) % items_num;
 	index = match_start;
-	index = (index + items_num) % items_num;
 	while (true) {
 		char *str = k_menu_items[index].str;
 		if (strcasestr(str, match_str) != NULL)
-- 
2.30.1
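For readers who want to see the failure mode outside of nconf, here is a minimal stand-alone sketch of the wrap-around search described in the commit message. It is not the nconf code: search_page(), demo_items, the fix_start switch and the iteration cap (which stands in for "loops forever" and returns -2) are all assumptions made up for illustration, and strstr() replaces strcasestr() to keep it portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum search_dir { SEARCH_UP, SEARCH_DOWN };

/* Hypothetical page content, only used for illustration. */
static const char *const demo_items[] = { "alpha", "beta", "gamma" };

/*
 * Returns the index of the first match, -1 if the whole page was
 * scanned without a match, or -2 if the iteration cap was hit
 * (i.e., the loop would never have terminated on its own).
 */
static int search_page(const char *const *items, int n, int selected,
		       const char *needle, enum search_dir dir,
		       bool fix_start)
{
	int step = (dir == SEARCH_DOWN) ? 1 : -1;
	int match_start;
	int index;
	int iterations;

	assert(n > 0 && selected >= 0 && selected < n);

	/* May become -1 (search up from 0) or n (search down from n - 1). */
	match_start = selected + step;

	/* v3 approach: bring the starting point back into [0, n-1]. */
	if (fix_start)
		match_start = (match_start + n) % n;

	/* Old behavior: only the running index is wrapped. */
	index = (match_start + n) % n;

	for (iterations = 0; iterations < 10 * n; ++iterations) {
		if (strstr(items[index], needle) != NULL)
			return index;

		index = (index + step + n) % n;

		/* Can never be true if match_start is -1 or n. */
		if (index == match_start)
			return -1;
	}

	return -2;
}
```

With fix_start set to false, a failing search up from entry 0 or down from entry n - 1 only stops because of the cap. With fix_start set to true, the same searches return -1 after one pass over the page, and successful searches behave the same in both modes.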
Re: [PATCH v2] kconfig: nconf: stop endless search-up loops
* On 4/10/21 7:47 AM, Masahiro Yamada wrote:
> On Sun, Mar 28, 2021 at 6:52 PM Mihai Moldovan wrote:
>> +		if ((index == -1) && (index == match_start))
>> +			return -1;
>
> We know 'index' is -1 in the second comparison.
> So, you can also write like this:
>
>    if (match_start == -1 && index == -1)
>         return -1;

I know, but I opted for the other form for semantic reasons - it more directly describes what we actually care about (both being the same value and either one being -1).

> But, it is not the correct fix, either.
>
> The root cause of the bug is match_start
> becoming -1.
>
> The following is the correct way to fix the bug
> without increasing the number of lines.
>
> diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
> index e0f965529166..af814b39b876 100644
> [...]
> +       match_start = (match_start + items_num) % items_num;
>         index = match_start;
> -       index = (index + items_num) % items_num;

This is probably more elegant and fixes two issues at the same time: match_start becoming -1 or n (which is likewise invalid, but was implicitly handled through the remainder operation).

No objections from my side.

Mihai
[PATCH v2] kconfig: nconf: stop endless search-up loops
If the user selects the very first entry in a page and performs a search-up operation (e.g., via [/][a][Up Arrow]), nconf will never terminate searching the page.

The reason is that in this case, the starting point will be set to -1, which is then translated into (n - 1) (i.e., the last entry of the page) and finally the search begins. This continues to work fine until the index reaches 0, at which point it will be decremented to -1, but not checked against the starting point right away. Instead, it's wrapped around to the bottom again, after which the starting point check occurs... and naturally fails.

We can easily avoid it by checking against the starting point directly if the current index is -1 (which should be safe, since it's the only magic value that can occur) and terminate the matching function.

Amazingly, nobody seems to have been hit by this for 11 years - or at the very least nobody bothered to debug and fix this.

Signed-off-by: Mihai Moldovan
---
v2: swap constant in comparison to right side, as requested by
    Randy Dunlap

 scripts/kconfig/nconf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..db0dc46bc5ee 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -515,6 +515,15 @@ static int get_mext_match(const char *match_str, match_f flag)
 			--index;
 		else
 			++index;
+		/*
+		 * It's fine for index to become negative - think of an
+		 * initial value for match_start of 0 with a match direction
+		 * of up, eventually making it -1.
+		 *
+		 * Handle this as a special case.
+		 */
+		if ((index == -1) && (index == match_start))
+			return -1;
 		index = (index + items_num) % items_num;
 		if (index == match_start)
 			return -1;
-- 
2.30.1
Re: [PATCH] kconfig: nconf: stop endless search-up loops
* On 3/27/21 11:26 PM, Randy Dunlap wrote:
> There is a test for it in checkpatch.pl but I also used checkpatch.pl
> without it complaining, so I don't know what it takes to make the script
> complain.
>
>			if ($lead !~ /(?:$Operators|\.)\s*$/ &&
>			    $to !~ /^(?:Constant|[A-Z_][A-Z0-9_]*)$/ &&
>			    WARN("CONSTANT_COMPARISON",
>				 "Comparisons should place the constant on the right side of the test\n" . $herecurr) &&

There are multiple issues, err, "challenges" there:
  - literal "Constant" instead of "$Constant"
  - the left part is misinterpreted as an operation due to the minus sign (arithmetic operator)
  - $Constant is defined as "qr{$Float|$Binary|$Octal|$Hex|$Int}" (which is okay), but all these types do not include their negative range.

As far as I can tell, the latter is intentional. Making these types compatible with negative values causes a lot of other places to break, so I'm not keen on changing this.

The minus sign being misinterpreted as an operator is highly difficult to fix in a sane manner. The original intention was to avoid misinterpreting expressions like (var - CONSTANT real_op ...) as a constant-on-left expression (and, more importantly, to not misfix it when --fix is given).

The general idea is sane and we probably shouldn't change that, but it would be good to handle negative values as well.

At first, I was thinking of overriding this detection by checking if the leading part matches "(-\s*$", which should only be true for negative values, assuming that there is always an opening parenthesis as part of a conditional statement/loop (if, while). After playing around with this and composing this message for a few hours, it dawned on me that there can easily be free-standing forms (for instance as part of for loops or assignment lines), so that wouldn't cut it.

It really goes downhill from here.
I assume that false negatives are merely nuisances due to stylistic errors in the code, but false positives are actually harmful, since a lot of them make code review by maintainers very tedious.

So far, minus signs were always part of the leading capture group. I'd actually like to have them in the constant capture group instead:

-		    $line =~ /^\+(.*)\b($Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) {
+		    $line =~ /^\+(.*)(-?\s*$Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) {

With that sorted, the next best thing I could come up with to weed out preceding variables was this (which shouldn't influence non-negative constants):

-			if ($lead !~ /(?:$Operators|\.)\s*$/ &&
+			if ($lead !~ /(?:$Operators|\.|[a-z])\s*$/ &&

There still are a lot of expressions that won't match this, like "-1 + 0 == var" (i.e., "CONSTANT CONSTANT ...") or constellations like a simple "(CONSTANT) ..." (e.g., "(1) == var").

This is all fuzzy and getting this right would involve moving away from trying to make sense of C code with regular expressions in Perl, and actually parsing it to extract the semantics instead. Not exactly something I'd like to do...

Thoughts on my workaround for this issue? Did I miss anything crucial or introduce a new bug inadvertently?
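To make the expression shapes from this discussion concrete, here is a small C sketch of the forms the check would have to tell apart. The function names and the LIMIT macro are hypothetical and exist only for illustration; nothing here is taken from checkpatch.pl or nconf.c.

```c
#include <assert.h>
#include <stdbool.h>

#define LIMIT 8	/* hypothetical upper-case constant */

/* Negative constant on the left: the form that went unreported. */
static bool neg_const_left(int index)
{
	return -1 == index;
}

/* The same test with the constant on the right (kernel style). */
static bool neg_const_right(int index)
{
	return index == -1;
}

/*
 * "var - CONSTANT real_op ...": the minus is a binary operator here,
 * so this must NOT be treated as a constant-on-left comparison.
 */
static bool var_minus_const(int var, int other)
{
	return var - LIMIT == other;
}

/* "CONSTANT CONSTANT ..." shape, e.g. "-1 + 0 == var". */
static bool const_expr_left(int var)
{
	return -1 + 0 == var;
}
```

The first two functions are equivalent, so swapping the operands is a pure style change. The third shows why a leading minus cannot simply be taken as the sign of a constant: only the text before it reveals whether it is a unary or a binary minus.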
Re: [PATCH] kconfig: nconf: stop endless search-up loops
* On 3/27/21 4:58 PM, Randy Dunlap wrote:
> On 3/27/21 5:01 AM, Mihai Moldovan wrote:
>> +		if ((-1 == index) && (index == match_start))
>
> checkpatch doesn't complain about this (and I wonder how it's missed), but
> kernel style is (mostly) "constant goes on right hand side of comparison",
> so
>		if ((index == -1) &&

I can naturally send a V2 with that swapped.

As for my rationale: I made sure to use checkpatch, saw that it was accepted and even went for a quick git grep -- '-1 ==', which likewise returned enough results for me to call this consistent with the current code style.

Maybe those matches were just frowned-upon, but forgotten-to-be-criticized examples of this pattern being used.

Mihai
[PATCH] kconfig: nconf: stop endless search-up loops
If the user selects the very first entry in a page and performs a search-up operation (e.g., via [/][a][Up Arrow]), nconf will never terminate searching the page.

The reason is that in this case, the starting point will be set to -1, which is then translated into (n - 1) (i.e., the last entry of the page) and finally the search begins. This continues to work fine until the index reaches 0, at which point it will be decremented to -1, but not checked against the starting point right away. Instead, it's wrapped around to the bottom again, after which the starting point check occurs... and naturally fails.

We can easily avoid it by checking against the starting point directly if the current index is -1 (which should be safe, since it's the only magic value that can occur) and terminate the matching function.

Amazingly, nobody seems to have been hit by this for 11 years - or at the very least nobody bothered to debug and fix this.

Signed-off-by: Mihai Moldovan
---
 scripts/kconfig/nconf.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..92a5403d8afa 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -515,6 +515,15 @@ static int get_mext_match(const char *match_str, match_f flag)
 			--index;
 		else
 			++index;
+		/*
+		 * It's fine for index to become negative - think of an
+		 * initial value for match_start of 0 with a match direction
+		 * of up, eventually making it -1.
+		 *
+		 * Handle this as a special case.
+		 */
+		if ((-1 == index) && (index == match_start))
+			return -1;
 		index = (index + items_num) % items_num;
 		if (index == match_start)
 			return -1;
-- 
2.30.1
Re: Userspace woes with 5.1.5 due to TIPC
* On 5/30/19 9:51 PM, Jon Maloy wrote:
> Make sure the following three commits are present in TIPC *after* the
> offending commit:
>
> commit 532b0f7ece4c "tipc: fix modprobe tipc failed after switch order of
> device registration"

This *is* the offending commit, as far as I understand. It was merely rebased in linux-stable and hence has a different SHA, but it mentions the original SHA (i.e., 532b0f7ece4c) in its commit message.

> Since that patch one was flawed it had to be reverted:
> commit 5593530e5694 ""Revert tipc: fix modprobe tipc failed after switch
> order of device registration"
>
> It was then replaced with this one:
> commit 526f5b851a96 "tipc: fix modprobe tipc failed after switch order of
> device registration"

Okay, these two are not part of 5.1.5. I've backported them (and only these two) to 5.1.5 and the issue(s) seem to be gone. Definitely something that should be backported to/included in 5.1.6.

Thanks for pointing all that out! Unfortunately, I didn't add anything useful, just noise, since you obviously already knew that this commit was broken.

I'd urge Greg to release a new stable version including the fixes soon, if possible, though, since not being able to start/use userspace browsers sounds like a pretty bad regression to me.

Mihai
Userspace woes with 5.1.5 due to TIPC
Hi,

I've had a few issues lately (mainly bad RAM only, hopefully, which should be fixed now) and generally upgraded everything.

With 5.1.5, though, some programs exhibited very weird behavior:
  - Chromium crashed while starting up due to not being able to launch a new zygote process, although it started when using --no-sandbox (likely because that didn't try to create other processes);
  - Opera (based upon Chromium) failed to start with SIGILL, but that was only a red herring triggered by the same problem, I guess;
  - Firefox started up, but was unable to render any content because its multi-process IPC didn't work (i.e., it couldn't start new rendering processes).

Interestingly, most other programs I use daily still worked, even though they used networking and IPC (command-line browsers, MATE Terminal, electron-based programs), so this bug didn't make the machine completely unusable.

Since I've been using 5.1.3 without problems before and the issue was straightforward to test for, I did a bisection run and came to this conclusion:

=== bisect log ===
Bisecting: 124 revisions left to test after this (roughly 7 steps)
[ee4c3e283f8f3286bea60e9038adc70436d87d02] s390/mm: convert to the generic get_user_pages_fast code
Bisecting: 62 revisions left to test after this (roughly 6 steps)
[f7346dc0634cbad7fca5d951b91ad2e13f497b0b] clk: mediatek: Disable tuner_en before change PLL rate
Bisecting: 30 revisions left to test after this (roughly 5 steps)
[5ac8e698528149bb1618111d64e22bd8bb784256] parisc: Allow live-patching of __meminit functions
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[c89c9af998fef2af4e5b2b35fb723693f17e05ef] mlxsw: core: Prevent QSFP module initialization for old hardware
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[912d8c4cf9f19c93dfdf06b822eeadec9d71494d] net: test nouarg before dereferencing zerocopy pointers
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[92166190b8282d9925e90a66961879782c50d037] rtnetlink: always put IFLA_LINK for links with a link-netnsid
Bisecting: 1 revision left to test after this (roughly 1 step)
[7d29c9ad0ed525c1b10e29cfca4fb1eece1e93fb] vsock/virtio: free packets during the socket release
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[2d08f204328acaf85ac2c6fe5d5d9d4760f12e13] tipc: fix modprobe tipc failed after switch order of device registration
2d08f204328acaf85ac2c6fe5d5d9d4760f12e13 is the first bad commit
commit 2d08f204328acaf85ac2c6fe5d5d9d4760f12e13
Author: Junwei Hu
Date:   Fri May 17 19:27:34 2019 +0800

    tipc: fix modprobe tipc failed after switch order of device registration

    [ Upstream commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e ]

    Error message printed:
    modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol.
    when modprobe tipc after the following patch: switch order of device registration, commit 7e27e8d6130c ("tipc: switch order of device registration to fix a crash")

    Because sock_create_kern(net, AF_TIPC, ...) is called by tipc_topsrv_create_listener() in the initialization process of tipc_net_ops, tipc_socket_init() must be execute before that.

    I move tipc_socket_init() into function tipc_init_net().

    Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash")
    Signed-off-by: Junwei Hu
    Reported-by: Wang Wang
    Reviewed-by: Kang Zhou
    Reviewed-by: Suanming Mou
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

:040000 040000 13d9b014338ccf6ae0c32bdb2be779870bbf97da df8a9c2a9f1f8df212999c2904632a77adb03782 M	net
=== / bisect log ===

My kernel config is tailored to my machine, so probably not very useful to others, but I'm including it anyway. The most relevant point is CONFIG_TIPC=y, i.e., TIPC being built statically into the kernel. Not sure why I've done that in the first place because TIPC is not something that would be useful to me, but I often err on the "might be useful later" side. I might rethink that decision and just disable TIPC for good in the future.

With this patch applied, the kernel generally spews out a few wonky messages that I've never seen before. For completeness' sake, I've attached a ring buffer log from running the last working and first bad version.

=== TIPC messages ===
NET: Registered protocol family 30
Failed to register TIPC socket type
=== / TIPC messages ===

Now, blindly reverting the patch would obviously be a bad idea, since that would mean trading one regression for the (initial) other one. I'm thus CCing the maintainers to help.

Mihai

Attachments:
  config-5.1.3.xz (application/xz)
  dmesg-5.1.4-00013-g2d08f204328a.log.xz (application/xz)
Re: NULL pointer dereference in netfilter
Actually, may I be seeing just another incarnation of http://www.spinics.net/lists/netfilter-devel/msg31134.html?

If so, applying https://lkml.org/lkml/2014/3/27/294 seems appropriate.

Could anybody please confirm this?

Mihai
NULL pointer dereference in netfilter
Hi,

earlier today, I experienced a kernel panic due to a NULL pointer dereference somewhere in the netfilter subsystem.

Full kernel output (may contain typos):

[360412.114033] BUG: unable to handle kernel NULL pointer dereference at 0010
[360412.115643] IP: [81865efe] nf_nat_setup_info+0x56e/0x900
[360412.117244] PGD: 0
[360412.117337] Oops: 0002 [#3] SMP
[360412.117337] Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 xt_conntrack xt_dscp kvm_intel kvm hfcsusb mISDN_core e1000e cp210x i915 rfkil ptp video pps_core drm_kms_helper backlight [last unloaded: cfg80211]
[360412.117337] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G DO 3.14.2-OSS4.2 #2
[360412.117337] Hardware name: /DQ45CB, BIOS CBQ4510H.86A.0133.2011.0810.1010 08/10/2011
[360412.117337] task: 8802321c5540 ti: 8802321f4000 task.ti: 8802321f4
[360412.117337] RIP: 0010:[81865efe] [81865efe] nf_nat_setup_info+0x56e/0x900
[360412.117337] RSP: 0018:88023bd03668 EFLAGS: 10246
[360412.117337] RAX: RBX: 8800b073d380 RCX: 0ae3d87f
[360412.117337] RDX: 88021cdc9800 RSI: b8061897 RDI: 824808b8
[360412.117337] RBP: 88023bd03748 R08: 88003773e000 R09: 820ac780
[360412.117337] R10: 88021cdc9800 R11: 88021cdc98e0 R12: 235d
[360412.117337] R13: R14: 88023bd03698 R15: 88023bd036c0
[360412.117337] FS: () GS:88023bd0() knlGS:
[360412.117337] CS: 0010 DS: ES: CR0: 8005003b
[360412.117337] CR2: 0010 CR3: 0200b000 CR4: 000407e0
[360412.117337] Stack:
[360412.117337] 820ac780 81d905b0 88023bd036c0 820ac780
[360412.117337] 81d964e0 81d906a0 df8e782a
[360412.117337] 8343b75500027f96 0006bb06 8343b755
[360412.117337] Call Trace:
[360412.117337] <IRQ>
[360412.117337] [81874e9f] xt_snat_target_v0+0x6f/0x90
[360412.117337] [818e0453] ipt_do_table+0x2c3/0x6c0
[360412.117337] [818e04b6] ? ipt_do_table+0x326/0x6c0
[360412.117337] [818e0d07] nf_nat_ipv6_fn+0x1d7/0x330
[360412.117337] [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337] [818e1068] nf_nat_ipv4_out+0x58/0x100
[360412.117337] [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337] [81846b75] nf_iterate+0x85/0xb0
[360412.117337] [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337] [81846c0c] nf_hook_slow+0x6c/0x130
[360412.117337] [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337] [81889bb2] ip_output+0x82/0x90
[360412.117337] [81889314] ip_local_out+0x24/0x30
[360412.117337] [818e2182] reject_tg+0x4d2/0x4e0
[360412.117337] [818e0453] ipt_do_table+0x2c3/0x6c0
[360412.117337] [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337] [818e0924] iptable_filter_hook+0x34/0x70
[360412.117337] [81846b75] nf_iterate+0x85/0xb0
[360412.117337] [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337] [81846c0c] nf_hook_slow+0x6c/0x130
[360412.117337] [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337] [81884303] ip_local_deliver+0x73/0x80
[360412.117337] [81883c53] ip_rcv_finish+0x83/0x360
[360412.117337] [818845b8] ip_rcv+0x2a8/0x3e0
[360412.117337] [817e7bb2] __netif_receive_skb_core+0x632/0x7a0
[360412.117337] [817e7d3c] __netif_receive_skb+0x1c/0x70
[360412.117337] [817e7e2c] process_backlog+0x9c/0x170
[360412.117337] [817e823b] net_rx_action+0xfb/0x1a0
[360412.117337] [810c3e65] __do_softirq+0xd5/0x1f0
[360412.117337] [810c4185] irq_exit+0x95/0xa0
[360412.117337] [81003d82] do_IRQ+0x62/0x110
[360412.117337] [81a20d67] common_interrupt_0x67/0x67
[360412.117337] <EOI>
[360412.117337] [81791ce6] ? cpuidle_enter_state+0x56/0xd0
[360412.117337] [81791ce2] ? cpuidle_enter_state+0x52/0xd0
[360412.117337] [81791dfa] cpuidle_idle_call+0x9a/0x140
[360412.117337] [8100afe9] arch_cpu_idle+0x9/0x20
[360412.117337] [8110a81a] cpu_startup_entry+0xda/0x1c0
[360412.117337] [8102a1ad] start_secondary+0x20d/0x2c0
[360412.117337] Code: e0 e8 a7 a9 1b 00 48 8b 93 e0 00 00 00 49 c1 ec 20 48 85 d2 74 0c 0f b6 42 11 84 c0 0f 85 93 02 00 00 31 c0 4c 8b 8d 38 ff ff ff <48> 89 58 10 49 8b 91 70 0b 00 00 4a 8d 14 e2 48 8b 0a 48 89 50
[360412.117337] RIP [81865efe] nf_nat_setup_info+0x56e/0x900
[360412.117337] RSP 88023bd03668
[360412.117337] CR2: 0010
[360412.117337] - - -[ end trace 691638412d73c338 ]- - -
[360412.117337] Kernel panic - not syncing: Fatal exception in interrupt
[360412.117337] Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
[360412.117337] drm_kms_helper: panic occurred, switching back to text console

decodecode:

All code
   0:	e0 e8		loopne 0xffea
   2:	a7		cmpsl %es:(%rdi),%ds:(%rsi)
   3:	a9 1b 00 48 8b	test $0x8b48001b,%eax
   8:	93		xchg %eax,%ebx
   9:	e0 00		loopne 0xb
   b:	00 00		add %al,(%rax)
   d:	49 c1 ec 20	shr $0x20,%r12
  11:	48 85 d2	test %rdx,%rdx
  14:	74 0c		je 0x22
  16:	0f b6 42 11
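As a reading aid for the fault address in the log above: the bytes 48 89 58 10 in the Code: line encode a store of %rbx to 0x10(%rax), and RAX is printed without a value in the register dump. Assuming RAX was zero, the write lands at address 0x10, which would match the reported CR2. The sketch below illustrates the arithmetic only; struct example_node and fault_address() are hypothetical and are not the kernel's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT taken from the kernel: two 8-byte members
 * precede `slot`, so `slot` sits at offset 0x10.
 */
struct example_node {
	uint64_t first;
	uint64_t second;
	uint64_t slot;
};

/*
 * Address that a store to `slot` would touch for a given base
 * pointer value. With a NULL (zero) base, this is just the member
 * offset, which is why such oopses report small non-zero addresses.
 */
static uintptr_t fault_address(uintptr_t base)
{
	return base + offsetof(struct example_node, slot);
}
```

In other words, a fault at a small address such as 0x10 usually points at a member access through a NULL structure pointer rather than at a dereference of a plain NULL pointer. Which pointer was NULL inside nf_nat_setup_info() cannot be derived from this sketch.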
Re: Oops (NULL ptr deref) while loading some module
* On 15.07.2013 01:54 AM, Mihai Moldovan wrote:
> This is obviously happening while booting and udev is loading *some* module,
> but I have no idea which module is affected as such.

Quick correction: actually, at that time udev hasn't even started. udev is being started by my initramfs one good second later, so at the time of those Oopses, the root fs wasn't even mounted yet. Maybe the initramfs was, but I'm not too sure about that either.

[4.769188] dracut: dracut-029
[4.789227] systemd-udevd[1984]: starting version 204

What is the kernel trying to modprobe? Off what location, exactly? It can't be /, as that isn't even mounted yet. The initramfs? Maybe, but this has NO modules packed up whatsoever. I just double-checked.

root@valery/tmp/foo# ls lib/modules/3.10.1-OSS4.2-dirty
modules.alias      modules.alias.bin  modules.builtin  modules.builtin.bin
modules.dep        modules.dep.bin    modules.devname  modules.order
modules.softdep    modules.symbols    modules.symbols.bin

The initramfs is solely used for assembling the RAID arrays when booting and does not include any modules.

I just upgraded to 3.10.1 and am still seeing this.

Interesting issue, isn't it? :)

Mihai
Oops (NULL ptr deref) while loading some module
Hi all,

I'm seeing the following oopses when booting up my kernel:

[3.173479] BUG: unable to handle kernel NULL pointer dereference at (null)
[3.173602] IP: [810d2f54] futex_wake+0x74/0x130
[3.173679] PGD 231d65067 PUD 231d64067 PMD 0
[3.173783] Oops: [#1] SMP
[3.173870] Modules linked in:
[3.173936] CPU 0
[3.173959] Pid: 615, comm: modprobe Not tainted 3.9.6-OSS4.2-dirty #34 /DQ45CB
[3.174091] RIP: 0010:[810d2f54] [810d2f54] futex_wake+0x74/0x130
[3.174195] RSP: 0018:8802311dbda8 EFLAGS: 00010246
[3.174249] RAX: RBX: RCX: 7f125139
[3.174306] RDX: RSI: 3c28288f RDI: 8222ee70
[3.174363] RBP: 8802311dbe08 R08: efa13b63 R09:
[3.174420] R10: R11: 0202 R12: 8222ee70
[3.174477] R13: R14: 8222ee78 R15:
[3.174535] FS: 7ff44c2a3700() GS:88023bc0() knlGS:
[3.174620] CS: 0010 DS: ES: CR0: 80050033
[3.174676] CR2: CR3: 000231d61000 CR4: 000407f0
[3.174734] DR0: DR1: DR2:
[3.174791] DR3: DR6: 0ff0 DR7: 0400
[3.174849] Process modprobe (pid: 615, threadinfo 8802311da000, task 880231e272c0)
[3.174935] Stack:
[3.174984] 880231d62a10 00010001 07f8 7fff78d0a000
[3.175139] 8802311e8000 091c 8802311dbdf8
[3.175293] 0001 7fff78d0a91c 0001
[3.175447] Call Trace:
[3.175499] [810d4d40] do_futex+0x100/0xab0
[3.17] [819772d4] ? __do_page_fault+0x244/0x4e0
[3.175611] [811806f1] ? mntput+0x21/0x30
[3.175666] [81164c7b] ? __fput+0x16b/0x240
[3.175721] [810d5778] sys_futex+0x88/0x180
[3.175775] [81977579] ? do_page_fault+0x9/0x10
[3.175830] [8197a252] system_call_fastpath+0x16/0x1b
[3.175886] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9 ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18 48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[3.176678] RIP [810d2f54] futex_wake+0x74/0x130
[3.176678] RSP 8802311dbda8
[3.176678] CR2:
[3.177366] ---[ end trace 7213d911e494c10b ]---
[3.177823] BUG: unable to handle kernel NULL pointer dereference at (null)
[3.177944] IP: [810d2f54] futex_wake+0x74/0x130
[3.178017] PGD 2311f4067 PUD 2311f5067 PMD 0
[3.178122] Oops: [#2] SMP
[3.178207] Modules linked in:
[3.178274] CPU 0
[3.178296] Pid: 617, comm: modprobe Tainted: G D 3.9.6-OSS4.2-dirty #34 /DQ45CB
[3.178428] RIP: 0010:[810d2f54] [810d2f54] futex_wake+0x74/0x130
[3.178531] RSP: 0018:880231213da8 EFLAGS: 00010246
[3.178585] RAX: RBX: RCX: 006a3b48
[3.178643] RDX: RSI: 1d796f0a RDI: 8222ec60
[3.178700] RBP: 880231213e08 R08: cbc14f19 R09:
[3.178758] R10: R11: 0202 R12: 8222ec60
[3.178816] R13: R14: 8222ec68 R15:
[3.178873] FS: 7f5baf639700() GS:88023bc0() knlGS:
[3.178958] CS: 0010 DS: ES: CR0: 80050033
[3.179013] CR2: CR3: 0002311f7000 CR4: 000407f0
[3.179071] DR0: DR1: DR2:
[3.179128] DR3: DR6: 0ff0 DR7: 0400
[3.179185] Process modprobe (pid: 617, threadinfo 880231212000, task 880231e26540)
[3.179270] Stack:
[3.179318] 8802311f3a10 00010001 07f0 7fff80ed6000
[3.179472] 8802311e8340 082c 880231213df8
[3.179626] 0001 7fff80ed682c 0001
[3.179780] Call Trace:
[3.179829] [810d4d40] do_futex+0x100/0xab0
[3.179884] [819772d4] ? __do_page_fault+0x244/0x4e0
[3.179940] [811806f1] ? mntput+0x21/0x30
[3.179994] [81164c7b] ? __fput+0x16b/0x240
[3.180071] [810d5778] sys_futex+0x88/0x180
[3.180126] [81977579] ? do_page_fault+0x9/0x10
[3.180183] [8197a252] system_call_fastpath+0x16/0x1b
[3.180238] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9 ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18 48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[3.180892] RIP [810d2f54] futex_wake+0x74/0x130
[3.180892] RSP 880231213da8
[3.180892] CR2:
[3.181699] ---[ end trace 7213d911e494c10c ]---

This is obviously happening while booting and udev is loading *some* module, but I have no idea which module is affected as such.

Luckily, my module list is quite concise:

Module
Re: Oops (NULL ptr deref) while loading some module
* On 15.07.2013 01:54 AM, Mihai Moldovan wrote: > This is obviously happening while booting and udev is loading *some* module, but I have no idea which module is affected as such. Quick correction: actually, at that time udev hasn't even started. udev is being started by my initramfs one good second later, so at the time of those Oopses, the root fs wasn't even mounted yet. Maybe the initramfs was, but I'm not too sure about that either. [4.769188] dracut: dracut-029 [4.789227] systemd-udevd[1984]: starting version 204 What is the kernel trying to modprobe? From what location, exactly? It can't be /, as that isn't even mounted yet. The initramfs? Maybe, but this has NO modules packed up whatsoever. I just double-checked. root@valery/tmp/foo# ls lib/modules/3.10.1-OSS4.2-dirty modules.alias modules.alias.bin modules.builtin modules.builtin.bin modules.dep modules.dep.bin modules.devname modules.order modules.softdep modules.symbols modules.symbols.bin The initramfs is solely used for assembling the RAID arrays when booting and does not include any modules. I just upgraded to 3.10.1 and am still seeing this. Interesting issue, isn't it? :) Mihai
Re: [ 092/128] iommu/intel: disable DMAR for g4x integrated gfx
* On 03.02.2013 03:48 PM, Ben Hutchings wrote: > [...] > +static void quirk_iommu_g4x_gfx(struct pci_dev *dev) > +{ Shouldn't __devinit be used here too, like for quirk_iommu_rwbf? It probably doesn't matter too much, especially on platforms with Intel IOMMU, but... it would keep the code consistent. Best regards, Mihai
Panic during interrupt handling while terminating hostapd
Hi, I've found yet another problem with (at least) 3.7.4 and 3.8-rc4. When terminating hostapd via SIGINT, this bug and panic came up: BUG: unable to handle kernel paging request at 001d8000 IP: [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0 PGD 21c3db067 PUD 0 Oops: [#1] SMP Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211 kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill CPU 2 Pid: 6972, comm: modprobe Tainted: GW3.7.4-OSS4.2 #3 /DQ45CB RIP: 0010:[<-ADDRESS>] [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0 RSP: 0018:-ADDRESS EFLAGS: 00010206 RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS FS: -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS CS: 0010 DS: ES: CR0: -ADDRESS CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS DR0: -ADDRESS CR1: -ADDRESS DR2: -ADDRESS DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS Process modprobe (pid: 6972, threadinfo -ADDRESS, task -ADDRESS) Stack: -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS Call Trace: [<-ADDRESS>] __d_alloc+0x2f/0x180 [<-ADDRESS>] d_alloc+0x13/0x70 [<-ADDRESS>] lookup_dcache+0xa3/0xd0 [<-ADDRESS>] ? path_get+0x26/0x40 [<-ADDRESS>] lookup_open+0x54/0x1c0 [<-ADDRESS>] do_last+0x319/0x830 [<-ADDRESS>] path_openat+0xae/0x4c0 [<-ADDRESS>] ? handle_mm_fault+0x210/0x2d0 [<-ADDRESS>] do_filp_open+0x3d/0xa0 [<-ADDRESS>] ? 
__alloc_fd+0x45/0x120 [<-ADDRESS>] do_sys_open+0xf9/0x1e0 [<-ADDRESS>] sys_openat+0xf/0x20 [<-ADDRESS>] system_call_fastpath+0x16/0x1b Code: 5d e0 4c 89 65 e8 49 8b 4d 00 65 48 03 0c 25 28 cd 00 00 48 8b 51 08 4c 8b 21 4d 85 e4 74 62 49 63 45 20 48 8d 4a 01 49 8b 7d 00 <49> 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 c8 49 63 RIP [<-ADDRESS>] kmem_cache_alloc+0x43/0xb0 RSP <-ADDRESS> CR2: -ADDRESS general protection fault: [#2] SMP Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211 kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill CPU 2 Pid: 0, comm: swapper/2 Tainted: G D W3.7.4-OSS4.2 #3 /DQ45CB RIP: 0010[<-ADDRESS>] [<-ADDRESS>] rcu_do_batch.isra.37+0x131/0x290 RSP: 0018:-ADDRESS EFLAGS: 00010212 RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS R13: -ADDRESS R14: -ADDRESS R15: -ADDRESS FS: -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS CS: 0010 DS: ES: CR0: -ADDRESS CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS Process swapper/2 (pid: 0, threadinfo -ADDRESS, task -ADDRESS) Stack: -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS Call Trace: [<-ADDRESS>] ? tick_program_event+0x1f/0x30 [<-ADDRESS>] __rcu_process_callbacks+0xaa/0x140 [<-ADDRESS>] rcu_process_callbacks+0x48/0x70 [<-ADDRESS>] __do_softirq+0xa8/0x150 [<-ADDRESS>] call_softirq+0x1c/0x30 [<-ADDRESS>] do_softirq+0x4d/0x80 [<-ADDRESS>] irq_exit+0x8e/0xb0 [<-ADDRESS>] do_IRQ+0x5e/0xd0 [<-ADDRESS>] common_interrupt+0x67/0x67 [<-ADDRESS>] ? acpi_idle_enter_simple+0xbd/0xf4 [<-ADDRESS>] ? acpi_idle_enter_simple+0xb8/0xf4 [<-ADDRESS>] acpi_idle_enter_bm+0xe1/0x24b [<-ADDRESS>] ? 
menu_select+0xe4/0x300 [<-ADDRESS>] cpuidle_enter+0x19/0x20 [<-ADDRESS>] cpuidle_idle_call+0x8b/0xf0 [<-ADDRESS>] cpu_idle+0xbf/0x110 [<-ADDRESS>] start_secondary+0xb3/0xb5 Code: b8 8b 92 ac 01 00 00 85 d2 75 2f 4d 85 ff 74 2a 4c 89 ff 48 8b 57 08 4c 8b 3f 48 81 fa ff 0f 00 00 41 0f 18 0f 76 ab 48 89 45 a8 d2 48 8b 45 a8 eb b4 0f 1f 80 00 00 00 00 48 89 c1 9c 41 5d RIP [<-ADDRESS>]
Panic during interrupt handling while terminating hostapd
Hi, I've found yet another problem with (at least) 3.7.4 and 3.8-rc4. When terminating hostapd via SIGINT, this bug and panic came up: BUG: unable to handle kernel paging request at 001d8000 IP: [-ADDRESS] kmem_cache_alloc+0x43/0xb0 PGD 21c3db067 PUD 0 Oops: [#1] SMP Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211 kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill CPU 2 Pid: 6972, comm: modprobe Tainted: GW3.7.4-OSS4.2 #3 /DQ45CB RIP: 0010:[-ADDRESS] [-ADDRESS] kmem_cache_alloc+0x43/0xb0 RSP: 0018:-ADDRESS EFLAGS: 00010206 RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS FS: -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS CS: 0010 DS: ES: CR0: -ADDRESS CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS DR0: -ADDRESS CR1: -ADDRESS DR2: -ADDRESS DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS Process modprobe (pid: 6972, threadinfo -ADDRESS, task -ADDRESS) Stack: -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS Call Trace: [-ADDRESS] __d_alloc+0x2f/0x180 [-ADDRESS] d_alloc+0x13/0x70 [-ADDRESS] lookup_dcache+0xa3/0xd0 [-ADDRESS] ? path_get+0x26/0x40 [-ADDRESS] lookup_open+0x54/0x1c0 [-ADDRESS] do_last+0x319/0x830 [-ADDRESS] path_openat+0xae/0x4c0 [-ADDRESS] ? handle_mm_fault+0x210/0x2d0 [-ADDRESS] do_filp_open+0x3d/0xa0 [-ADDRESS] ? 
__alloc_fd+0x45/0x120 [-ADDRESS] do_sys_open+0xf9/0x1e0 [-ADDRESS] sys_openat+0xf/0x20 [-ADDRESS] system_call_fastpath+0x16/0x1b Code: 5d e0 4c 89 65 e8 49 8b 4d 00 65 48 03 0c 25 28 cd 00 00 48 8b 51 08 4c 8b 21 4d 85 e4 74 62 49 63 45 20 48 8d 4a 01 49 8b 7d 00 49 8b 1c 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 c8 49 63 RIP [-ADDRESS] kmem_cache_alloc+0x43/0xb0 RSP -ADDRESS CR2: -ADDRESS general protection fault: [#2] SMP Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211 kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill CPU 2 Pid: 0, comm: swapper/2 Tainted: G D W3.7.4-OSS4.2 #3 /DQ45CB RIP: 0010[-ADDRESS] [-ADDRESS] rcu_do_batch.isra.37+0x131/0x290 RSP: 0018:-ADDRESS EFLAGS: 00010212 RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS R13: -ADDRESS R14: -ADDRESS R15: -ADDRESS FS: -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS CS: 0010 DS: ES: CR0: -ADDRESS CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS Process swapper/2 (pid: 0, threadinfo -ADDRESS, task -ADDRESS) Stack: -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS -ADDRESS Call Trace: IRQ [-ADDRESS] ? tick_program_event+0x1f/0x30 [-ADDRESS] __rcu_process_callbacks+0xaa/0x140 [-ADDRESS] rcu_process_callbacks+0x48/0x70 [-ADDRESS] __do_softirq+0xa8/0x150 [-ADDRESS] call_softirq+0x1c/0x30 [-ADDRESS] do_softirq+0x4d/0x80 [-ADDRESS] irq_exit+0x8e/0xb0 [-ADDRESS] do_IRQ+0x5e/0xd0 [-ADDRESS] common_interrupt+0x67/0x67 EOI [-ADDRESS] ? acpi_idle_enter_simple+0xbd/0xf4 [-ADDRESS] ? acpi_idle_enter_simple+0xb8/0xf4 [-ADDRESS] acpi_idle_enter_bm+0xe1/0x24b [-ADDRESS] ? 
menu_select+0xe4/0x300 [-ADDRESS] cpuidle_enter+0x19/0x20 [-ADDRESS] cpuidle_idle_call+0x8b/0xf0 [-ADDRESS] cpu_idle+0xbf/0x110 [-ADDRESS] start_secondary+0xb3/0xb5 Code: b8 8b 92 ac 01 00 00 85 d2 75 2f 4d 85 ff 74 2a 4c 89 ff 48 8b 57 08 4c 8b 3f 48 81 fa ff 0f 00 00 41 0f 18 0f 76 ab 48 89 45 a8 ff d2 48 8b 45 a8 eb b4 0f 1f 80 00 00 00 00 48 89 c1 9c 41 5d RIP [-ADDRESS] rcu_do_batch.isra.37+0x131/0x290 RSP -ADDRESS Kernel panic - not syncing:
Re: i915-related and general system freezes with specific kernel config // IOMMU
* On 21.01.2013 07:11 PM, Mihai Moldovan wrote: > I'm also currently testing a kernel without the Intel IOMMU feature > [CONFIG_INTEL_IOMMU=n, but CONFIG_IOMMU_SUPPORT=y]. [...] At least > not seeing USB and PCI(e) issues. I'll leave the box running for some > more [time] [...] No freezes for >22h, seems to be fine. > [...] and will afterwards disable IOMMU as a whole to see if I hit > USB and PCI(e) issues again with that combination. The system seems to run stably with CONFIG_IOMMU_SUPPORT=n set, too. This is expected. However: unlike during earlier tests, when I disabled IOMMU and Intel IOMMU via kernel/boot parameters, I am not seeing any DMA mapping errors. There seems to be a difference between disabling IOMMU/Intel IOMMU statically in the kernel config and disabling it via kernel parameter. Is this another bug? I've attached both kernel ring buffer logs (minus the timings, for easier diffing.) [*] kern-new-iommu_off.log.bz2 disables IOMMU and Intel IOMMU via boot parameter [*] kern-iommu_static_off.log.bz2 has CONFIG_IOMMU_SUPPORT=n set and any IOMMU support statically disabled (and consequently also DMAR) Mihai
Re: i915-related and general system freezes with specific kernel config // IOMMU
* On 20.01.2013 11:49 PM, Daniel Vetter wrote: > Thanks for testing, I've just submitted the patch for review. It > should be included in a -fixes tree soon and then get backported to > stable kernels. Thanks. :) > Please let me know when this works solidly for you, so that I can > put it into a real patch and also submit it for inclusion. No freeze for >24h, so I guess we can conclude that the quirk does indeed fix the random freeze issue as well. :) I'm all for inclusion. I'm also currently testing a kernel without the Intel IOMMU feature. This seems to work, too, but also disables Intel TXT and VT-d... At least I'm not seeing USB and PCI(e) issues. I'll leave the box running for some more time and will afterwards disable IOMMU as a whole to see if I hit USB and PCI(e) issues again with that combination. Best regards, Mihai [resending to include all previous CC's]
Re: i915-related and general system freezes with specific kernel config // IOMMU
Hi Daniel, the patch does work, i.e., it turns off DMAR for the graphics card and alleviates the freezes when loading i915/kms. However, I'm still seeing random machine freezes with it (consistent with the behavior I've experienced with intel_iommu=igfx_off). The patch + forcing RWBF is working, too. Interestingly, this version hasn't randomly frozen yet, after more than 5 hours of uptime! I'll leave the box running until tomorrow to make sure I've waited long enough. All those tested kernels were able to handle USB and PCI(e) devices. I still have to test turning off IOMMU in general and Intel IOMMU specifically. Will probably do this tomorrow. Thank you so far! :) Mihai
Re: i915-related and general system freezes with specific kernel config // IOMMU
* On 19.01.2013 05:13 PM, Mihai Moldovan wrote: > * On 19.01.2013 02:27 PM, Daniel Vetter wrote: >> You have a gen4.5 chipset which is known to be utterly broken for >> IOMMU+intel gpu. > Nice description for what I'm seeing. ;) > > After some more hours of uptime I'm inclined to say that "intel_iommu=off iommu=off" fixes my random freezes as well. > Alas, the USB and PCI(e) problems are still around, but I could test recompiling 3.7.2 with Intel IOMMU turned off completely in the kernel config. > Interestingly, my 3.0.2 kernel which worked fine for so long doesn't even *have* support for VT-d/Intel IOMMU. This could explain why I wasn't bitten by those problems on all previous versions. > >> [...] and we've never added the proper >> quirks. See https://bugzilla.kernel.org/show_bug.cgi?id=51921 for a >> proposed patch to fix this (i.e. automatically set >> intel_iommu=igfx_off for affected platforms). Testing highly welcome. > From a quick glance, I don't think this patch will work as-is, my PCI ID 2e12 is missing. > [...] Which of course will work, as 2e10 is my DRAM controller as reported by lspci, sorry. But shouldn't the "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2eXX, quirk_iommu_rwbf);" calls rather be "DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_g4x_gfx);"? The current patch errors out for me while compiling, as quirk_iommu_rwbf is not yet defined at that place.
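The compile error described above comes down to declaration order: a fixup table entry refers to its hook by name, so the hook has to be declared before the entry. A minimal userspace sketch of that, assuming made-up stand-ins (struct fake_dev, struct fixup, quirk_g4x_gfx, run_fixups are all hypothetical and are not the kernel's pci_dev or DECLARE_PCI_FIXUP_HEADER machinery):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	unsigned short vendor;
	unsigned short device;
	int dmar_disabled;
};

typedef void (*fixup_hook)(struct fake_dev *dev);

/* Hypothetical stand-in for one fixup table entry. */
struct fixup {
	unsigned short vendor;
	unsigned short device;
	fixup_hook hook;
};

/* Forward declaration: without it, the table below would not compile. */
static void quirk_g4x_gfx(struct fake_dev *dev);

/* Vendor 0x8086 is Intel; the device ID is picked for illustration only. */
static const struct fixup fixups[] = {
	{ 0x8086, 0x2e12, quirk_g4x_gfx },
};

static void quirk_g4x_gfx(struct fake_dev *dev)
{
	dev->dmar_disabled = 1;
}

/* Run every hook whose vendor/device pair matches the given device. */
static void run_fixups(struct fake_dev *dev)
{
	size_t i;

	assert(dev != NULL);
	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].vendor == dev->vendor &&
		    fixups[i].device == dev->device)
			fixups[i].hook(dev);
	}
}
```

Calling run_fixups() on a device with IDs 0x8086/0x2e12 sets dmar_disabled, while a device with a different ID is left alone. Removing the forward declaration reproduces the kind of "not yet defined" error mentioned above.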
Re: i915-related and general system freezes with specific kernel config // IOMMU
* On 19.01.2013 02:27 PM, Daniel Vetter wrote: > You have a gen4.5 chipset which is known to be utterly broken for > IOMMU+intel gpu. Nice description for what I'm seeing. ;) After some more hours of uptime I'm inclined to say that "intel_iommu=off iommu=off" fixes my random freezes as well. Alas, the USB and PCI(e) problems are still around, but I could test recompiling 3.7.2 with Intel IOMMU turned off completely in the kernel config. Interestingly, my 3.0.2 kernel which worked fine for so long doesn't even *have* support for VT-d/Intel IOMMU. This could explain why I wasn't bitten by those problems on all previous versions. > [...] and we've never added the proper > quirks. See https://bugzilla.kernel.org/show_bug.cgi?id=51921 for a > proposed patch to fix this (i.e. automatically set > intel_iommu=igfx_off for affected platforms). Testing highly welcome. From a quick glance, I don't think this patch will work as-is, my PCI ID 2e12 is missing. I'll add it to the relevant section. But even if it worked, I'd still have the "box freezes randomly" issue (mostly within 5 to 60 minutes of uptime). :( The only way to get rid of this is disabling Intel IOMMU as a whole via kernel parameters intel_iommu=off iommu=off. Anyway, I'll give it a try. Best regards, Mihai
Re: i915-related and general system freezes with specific kernel config // IOMMU
* On 19.01.2013 12:48 AM, Mihai Moldovan wrote: > Testing further, I rebooted using iommu=off and intel_iommu=off. So far, I had > no random crashes, but the system uptime of REPLACEME minutes is too > small to draw conclusions yet. And by "REPLACEME", I meant 50 (minutes). That's embarrassing, sorry.
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
Had another look at the code and would like to apologize for the confusion... * On 13.08.2012 05:27 PM, Mihai Moldovan wrote: > Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled > "disabled" and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled "reserved", neither of > which should be touched). Wrong. The array is declared as struct intel_gmbus gmbus[GMBUS_NUM_PORTS]; so it runs from 0 to GMBUS_NUM_PORTS-1 and no longer contains the reserved or disabled ports. I totally overlooked the definition, sorry. Ignore the rest of my comments and the patch, as they are based on false assumptions (gmbus still containing the disabled and reserved ports.) Instead, I'd like to ACK Jani's patch. The module can now be loaded fine, there's no null ptr dereference anymore and only some gmbus warnings show up, though this time only one message per port, so basically it's falling back to bit banging on all gmbus ports as it should: [ 14.722454] i915 :00:02.0: setting latency timer to 64 [ 14.796032] [drm] GMBUS [i915 gmbus ssc] timed out, falling back to bit banging on pin 1 [ 15.044039] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3 [ 15.420067] [drm] GMBUS [i915 gmbus dpd] timed out, falling back to bit banging on pin 6 [ 15.548121] i915 :00:02.0: irq 55 for MSI/MSI-X [ 15.842123] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on minor 0 Best regards, Mihai
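To make the corrected indexing concrete, here is a small standalone sketch of the port <-> array index mapping as described in this message. It is only an illustration under assumptions: struct gmbus_slot, port_is_valid(), get_slot() and index_to_port() are hypothetical names and not the driver's actual code, and GMBUS_NUM_PORTS is assumed to be 6 as stated in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, taken from the discussion: six usable GMBUS ports. */
#define GMBUS_NUM_PORTS 6

/* Hypothetical stand-in for struct intel_gmbus; only carries a label. */
struct gmbus_slot {
	const char *name;
};

/* Only usable ports are stored: valid indices are 0 .. GMBUS_NUM_PORTS - 1. */
static struct gmbus_slot gmbus[GMBUS_NUM_PORTS];

/* Ports (pin pairs) are numbered 1 .. GMBUS_NUM_PORTS. */
static bool port_is_valid(unsigned int port)
{
	return port >= 1 && port <= GMBUS_NUM_PORTS;
}

/* -1 maps a port number to its array index; invalid ports yield NULL. */
static struct gmbus_slot *get_slot(unsigned int port)
{
	return port_is_valid(port) ? &gmbus[port - 1] : NULL;
}

/* +1 maps an array index back to its port number. */
static unsigned int index_to_port(size_t index)
{
	assert(index < GMBUS_NUM_PORTS);
	return (unsigned int)index + 1;
}
```

With this layout, get_slot(1) returns the first array element and get_slot(6) the last one, while the "disabled" port 0 and the "reserved" port 7 have no slot at all. Indexing the array directly with the port number, as the withdrawn patch did, would run one element past the end for port 6.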
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
* On 13.08.2012 05:09 PM, Daniel Vetter wrote: > On Mon, Aug 13, 2012 at 05:03:24PM +0200, Mihai Moldovan wrote: >> Hi Jani, >> >> The reason sounds sane to me, but while looking through the code, I have >> seen a >> few other problems, too. >> >> To my understanding, we should use port for dev_priv->gmbus[], not the pin >> mapping (which is only used for gmbus_ports[]). >> Don't forget to add the +1 for pin -> port mapping to the error case. >> >> Also, intel_gmbus_get_adapter is already accepting a port value (I made sure >> to >> look at the calls in other files too), so don't map the port back to a pin. >> >> Keep the same in mind for the intel_teardown_gmbus "destructor". >> >> The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0, >> which is >> known as "disabled" and shouldn't be used (previously has_gpio was set to >> false >> for those ports to not do any transfer on those ports.) >> >> I may be wrong, could you review this and maybe add it to your patch? > This seems to essentially undo > > commit 2ed06c93a1fce057808894d73167aae03c76deaf > Author: Daniel Kurtz > Date: Wed Mar 28 02:36:15 2012 +0800 > > drm/i915/intel_i2c: gmbus disabled and reserved ports are invalid > > Note that port numbers start at 1, whereas the array is 0-index based. So > you patch here would blow up if you don't extend the dev_priv->gmbus > array. Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled "disabled" and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled "reserved", which neither should be touched). Thus, in effect, it starts with 1 and ends with 6, but the current code does not take that into account, instead accessing elements from 0 onwards: The code currently would access *dev_priv->gmbus[0] in the first iteration, which is labeled as "disabled" and shouldn't be touched. Instead, we should do a pin->port mapping and access *dev_priv->gmbus[1, 2, 3 ... 
6] (with *dev_priv->gmbus[7] left out, as it's marked as "reserved" and again shouldn't be touched.) However, accessing gmbus_ports[0] is fine, and we can then copy gmbus_ports[0].name (indexed by pin) to *dev_priv->gmbus[1]->adapter.name (indexed by port). Blowing up seems impossible too, as GMBUS_NUM_PORTS is #defined as END_PORT - BEGIN_PORT + 1, which will evaluate to 6 and be the last index used. Best regards, Mihai
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
Hi Jani, * On 13.08.2012 04:33 PM, Jani Nikula wrote: > Hi Mihai, could you test the following patch to see if it fixes the problem, > please? > > BR, > Jani. > > > Jani Nikula (1): > drm/i915: ensure i2c adapter is all set before adding it > > drivers/gpu/drm/i915/intel_i2c.c |7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > The reason sounds sane to me, but while looking through the code, I have seen a few other problems, too. To my understanding, we should use port for dev_priv->gmbus[], not the pin mapping (which is only used for gmbus_ports[]). Don't forget to add the +1 for pin -> port mapping to the error case. Also, intel_gmbus_get_adapter is already accepting a port value (I made sure to look at the calls in other files too), so don't map the port back to a pin. Keep the same in mind for the intel_teardown_gmbus "destructor". The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0, which is known as "disabled" and shouldn't be used (previously has_gpio was set to false for those ports to not do any transfer on those ports.) I may be wrong, could you review this and maybe add it to your patch? 
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index 1991a44..b725993 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -472,8 +474,8 @@ int intel_setup_gmbus(struct drm_device *dev)
 	mutex_init(&dev_priv->gmbus_mutex);
 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
 		u32 port = i + 1; /* +1 to map gmbus index to pin pair */
+		struct intel_gmbus *bus = &dev_priv->gmbus[port];
 		bus->adapter.owner = THIS_MODULE;
 		bus->adapter.class = I2C_CLASS_DDC;
@@ -506,7 +508,7 @@ int intel_setup_gmbus(struct drm_device *dev)
 err:
 	while (--i) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 	return ret;
@@ -516,9 +518,8 @@ struct i2c_adapter *intel_gmbus_get_adapter(struct drm_i915_private *dev_priv,
 					    unsigned port)
 {
 	WARN_ON(!intel_gmbus_is_port_valid(port));
-	/* -1 to map pin pair to gmbus index */
 	return (intel_gmbus_is_port_valid(port)) ?
-		&dev_priv->gmbus[port - 1].adapter : NULL;
+		&dev_priv->gmbus[port].adapter : NULL;
 }
 void intel_gmbus_set_speed(struct i2c_adapter *adapter, int speed)
@@ -543,8 +544,9 @@ void intel_teardown_gmbus(struct drm_device *dev)
 	if (dev_priv->gmbus == NULL)
 		return;
+/* +1 to map gmbus index to pin pair */
 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 }
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
Hi Jani, * On 13.08.2012 04:33 PM, Jani Nikula wrote: Hi Mihai, could you test the following patch to see if it fixes the problem, please? BR, Jani. Jani Nikula (1): drm/i915: ensure i2c adapter is all set before adding it drivers/gpu/drm/i915/intel_i2c.c |7 --- 1 file changed, 4 insertions(+), 3 deletions(-) The reason sounds sane to me, but while looking through the code, I have seen a few other problems, too. To my understanding, we should use port for dev_priv-gmbus[], not the pin mapping (which is only used for gmbus_ports[]). Don't forget to add the +1 for pin - port mapping to the error case. Also, intel_gmbus_get_adapter is already accepting a port value (I made sure to look at the calls in other files too), so don't map the port back to a pin. Keep the same in mind for the intel_teardown_gmbus destructor. The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0, which is known as disabled and shouldn't be used (previously has_gpio was set to false for those ports to not do any transfer on those ports.) I may be wrong, could you review this and maybe add it to your patch? 
diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index 1991a44..b725993 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -472,8 +474,8 @@ int intel_setup_gmbus(struct drm_device *dev)
 	mutex_init(&dev_priv->gmbus_mutex);
 
 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
 		u32 port = i + 1; /* +1 to map gmbus index to pin pair */
+		struct intel_gmbus *bus = &dev_priv->gmbus[port];
 
 		bus->adapter.owner = THIS_MODULE;
 		bus->adapter.class = I2C_CLASS_DDC;
@@ -506,7 +508,7 @@ int intel_setup_gmbus(struct drm_device *dev)
 
 err:
 	while (--i) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 	return ret;
@@ -516,9 +518,8 @@ struct i2c_adapter *intel_gmbus_get_adapter(struct drm_i915_private *dev_priv,
 					    unsigned port)
 {
 	WARN_ON(!intel_gmbus_is_port_valid(port));
-	/* -1 to map pin pair to gmbus index */
 	return (intel_gmbus_is_port_valid(port)) ?
-		&dev_priv->gmbus[port - 1].adapter : NULL;
+		&dev_priv->gmbus[port].adapter : NULL;
 }
 
 void intel_gmbus_set_speed(struct i2c_adapter *adapter, int speed)
@@ -543,8 +544,9 @@ void intel_teardown_gmbus(struct drm_device *dev)
 	if (dev_priv->gmbus == NULL)
 		return;
 
+/* +1 to map gmbus index to pin pair */
 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 }
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
* On 13.08.2012 05:09 PM, Daniel Vetter wrote:
> On Mon, Aug 13, 2012 at 05:03:24PM +0200, Mihai Moldovan wrote:
>> Hi Jani,
>>
>> The reason sounds sane to me, but while looking through the code, I have seen a few other problems, too.
>>
>> To my understanding, we should use the port for dev_priv->gmbus[], not the pin mapping (which is only used for gmbus_ports[]). Don't forget to add the +1 for the pin -> port mapping to the error case.
>>
>> Also, intel_gmbus_get_adapter is already accepting a port value (I made sure to look at the calls in other files too), so don't map the port back to a pin. Keep the same in mind for the intel_teardown_gmbus destructor.
>>
>> The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0, which is marked as disabled and shouldn't be used (previously, has_gpio was set to false for those ports so that no transfer would be done on them.)
>>
>> I may be wrong, could you review this and maybe add it to your patch?
>
> This seems to essentially undo
>
> commit 2ed06c93a1fce057808894d73167aae03c76deaf
> Author: Daniel Kurtz djku...@chromium.org
> Date:   Wed Mar 28 02:36:15 2012 +0800
>
>     drm/i915/intel_i2c: gmbus disabled and reserved ports are invalid
>
> Note that port numbers start at 1, whereas the array is 0-index based. So your patch here would blow up if you don't extend the dev_priv->gmbus array.

Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled disabled and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled reserved, neither of which should be touched).

Thus, in effect, it starts with 1 and ends with 6, but the current code does not take that into account, instead accessing elements from 0 onwards:

The code currently would access *dev_priv->gmbus[0] in the first iteration, which is labeled as disabled and shouldn't be touched. Instead, we should do a pin -> port mapping and access *dev_priv->gmbus[1, 2, 3 ... 6] (with *dev_priv->gmbus[7] left out, as it's marked as reserved and again shouldn't be touched.)

However, accessing gmbus_ports[0] is fine, and we can then copy

gmbus_ports[0].name to *dev_priv->gmbus[1]->adapter.name
            ^ pin                       ^ port

Blowing up seems impossible too, as GMBUS_NUM_PORTS is #defined as END_PORT - BEGIN_PORT + 1, which will evaluate to 6 and be the last index used.

Best regards,

Mihai
Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load
Had another look at the code and would like to apologize for the confusion...

* On 13.08.2012 05:27 PM, Mihai Moldovan wrote:
> Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled disabled and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled reserved, neither of which should be touched).

Wrong.

struct intel_gmbus gmbus[GMBUS_NUM_PORTS];

thus running from 0 to GMBUS_NUM_PORTS-1, with no reserved or disabled ports in the array anymore. I have totally overlooked the definition, sorry.

Ignore the rest of my comments and the patch, as they are based on false assumptions (gmbus still containing the disabled and reserved ports.)

Instead, I'd like to ACK Jani's patch. The module can now be loaded fine, there's no null ptr dereference anymore and only some gmbus warnings show up, though this time only one message per port, so basically it's falling back to bit banging on all gmbus ports as it should:

[ 14.722454] i915 :00:02.0: setting latency timer to 64
[ 14.796032] [drm] GMBUS [i915 gmbus ssc] timed out, falling back to bit banging on pin 1
[ 15.044039] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[ 15.420067] [drm] GMBUS [i915 gmbus dpd] timed out, falling back to bit banging on pin 6
[ 15.548121] i915 :00:02.0: irq 55 for MSI/MSI-X
[ 15.842123] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on minor 0

Best regards,

Mihai
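
The indexing rule this message settles on (pin pairs numbered 1 to 6, array slots numbered 0 to GMBUS_NUM_PORTS-1, so lookups use port - 1) can be illustrated with a small standalone C sketch. It is not the driver code: every name below (PORT_FIRST, PORT_LAST, NUM_PORTS, struct bus, buses, port_is_valid, bus_for_port, setup_buses) is hypothetical and only mirrors the roles of GMBUS_NUM_PORTS, the gmbus array, intel_gmbus_is_port_valid and intel_gmbus_get_adapter as they are described in the thread.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Pin-pair ("port") numbers as described in the thread: 0 is disabled,
 * 1..6 are usable, 7 is reserved. All names here are illustrative
 * assumptions, not the i915 definitions.
 */
enum {
	PORT_DISABLED = 0,
	PORT_FIRST    = 1,
	PORT_LAST     = 6,
	PORT_RESERVED = 7,
};

#define NUM_PORTS (PORT_LAST - PORT_FIRST + 1)	/* evaluates to 6 */

struct bus {
	unsigned int port;	/* pin pair served by this array slot */
};

/* Only usable ports get a slot, so valid indices are 0..NUM_PORTS-1. */
static struct bus buses[NUM_PORTS];

static bool port_is_valid(unsigned int port)
{
	return port >= PORT_FIRST && port <= PORT_LAST;
}

/* Map a pin pair to its slot (port - 1); NULL for disabled/reserved. */
static struct bus *bus_for_port(unsigned int port)
{
	return port_is_valid(port) ? &buses[port - 1] : NULL;
}

/* Walk the array by index and record the pin pair (index + 1). */
static void setup_buses(void)
{
	for (unsigned int i = 0; i < NUM_PORTS; i++)
		buses[i].port = i + 1;
}
```

With this layout, indexing the array directly by port number would skip slot 0 and run one element past the end for port 6, which is the overrun Daniel Vetter warned about.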
Re: null pointer dereference while loading i915
* On 10.08.2012 07:44 PM, Mihai Moldovan wrote:
> Hm, OK.
>
> Well, I'm done now.
>
> bisect log:
>
> git bisect start
> # good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
> git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
> # bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
> git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
> # good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
> git bisect good 49d99a2f9c4d033cc3965958a1397b1fad573dd3
> # good: [813a95e5b4fa936bbde10ef89188932745dcd7f4] Merge tag 'pinctrl' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect good 813a95e5b4fa936bbde10ef89188932745dcd7f4
> # bad: [9978306e31a8f89bd81fbc4c49fd9aefb1d30d10] Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
> git bisect bad 9978306e31a8f89bd81fbc4c49fd9aefb1d30d10
> # good: [927ad551031798d4cba49766549600bbb33872d7] Merge tag 'ktest-v3.5-spelling' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
> git bisect good 927ad551031798d4cba49766549600bbb33872d7
> # good: [2c01e7bc46f10e9190818437e564f7e0db875ae9] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
> git bisect good 2c01e7bc46f10e9190818437e564f7e0db875ae9
> # bad: [5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a] drm/nva3/pm: make pll->pll mode work
> git bisect bad 5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a
> # bad: [8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8] drm/i915: Unconditionally initialise the interrupt workers
> git bisect bad 8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8
> # bad: [f637fde434c9e3687798730c7ddd367e93666013] drm/i915: inline enable/disable_irq into ring->get/put_irq
> git bisect bad f637fde434c9e3687798730c7ddd367e93666013
> # bad: [23e3f9b37e7368ee8530ba99907508363feebc14] drm/i915: check for disabled interrupts on ValleyView
> git bisect bad 23e3f9b37e7368ee8530ba99907508363feebc14
> # good: [8489731c9bd22c27ab17a2190cd7444604abf95f] drm/i915: move clflushing into shmem_pread
> git bisect good 8489731c9bd22c27ab17a2190cd7444604abf95f
> # good: [3bd7d90938f1fe77de5991dc4b727843c4980b2a] drm/i915/intel_i2c: refactor using intel_gmbus_get_adapter
> git bisect good 3bd7d90938f1fe77de5991dc4b727843c4980b2a
> # bad: [57f350b6722f9569f407872f6ead56e2d221d98a] drm/i915: add DPIO support
> git bisect bad 57f350b6722f9569f407872f6ead56e2d221d98a
> # bad: [93e537a10f2c8c0f2e74409b6cb473fc221758fa] drm/i915: split LVDS update code out of i9xx_crtc_mode_set
> git bisect bad 93e537a10f2c8c0f2e74409b6cb473fc221758fa
> # bad: [f2c9677be3158c31ba19f527e2be0f7a519e19d1] drm/i915/intel_i2c: allocate gmbus array as part of drm_i915_private
> git bisect bad f2c9677be3158c31ba19f527e2be0f7a519e19d1
> # bad: [2ed06c93a1fce057808894d73167aae03c76deaf] drm/i915/intel_i2c: gmbus disabled and reserved ports are invalid
> git bisect bad 2ed06c93a1fce057808894d73167aae03c76deaf

Just to be safe, I also tested git HEAD (3.6.0-rc1-00209-gf62bf17), no dice either.

Best regards,

Mihai
Re: null pointer dereference while loading i915
* On 10.08.2012 06:39 PM, Daniel Vetter wrote:
> On Fri, Aug 10, 2012 at 6:05 PM, Mihai Moldovan wrote:
>> * On 10.08.2012 12:10 PM, Daniel Vetter wrote:
>>> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan wrote:
>>>> Hi Daniel, hi list
>>>>
>>>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working fine), my box is crashing when loading the i915 driver (mode-setting enabled.)
>>>>
>>>> The current version I'm testing with is 3.5.0.
>>>>
>>>> I was able to get the BUG output (please forgive any errors/flips in the output, I have had to transcribe the messages from the screen/images), however, I'm not able to find out what's wrong.
>>>>
>>>> If I see it correctly, there's a null pointer dereference in a printk called from inside gmbus_xfer. The only printk calls I can see in drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the DRM_DEBUG_KMS() and DRM_INFO() macros.
>>>> Neither call looks wrong to me, I even tried to swap adapter->name with bus->adapter.name and make *sure* i < num is true, but haven't had any success.
>>>>
>>>> I'd really like to see this bug fixed, as it's preventing me from updating the kernel for over a year now.
>>>>
>>>> Also, while 3.0.2 works, it *does* spew error/warning messages related to gmbus and I've had corrupted VTs in the past (albeit after a long uptime with multiple X restarting and DVI cable unplugging/reattaching events), so maybe there's a lot more broken than "expected".
>>> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0, since exactly this issue might happen. We've re-enabled gmbus again on 3.5 after having fixed this bug. Are you sure that this is plain 3.2 you're running?
>> Sorry, I messed up the version numbers. Started bisecting yesterday and noticed that 3.0 up to 3.2 still work "fine" (see below), instead I've had another problem with 3.2 (complete lockup after the kernel is running for a few minutes, but I have no idea where this issue is coming from. Seems to be happening with 3.2.0 only, so... *shrug*)
>>
>> 3.0.2 => working, gmbus warnings as posted.
>> 3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
>> 3.2-rc2 to rc4 => working, NO gmbus warnings, but render errors (see below)
>> --- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
>> --- (restarted bisecting with 3.2 to 3.5) ---
>> 3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors (see below)
>> 3.4.0-07487 => working, gmbus warnings, hang errors (see below)
>> ...
>>
>> I've done more steps, but have not yet finished bisecting, so stay tuned.
>> All those render errors look like that:
>>
>> [drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
>> render error detected, EIR: 0x0010
>> IPEIR: 0x
>> IPEHR: 0x0200
>> INSTDONE: 0x
>> INSTPS: 0x8001e025
>> INSTDONE1: 0xbfbb
>> ACTHD: 0x00a4203c
>> page table error
>> PGTBL_ER: 0x0010
>> [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking
>>
>> I'll finish bisecting (and hope that my guess was right concerning the variant I wasn't able to build) and will post the bisect log when done.
>>
>> Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been enabled as I'm pretty sure I always saw those errors when booting (just confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34, 2.6.36.1, 3.1-..., 3.2-... though.
> Yeah, we've enabled gmbus a few times and then disabled it again due to bugs. Also, the usual debug message says gmbus even when gmbus isn't on ... yeah, slightly confusing, but that should be fixed, too.

Hm, OK.

Well, I'm done now.

bisect log:

git bisect start
# good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
# bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
# good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of
Re: null pointer dereference while loading i915
* On 10.08.2012 12:10 PM, Daniel Vetter wrote:
> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan wrote:
>> Hi Daniel, hi list
>>
>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working fine), my box is crashing when loading the i915 driver (mode-setting enabled.)
>>
>> The current version I'm testing with is 3.5.0.
>>
>> I was able to get the BUG output (please forgive any errors/flips in the output, I have had to transcribe the messages from the screen/images), however, I'm not able to find out what's wrong.
>>
>> If I see it correctly, there's a null pointer dereference in a printk called from inside gmbus_xfer. The only printk calls I can see in drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the DRM_DEBUG_KMS() and DRM_INFO() macros.
>> Neither call looks wrong to me, I even tried to swap adapter->name with bus->adapter.name and make *sure* i < num is true, but haven't had any success.
>>
>> I'd really like to see this bug fixed, as it's preventing me from updating the kernel for over a year now.
>>
>> Also, while 3.0.2 works, it *does* spew error/warning messages related to gmbus and I've had corrupted VTs in the past (albeit after a long uptime with multiple X restarting and DVI cable unplugging/reattaching events), so maybe there's a lot more broken than "expected".
>
> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0, since exactly this issue might happen. We've re-enabled gmbus again on 3.5 after having fixed this bug. Are you sure that this is plain 3.2 you're running?

Sorry, I messed up the version numbers. Started bisecting yesterday and noticed that 3.0 up to 3.2 still work "fine" (see below), instead I've had another problem with 3.2 (complete lockup after the kernel is running for a few minutes, but I have no idea where this issue is coming from. Seems to be happening with 3.2.0 only, so... *shrug*)

3.0.2 => working, gmbus warnings as posted.
3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
3.2-rc2 to rc4 => working, NO gmbus warnings, but render errors (see below)
--- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
--- (restarted bisecting with 3.2 to 3.5) ---
3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors (see below)
3.4.0-07487 => working, gmbus warnings, hang errors (see below)
...

I've done more steps, but have not yet finished bisecting, so stay tuned.
All those render errors look like that:

[drm] capturing error event; look for more information in /debug/dri/0/i915_error_state
render error detected, EIR: 0x0010
IPEIR: 0x
IPEHR: 0x0200
INSTDONE: 0x
INSTPS: 0x8001e025
INSTDONE1: 0xbfbb
ACTHD: 0x00a4203c
page table error
PGTBL_ER: 0x0010
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking

I'll finish bisecting (and hope that my guess was right concerning the variant I wasn't able to build) and will post the bisect log when done.

Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been enabled as I'm pretty sure I always saw those errors when booting (just confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34, 2.6.36.1, 3.1-..., 3.2-... though.

Best regards,

Mihai
null pointer dereference while loading i915
Hi Daniel, hi list

ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working fine), my box is crashing when loading the i915 driver (mode-setting enabled.)

The current version I'm testing with is 3.5.0.

I was able to get the BUG output (please forgive any errors/flips in the output, I have had to transcribe the messages from the screen/images), however, I'm not able to find out what's wrong.

If I see it correctly, there's a null pointer dereference in a printk called from inside gmbus_xfer. The only printk calls I can see in drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the DRM_DEBUG_KMS() and DRM_INFO() macros.
Neither call looks wrong to me, I even tried to swap adapter->name with bus->adapter.name and make *sure* i < num is true, but haven't had any success.

I'd really like to see this bug fixed, as it's preventing me from updating the kernel for over a year now.

Also, while 3.0.2 works, it *does* spew error/warning messages related to gmbus and I've had corrupted VTs in the past (albeit after a long uptime with multiple X restarting and DVI cable unplugging/reattaching events), so maybe there's a lot more broken than "expected".

PCI-IDs:

00:02.0 VGA compatible controller [0300]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e12] (rev 03) (prog-if 00 [VGA controller])
	Subsystem: Intel Corporation Device [8086:1003]
00:02.1 Display controller [0380]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e13] (rev 03)
	Subsystem: Intel Corporation Device [8086:1003]

Messages are attached.

Any help is appreciated, thanks. :)

Best regards,

Mihai

i915_kernel_BUG_gmbus_nullptrderef.txt.bz2 Description: BZip2 compressed data
i915_3.0.2_warning_messages.txt.bz2 Description: BZip2 compressed data