Re: [PATCH v3] kconfig: nconf: stop endless search loops

2021-04-16 Thread Mihai Moldovan
* On 4/16/21 7:40 AM, Masahiro Yamada wrote:
> Applied to linux-kbuild. Thanks.


Thank you for your review and input. :)



Mihai



[PATCH v3] kconfig: nconf: stop endless search loops

2021-04-15 Thread Mihai Moldovan
If the user selects the very first entry in a page and performs a
search-up operation, or selects the very last entry in a page and
performs a search-down operation that will not succeed (e.g., via
[/]asdfzzz[Up Arrow]), nconf will never terminate searching the page.

The reason is that in this case, the starting point will be set to -1
or n, which is then translated into (n - 1) (i.e., the last entry of
the page) or 0 (i.e., the first entry of the page) and finally the
search begins. This continues to work fine until the index reaches 0 or
(n - 1), at which point it will be decremented to -1 or incremented to
n, but not checked against the starting point right away. Instead, it's
wrapped around to the bottom or top again, after which the starting
point check occurs... and naturally fails.
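
The failure mode described above can be reproduced outside of nconf. The following is only a minimal sketch, not the code from scripts/kconfig/nconf.c: the function name steps_until_stop_old, the max_steps cap, and the assumption that no entry ever matches are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the pre-patch search loop, assuming that no
 * entry matches the search string. Returns the number of iterations needed
 * until the "back at the starting point" check fires, or -1 if max_steps
 * iterations pass without that check ever succeeding.
 */
static int steps_until_stop_old(int items_num, int match_start, bool up,
				int max_steps)
{
	int index;

	assert(items_num > 0);

	/* Pre-patch behavior: only the running index is wrapped into [0, n-1]. */
	index = (match_start + items_num) % items_num;

	for (int steps = 1; steps <= max_steps; steps++) {
		if (up)
			--index;
		else
			++index;
		index = (index + items_num) % items_num;
		/* match_start may still be -1 or n, which index never equals. */
		if (index == match_start)
			return steps;
	}
	return -1;
}
```

With five entries, a starting point of 1 and an upward search stops after five iterations, while a starting point of -1 (searching up) or 5 (searching down) exhausts any cap that is given.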

My original implementation added another check for -1 before wrapping
the running index variable around, but Masahiro Yamada pointed out that
the actual issue is that the comparison point (starting point) exceeds
bounds (i.e., the [0,n-1] interval) in the first place and that,
instead, the starting point should be fixed.

This has the welcome side-effect of also fixing the case where the
starting point was n while searching down, which also led to an
infinite loop.

OTOH, this code is now essentially all his work.

Amazingly, nobody seems to have been hit by this for 11 years - or at
the very least nobody bothered to debug and fix this.

Signed-off-by: Mihai Moldovan 
---
v2: swap constant in comparison to right side, as requested by
Randy Dunlap 
v3: reimplement as suggested by Masahiro Yamada,
which has the side-effect of also fixing endless looping in the
symmetric down-direction

 scripts/kconfig/nconf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..af814b39b876 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -504,8 +504,8 @@ static int get_mext_match(const char *match_str, match_f flag)
 	else if (flag == FIND_NEXT_MATCH_UP)
 		--match_start;
 
+	match_start = (match_start + items_num) % items_num;
 	index = match_start;
-	index = (index + items_num) % items_num;
 	while (true) {
 		char *str = k_menu_items[index].str;
 		if (strcasestr(str, match_str) != NULL)
-- 
2.30.1



Re: [PATCH v2] kconfig: nconf: stop endless search-up loops

2021-04-10 Thread Mihai Moldovan
* On 4/10/21 7:47 AM, Masahiro Yamada wrote:
> On Sun, Mar 28, 2021 at 6:52 PM Mihai Moldovan  wrote:
>> +   if ((index == -1) && (index == match_start))
>> +   return -1;
> 
> We know 'index' is -1 in the second comparison.
> So, you can also write like this:
> 
>if (match_start == -1 && index == -1)
> return -1;

I know, but I opted for the other form for semantic reasons: it more directly
describes what we actually care about (both being the same value and either
one being -1).


> But, it is not the correct fix, either.
> 
> The root cause of the bug is match_start
> becoming -1.
> 
> 
> The following is the correct way to fix the bug
> without increasing the number of lines.
> 
> 
> 
> diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
> index e0f965529166..af814b39b876 100644
> [...]
> +   match_start = (match_start + items_num) % items_num;
> index = match_start;
> -   index = (index + items_num) % items_num;

This is probably more elegant and fixes two issues at the same time: match_start
becoming -1 or n (which is likewise invalid, but was implicitly handled through
the remainder operation).
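
To spell out why normalizing the starting point covers both directions, here is a minimal sketch under stated assumptions. It is not the nconf code itself: wrap_index, steps_until_stop_fixed, and the max_steps cap are hypothetical names, and the loop assumes that no entry matches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: maps -1 to n - 1 and n to 0, leaves [0, n-1] as-is. */
static int wrap_index(int i, int items_num)
{
	assert(items_num > 0 && i >= -1 && i <= items_num);
	return (i + items_num) % items_num;
}

/*
 * Hypothetical stand-in for the search loop with the starting point wrapped
 * before it is used as the comparison point. Returns the number of
 * iterations until the loop is back at the start, or -1 if max_steps
 * iterations pass without getting there.
 */
static int steps_until_stop_fixed(int items_num, int match_start, bool up,
				  int max_steps)
{
	int index;

	match_start = wrap_index(match_start, items_num);
	index = match_start;

	for (int steps = 1; steps <= max_steps; steps++) {
		index = wrap_index(up ? index - 1 : index + 1, items_num);
		/* Both sides are now in [0, n-1], so this fires after n steps. */
		if (index == match_start)
			return steps;
	}
	return -1;
}
```

With five entries, both a raw starting point of -1 (searching up) and one of 5 (searching down) now stop after exactly five iterations.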

No objections from my side.



Mihai


[PATCH v2] kconfig: nconf: stop endless search-up loops

2021-03-28 Thread Mihai Moldovan
If the user selects the very first entry in a page and performs a
search-up operation (e.g., via [/][a][Up Arrow]), nconf will never
terminate searching the page.

The reason is that in this case, the starting point will be set to -1,
which is then translated into (n - 1) (i.e., the last entry of the
page) and finally the search begins. This continues to work fine until
the index reaches 0, at which point it will be decremented to -1, but
not checked against the starting point right away. Instead, it's
wrapped around to the bottom again, after which the starting point
check occurs... and naturally fails.

We can easily avoid it by checking against the starting point directly
if the current index is -1 (which should be safe, since it's the only
magic value that can occur) and terminate the matching function.

Amazingly, nobody seems to have been hit by this for 11 years - or at
the very least nobody bothered to debug and fix this.

Signed-off-by: Mihai Moldovan 
---
v2: swap constant in comparison to right side, as requested by
Randy Dunlap 

 scripts/kconfig/nconf.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..db0dc46bc5ee 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -515,6 +515,15 @@ static int get_mext_match(const char *match_str, match_f flag)
 			--index;
 		else
 			++index;
+		/*
+		 * It's fine for index to become negative - think of an
+		 * initial value for match_start of 0 with a match direction
+		 * of up, eventually making it -1.
+		 *
+		 * Handle this as a special case.
+		 */
+		if ((index == -1) && (index == match_start))
+			return -1;
 		index = (index + items_num) % items_num;
 		if (index == match_start)
 			return -1;
-- 
2.30.1



Re: [PATCH] kconfig: nconf: stop endless search-up loops

2021-03-28 Thread Mihai Moldovan
* On 3/27/21 11:26 PM, Randy Dunlap wrote:
> There is a test for it in checkpatch.pl but I also used checkpatch.pl
> without it complaining, so I don't know what it takes to make the script
> complain.
> 
>   if ($lead !~ /(?:$Operators|\.)\s*$/ &&
>   $to !~ /^(?:Constant|[A-Z_][A-Z0-9_]*)$/ &&
>   WARN("CONSTANT_COMPARISON",
>"Comparisons should place the constant on the 
> right side of the test\n" . $herecurr) &&

There are multiple issues, err, "challenges" there:
  - literal "Constant" instead of "$Constant"
  - the left part is misinterpreted as an operation due to the minus sign
(arithmetic operator)
  - $Constant is defined as "qr{$Float|$Binary|$Octal|$Hex|$Int}" (which is
okay), but all these types do not include their negative range.

As far as I can tell, the latter is intentional. Making these types compatible
with negative values causes a lot of other places to break, so I'm not keen on
changing this.

The minus sign being misinterpreted as an operator is highly difficult to fix
in a sane manner. The original intention was to avoid misinterpreting
expressions like (var - CONSTANT real_op ...) as a constant-on-left expression
(and, more importantly, to not misfix it when --fix is given).

The general idea is sane and we probably shouldn't change that, but it would
be good to handle negative values as well.

At first, I was thinking of overriding this detection by checking if the
leading part matches "(-\s*$", which should only be true for negative values,
assuming that there is always an opening parenthesis as part of a conditional
statement/loop (if, while). After playing around with this and composing this
message for a few hours, it dawned on me that there can easily be free-
standing forms (for instance as part of for loops or assignment lines), so
that wouldn't cut it.

It really goes downhill from here.

I assume that false negatives are nuisances due to stylistic errors in the
code, but false positives are actually harmful since a lot of them make code
review by maintainers very tedious.

So far, minus signs were always part of the leading capture group. I'd
actually like to have them in the constant capture group instead:

-   $line =~ /^\+(.*)\b($Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) {
+   $line =~ /^\+(.*)(-?\s*$Constant|[A-Z_][A-Z0-9_]*)\s*($Compare)\s*($LvalOrFunc)/) {

With that sorted, the next best thing I could come up with to weed out
preceding variables was this (which shouldn't influence non-negative
constants):

-   if ($lead !~ /(?:$Operators|\.)\s*$/ &&
+   if ($lead !~ /(?:$Operators|\.|[a-z])\s*$/ &&


There still are a lot of expressions that won't match this, like
"-1 + 0 == var" (i.e., "CONSTANT  CONSTANT  ...") or
constellations like a simple "(CONSTANT)  ..." (e.g.,
"(1) == var").

This is all fuzzy and getting this right would involve moving away from
trying to make sense of C code with regular expressions in Perl, but actually
parsing it to extract the semantics. Not exactly something I'd like to do...

Thoughts on my workaround for this issue? Did I miss anything crucial or
introduce a new bug inadvertently?


Re: [PATCH] kconfig: nconf: stop endless search-up loops

2021-03-27 Thread Mihai Moldovan
* On 3/27/21 4:58 PM, Randy Dunlap wrote:
> On 3/27/21 5:01 AM, Mihai Moldovan wrote:
>> +if ((-1 == index) && (index == match_start))
> 
> checkpatch doesn't complain about this (and I wonder how it's missed), but
> kernel style is (mostly) "constant goes on right hand side of comparison",
> so
>   if ((index == -1) &&

I can naturally send a V2 with that swapped.

As for my rationale: I made sure to run checkpatch, saw that the patch was
accepted, and even ran a quick git grep -- '-1 ==', which likewise returned
enough results for me to consider this consistent with the current code style.

Maybe those matches were just frowned-upon examples of this pattern that
nobody got around to criticizing.



Mihai






[PATCH] kconfig: nconf: stop endless search-up loops

2021-03-27 Thread Mihai Moldovan
If the user selects the very first entry in a page and performs a
search-up operation (e.g., via [/][a][Up Arrow]), nconf will never
terminate searching the page.

The reason is that in this case, the starting point will be set to -1,
which is then translated into (n - 1) (i.e., the last entry of the
page) and finally the search begins. This continues to work fine until
the index reaches 0, at which point it will be decremented to -1, but
not checked against the starting point right away. Instead, it's
wrapped around to the bottom again, after which the starting point
check occurs... and naturally fails.

We can easily avoid it by checking against the starting point directly
if the current index is -1 (which should be safe, since it's the only
magic value that can occur) and terminate the matching function.

Amazingly, nobody seems to have been hit by this for 11 years - or at
the very least nobody bothered to debug and fix this.

Signed-off-by: Mihai Moldovan 
---
 scripts/kconfig/nconf.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c
index e0f965529166..92a5403d8afa 100644
--- a/scripts/kconfig/nconf.c
+++ b/scripts/kconfig/nconf.c
@@ -515,6 +515,15 @@ static int get_mext_match(const char *match_str, match_f flag)
 			--index;
 		else
 			++index;
+		/*
+		 * It's fine for index to become negative - think of an
+		 * initial value for match_start of 0 with a match direction
+		 * of up, eventually making it -1.
+		 *
+		 * Handle this as a special case.
+		 */
+		if ((-1 == index) && (index == match_start))
+			return -1;
 		index = (index + items_num) % items_num;
 		if (index == match_start)
 			return -1;
-- 
2.30.1



Re: Userspace woes with 5.1.5 due to TIPC

2019-05-30 Thread Mihai Moldovan
* On 5/30/19 9:51 PM, Jon Maloy wrote:
> Make sure the following three commits are present in TIPC *after* the 
> offending commit:
> 
> commit 532b0f7ece4c "tipc: fix modprobe tipc failed after switch order of 
> device registration"

This *is* the offending commit, as far as I understand. It was merely rebased
in linux-stable and hence has a different SHA, but it mentions the original SHA
(i.e., 532b0f7ece4c) in its commit message.


> Since that patch one was flawed it had to be reverted:
> commit 5593530e5694  ""Revert tipc: fix modprobe tipc failed after switch 
> order of device registration"
> 
> It was then replaced with this one: 
> commit 526f5b851a96 "tipc: fix modprobe tipc failed after switch order of 
> device registration"

Okay, these two are not part of 5.1.5. I've backported them (and only these two)
to 5.1.5 and the issue(s) seem to be gone. Definitely something that should be
backported to/included in 5.1.6.


Thanks for pointing all that out! Unfortunately, I added nothing but noise,
since you obviously already knew that this commit was broken. If possible,
though, I'd urge Greg to release a new stable version including the fixes soon,
since not being able to start or use browsers in userspace sounds like a pretty
bad regression to me.



Mihai





Userspace woes with 5.1.5 due to TIPC

2019-05-30 Thread Mihai Moldovan
Hi


I've had a few issues lately (mainly bad RAM only, hopefully, which should be
fixed now) and generally upgraded everything.

With 5.1.5, though, some programs exhibited very weird behavior: Chromium
crashed while starting up because it was unable to launch a new zygote process,
although it did start with --no-sandbox (likely because it didn't try to create
other processes in that mode); Opera (based upon Chromium) failed to start with
SIGILL, but I guess that was only a red herring triggered by the same problem;
Firefox started up, but was unable to render any content because its
multi-process IPC didn't work (i.e., it couldn't start new rendering processes).
Interestingly, most other programs I use daily still worked, even though they
use networking and IPC (command-line browsers, MATE Terminal, Electron-based
programs), so this bug didn't make the machine completely unusable.

Since I had been using 5.1.3 without problems before and the issue was
straightforward to test for, I did a bisection run and came to this conclusion:

=== bisect log ===
Bisecting: 124 revisions left to test after this (roughly 7 steps)
[ee4c3e283f8f3286bea60e9038adc70436d87d02] s390/mm: convert to the generic
get_user_pages_fast code
Bisecting: 62 revisions left to test after this (roughly 6 steps)
[f7346dc0634cbad7fca5d951b91ad2e13f497b0b] clk: mediatek: Disable tuner_en
before change PLL rate
Bisecting: 30 revisions left to test after this (roughly 5 steps)
[5ac8e698528149bb1618111d64e22bd8bb784256] parisc: Allow live-patching of
__meminit functions
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[c89c9af998fef2af4e5b2b35fb723693f17e05ef] mlxsw: core: Prevent QSFP module
initialization for old hardware
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[912d8c4cf9f19c93dfdf06b822eeadec9d71494d] net: test nouarg before dereferencing
zerocopy pointers
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[92166190b8282d9925e90a66961879782c50d037] rtnetlink: always put IFLA_LINK for
links with a link-netnsid
Bisecting: 1 revision left to test after this (roughly 1 step)
[7d29c9ad0ed525c1b10e29cfca4fb1eece1e93fb] vsock/virtio: free packets during the
socket release
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[2d08f204328acaf85ac2c6fe5d5d9d4760f12e13] tipc: fix modprobe tipc failed after
switch order of device registration
2d08f204328acaf85ac2c6fe5d5d9d4760f12e13 is the first bad commit
commit 2d08f204328acaf85ac2c6fe5d5d9d4760f12e13
Author: Junwei Hu 
Date:   Fri May 17 19:27:34 2019 +0800

tipc: fix modprobe tipc failed after switch order of device registration

[ Upstream commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e ]

Error message printed:
modprobe: ERROR: could not insert 'tipc': Address family not
supported by protocol.
when modprobe tipc after the following patch: switch order of
device registration, commit 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")

Because sock_create_kern(net, AF_TIPC, ...) is called by
tipc_topsrv_create_listener() in the initialization process
of tipc_net_ops, tipc_socket_init() must be execute before that.

I move tipc_socket_init() into function tipc_init_net().

Fixes: 7e27e8d6130c
("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu 
Reported-by: Wang Wang 
Reviewed-by: Kang Zhou 
Reviewed-by: Suanming Mou 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 

:04 04 13d9b014338ccf6ae0c32bdb2be779870bbf97da
df8a9c2a9f1f8df212999c2904632a77adb03782 M  net
=== / bisect log ===

My kernel config is tailored to my machine, so it's probably not very useful to
others, but I'm including it anyway. The most relevant point is CONFIG_TIPC=y,
i.e., TIPC is built statically into the kernel. I'm not sure why I did that in
the first place, because TIPC is not something that would be useful to me, but
I often err on the "might be useful later" side. I might rethink that decision
and just disable TIPC for good in the future.


With this patch applied, the kernel generally spews out a few wonky messages
that I've never seen before. For completeness' sake, I've attached a ring buffer
log from running the last working and first bad version.

=== TIPC messages ===
NET: Registered protocol family 30
Failed to register TIPC socket type
=== / TIPC messages ===


Now, blindly reverting the patch would obviously be a bad idea, since that would
mean trading one regression for the other (initial) one. I'm thus CCing the
maintainers to help.



Mihai


config-5.1.3.xz
Description: application/xz


dmesg-5.1.4-00013-g2d08f204328a.log.xz
Description: application/xz



Re: NULL pointer dereference in netfilter

2014-05-10 Thread Mihai Moldovan
Actually, may I be seeing just another incarnation of
http://www.spinics.net/lists/netfilter-devel/msg31134.html?

If so, applying https://lkml.org/lkml/2014/3/27/294 seems appropriate.

Could anybody please confirm this?



Mihai





NULL pointer dereference in netfilter

2014-05-10 Thread Mihai Moldovan
Hi

earlier today, I experienced a kernel panic due to a NULL pointer dereference
somewhere in the netfilter subsystem.

Full kernel output (may contain typos):

[360412.114033] BUG: unable to handle kernel NULL pointer dereference at
0010
[360412.115643] IP: [] nf_nat_setup_info+0x56e/0x900
[360412.117244] PGD: 0
[360412.117337] Oops: 0002 [#3] SMP
[360412.117337] Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211
cfg80211 xt_conntrack xt_dscp kvm_intel kvm hfcsusb mISDN_core e1000e cp210x
i915 rfkil ptp video pps_core drm_kms_helper backlight [last unloaded: cfg80211]
[360412.117337] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G   DO
3.14.2-OSS4.2 #2
[360412.117337] Hardware name:  /DQ45CB, BIOS
CBQ4510H.86A.0133.2011.0810.1010 08/10/2011
[360412.117337] task: 8802321c5540 ti: 8802321f4000 task.ti:
8802321f4
[360412.117337] RIP: 0010:[]  []
nf_nat_setup_info+0x56e/0x900
[360412.117337] RSP: 0018:88023bd03668   EFLAGS: 10246
[360412.117337] RAX:  RBX: 8800b073d380 RCX: 
0ae3d87f
[360412.117337] RDX: 88021cdc9800 RSI: b8061897 RDI: 
824808b8
[360412.117337] RBP: 88023bd03748 R08: 88003773e000 R09: 
820ac780
[360412.117337] R10: 88021cdc9800 R11: 88021cdc98e0 R12: 
235d
[360412.117337] R13:  R14: 88023bd03698 R15: 
88023bd036c0
[360412.117337] FS:  () GS:88023bd0()
knlGS:
[360412.117337] CS:  0010 DS:  ES:  CR0: 8005003b
[360412.117337] CR2: 0010 CR3: 0200b000 CR4: 
000407e0
[360412.117337] Stack:
[360412.117337]  820ac780 81d905b0 88023bd036c0 
820ac780
[360412.117337]  81d964e0 81d906a0 df8e782a 

[360412.117337]  8343b75500027f96  0006bb06 
8343b755
[360412.117337] Call Trace:
[360412.117337]  
[360412.117337]  [] xt_snat_target_v0+0x6f/0x90
[360412.117337]  [] ipt_do_table+0x2c3/0x6c0
[360412.117337]  [] ? ipt_do_table+0x326/0x6c0
[360412.117337]  [] nf_nat_ipv6_fn+0x1d7/0x330
[360412.117337]  [] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [] nf_nat_ipv4_out+0x58/0x100
[360412.117337]  [] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [] nf_iterate+0x85/0xb0
[360412.117337]  [] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [] nf_hook_slow+0x6c/0x130
[360412.117337]  [] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [] ip_output+0x82/0x90
[360412.117337]  [] ip_local_out+0x24/0x30
[360412.117337]  [] reject_tg+0x4d2/0x4e0
[360412.117337]  [] ipt_do_table+0x2c3/0x6c0
[360412.117337]  [] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [] iptable_filter_hook+0x34/0x70
[360412.117337]  [] nf_iterate+0x85/0xb0
[360412.117337]  [] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [] nf_hook_slow+0x6c/0x130
[360412.117337]  [] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [] ip_local_deliver+0x73/0x80
[360412.117337]  [] ip_rcv_finish+0x83/0x360
[360412.117337]  [] ip_rcv+0x2a8/0x3e0
[360412.117337]  [] __netif_receive_skb_core+0x632/0x7a0
[360412.117337]  [] __netif_receive_skb+0x1c/0x70
[360412.117337]  [] process_backlog+0x9c/0x170
[360412.117337]  [] net_rx_action+0xfb/0x1a0
[360412.117337]  [] __do_softirq+0xd5/0x1f0
[360412.117337]  [] irq_exit+0x95/0xa0
[360412.117337]  [] do_IRQ+0x62/0x110
[360412.117337]  [] common_interrupt_0x67/0x67
[360412.117337]  
[360412.117337]  [] ? cpuidle_enter_state+0x56/0xd0
[360412.117337]  [] ? cpuidle_enter_state+0x52/0xd0
[360412.117337]  [] cpuidle_idle_call+0x9a/0x140
[360412.117337]  [] arch_cpu_idle+0x9/0x20
[360412.117337]  [] cpu_startup_entry+0xda/0x1c0
[360412.117337]  [] start_secondary+0x20d/0x2c0
[360412.117337] Code: e0 e8 a7 a9 1b 00 48 8b 93 e0 00 00 00 49 c1 ec 20 48 85
d2 74 0c 0f b6 42 11 84 c0 0f 85 93 02 00 00 31 c0 4c 8b 8d 38 ff ff ff <48> 89
58 10 49 8b 91 70 0b 00 00 4a 8d 14 e2 48 8b 0a 48 89 50
[360412.117337] RIP  [] nf_nat_setup_info+0x56e/0x900
[360412.117337]  RSP 
[360412.117337] CR2: 0010
[360412.117337] - - -[ end trace 691638412d73c338 ]- - -
[360412.117337] Kernel panic - not syncing: Fatal exception in interrupt
[360412.117337] Kernel Offset: 0x0 from 0x8100 (relocation range:
0x8000-0x9fff)
[360412.117337] drm_kms_helper: panic occurred, switching back to text console


decodecode:

All code

   0:e0 e8loopne 0xffea
   2:a7   cmpsl  %es:(%rdi),%ds:(%rsi)
   3:a9 1b 00 48 8b   test   $0x8b48001b,%eax
   8:93   xchg   %eax,%ebx
   9:e0 00loopne 0xb
   b:00 00add%al,(%rax)
   d:49 c1 ec 20  shr$0x20,%r12
  11:48 85 d2 test   %rdx,%rdx
  14:74 0cje 0x22
  16:0f b6 42 11   

NULL pointer dereference in netfilter

2014-05-10 Thread Mihai Moldovan
Hi

earlier today, I experienced a kernel panic due to a NULL pointer dereference
somewhere in the netfilter subsystem.

Full kernel output (may contain typos):

[360412.114033] BUG: unable to handle kernel NULL pointer dereference at
0010
[360412.115643] IP: [81865efe] nf_nat_setup_info+0x56e/0x900
[360412.117244] PGD: 0
[360412.117337] Oops: 0002 [#3] SMP
[360412.117337] Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211
cfg80211 xt_conntrack xt_dscp kvm_intel kvm hfcsusb mISDN_core e1000e cp210x
i915 rfkil ptp video pps_core drm_kms_helper backlight [last unloaded: cfg80211]
[360412.117337] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G   DO
3.14.2-OSS4.2 #2
[360412.117337] Hardware name:  /DQ45CB, BIOS
CBQ4510H.86A.0133.2011.0810.1010 08/10/2011
[360412.117337] task: 8802321c5540 ti: 8802321f4000 task.ti:
8802321f4
[360412.117337] RIP: 0010:[81865efe]  [81865efe]
nf_nat_setup_info+0x56e/0x900
[360412.117337] RSP: 0018:88023bd03668   EFLAGS: 10246
[360412.117337] RAX:  RBX: 8800b073d380 RCX: 
0ae3d87f
[360412.117337] RDX: 88021cdc9800 RSI: b8061897 RDI: 
824808b8
[360412.117337] RBP: 88023bd03748 R08: 88003773e000 R09: 
820ac780
[360412.117337] R10: 88021cdc9800 R11: 88021cdc98e0 R12: 
235d
[360412.117337] R13:  R14: 88023bd03698 R15: 
88023bd036c0
[360412.117337] FS:  () GS:88023bd0()
knlGS:
[360412.117337] CS:  0010 DS:  ES:  CR0: 8005003b
[360412.117337] CR2: 0010 CR3: 0200b000 CR4: 
000407e0
[360412.117337] Stack:
[360412.117337]  820ac780 81d905b0 88023bd036c0 
820ac780
[360412.117337]  81d964e0 81d906a0 df8e782a 

[360412.117337]  8343b75500027f96  0006bb06 
8343b755
[360412.117337] Call Trace:
[360412.117337]  IRQ
[360412.117337]  [81874e9f] xt_snat_target_v0+0x6f/0x90
[360412.117337]  [818e0453] ipt_do_table+0x2c3/0x6c0
[360412.117337]  [818e04b6] ? ipt_do_table+0x326/0x6c0
[360412.117337]  [818e0d07] nf_nat_ipv6_fn+0x1d7/0x330
[360412.117337]  [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [818e1068] nf_nat_ipv4_out+0x58/0x100
[360412.117337]  [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [81846b75] nf_iterate+0x85/0xb0
[360412.117337]  [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [81846c0c] nf_hook_slow+0x6c/0x130
[360412.117337]  [81888e20] ? __ip_append_data.isra.43+0xa30/0xa30
[360412.117337]  [81889bb2] ip_output+0x82/0x90
[360412.117337]  [81889314] ip_local_out+0x24/0x30
[360412.117337]  [818e2182] reject_tg+0x4d2/0x4e0
[360412.117337]  [818e0453] ipt_do_table+0x2c3/0x6c0
[360412.117337]  [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [818e0924] iptable_filter_hook+0x34/0x70
[360412.117337]  [81846b75] nf_iterate+0x85/0xb0
[360412.117337]  [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [81846c0c] nf_hook_slow+0x6c/0x130
[360412.117337]  [81883f30] ? ip_rcv_finish+0x360/0x360
[360412.117337]  [81884303] ip_local_deliver+0x73/0x80
[360412.117337]  [81883c53] ip_rcv_finish+0x83/0x360
[360412.117337]  [818845b8] ip_rcv+0x2a8/0x3e0
[360412.117337]  [817e7bb2] __netif_receive_skb_core+0x632/0x7a0
[360412.117337]  [817e7d3c] __netif_receive_skb+0x1c/0x70
[360412.117337]  [817e7e2c] process_backlog+0x9c/0x170
[360412.117337]  [817e823b] net_rx_action+0xfb/0x1a0
[360412.117337]  [810c3e65] __do_softirq+0xd5/0x1f0
[360412.117337]  [810c4185] irq_exit+0x95/0xa0
[360412.117337]  [81003d82] do_IRQ+0x62/0x110
[360412.117337]  [81a20d67] common_interrupt_0x67/0x67
[360412.117337]  EOI
[360412.117337]  [81791ce6] ? cpuidle_enter_state+0x56/0xd0
[360412.117337]  [81791ce2] ? cpuidle_enter_state+0x52/0xd0
[360412.117337]  [81791dfa] cpuidle_idle_call+0x9a/0x140
[360412.117337]  [8100afe9] arch_cpu_idle+0x9/0x20
[360412.117337]  [8110a81a] cpu_startup_entry+0xda/0x1c0
[360412.117337]  [8102a1ad] start_secondary+0x20d/0x2c0
[360412.117337] Code: e0 e8 a7 a9 1b 00 48 8b 93 e0 00 00 00 49 c1 ec 20 48 85
d2 74 0c 0f b6 42 11 84 c0 0f 85 93 02 00 00 31 c0 4c 8b 8d 38 ff ff ff 48 89
58 10 49 8b 91 70 0b 00 00 4a 8d 14 e2 48 8b 0a 48 89 50
[360412.117337] RIP  [81865efe] nf_nat_setup_info+0x56e/0x900
[360412.117337]  RSP 88023bd03668
[360412.117337] CR2: 0010
[360412.117337] - - -[ end trace 691638412d73c338 ]- - -
[360412.117337] Kernel panic - not syncing: Fatal exception in interrupt
[360412.117337] Kernel Offset: 0x0 from 

Re: Oops (NULL ptr deref) while loading some module

2013-07-14 Thread Mihai Moldovan
* On 15.07.2013 01:54 AM, Mihai Moldovan wrote:
> This is obviously happening while booting and udev is loading *some* module, 
> but
> I have no idea which module is affected as such.

Quick correction: actually, at that time udev hasn't even started. udev is being
started by my initramfs one good second later, so at the time of those Oopses,
the root fs wasn't even mounted yet. Maybe the initramfs, but I'm not too sure
either.

[4.769188] dracut: dracut-029
[4.789227] systemd-udevd[1984]: starting version 204

What is the kernel trying to modprobe? Off what location, exactly?

It can't be /, as that isn't even mounted yet.

The initramfs? Maybe, but this has NO modules packed up whatsoever. I just
double-checked.

root@valery/tmp/foo# ls lib/modules/3.10.1-OSS4.2-dirty
modules.alias  modules.alias.bin  modules.builtin  modules.builtin.bin 
modules.dep  modules.dep.bin  modules.devname  modules.order  modules.softdep 
modules.symbols  modules.symbols.bin

The initramfs is solely used for assembling the RAID arrays when booting and
does not include any modules.

I just upgraded to 3.10.1 and am still seeing this.

Interesting issue, isn't it? :)



Mihai





Oops (NULL ptr deref) while loading some module

2013-07-14 Thread Mihai Moldovan
Hi all,

I'm seeing the following oopses when booting up my kernel:

[3.173479] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[3.173602] IP: [] futex_wake+0x74/0x130
[3.173679] PGD 231d65067 PUD 231d64067 PMD 0
[3.173783] Oops:  [#1] SMP
[3.173870] Modules linked in:
[3.173936] CPU 0
[3.173959] Pid: 615, comm: modprobe Not tainted 3.9.6-OSS4.2-dirty
#34  /DQ45CB
[3.174091] RIP: 0010:[]  []
futex_wake+0x74/0x130
[3.174195] RSP: 0018:8802311dbda8  EFLAGS: 00010246
[3.174249] RAX:  RBX:  RCX: 7f125139
[3.174306] RDX:  RSI: 3c28288f RDI: 8222ee70
[3.174363] RBP: 8802311dbe08 R08: efa13b63 R09: 
[3.174420] R10:  R11: 0202 R12: 8222ee70
[3.174477] R13:  R14: 8222ee78 R15: 
[3.174535] FS:  7ff44c2a3700() GS:88023bc0()
knlGS:
[3.174620] CS:  0010 DS:  ES:  CR0: 80050033
[3.174676] CR2:  CR3: 000231d61000 CR4: 000407f0
[3.174734] DR0:  DR1:  DR2: 
[3.174791] DR3:  DR6: 0ff0 DR7: 0400
[3.174849] Process modprobe (pid: 615, threadinfo 8802311da000, task
880231e272c0)
[3.174935] Stack:
[3.174984]  880231d62a10 00010001 07f8 
7fff78d0a000
[3.175139]  8802311e8000 091c 8802311dbdf8 

[3.175293]   0001 7fff78d0a91c 
0001
[3.175447] Call Trace:
[3.175499]  [] do_futex+0x100/0xab0
[3.17]  [] ? __do_page_fault+0x244/0x4e0
[3.175611]  [] ? mntput+0x21/0x30
[3.175666]  [] ? __fput+0x16b/0x240
[3.175721]  [] sys_futex+0x88/0x180
[3.175775]  [] ? do_page_fault+0x9/0x10
[3.175830]  [] system_call_fastpath+0x16/0x1b
[3.175886] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9
ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18
48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[3.176678] RIP  [] futex_wake+0x74/0x130
[3.176678]  RSP 
[3.176678] CR2: 
[3.177366] ---[ end trace 7213d911e494c10b ]---
[3.177823] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[3.177944] IP: [] futex_wake+0x74/0x130
[3.178017] PGD 2311f4067 PUD 2311f5067 PMD 0
[3.178122] Oops:  [#2] SMP
[3.178207] Modules linked in:
[3.178274] CPU 0
[3.178296] Pid: 617, comm: modprobe Tainted: G  D 
3.9.6-OSS4.2-dirty #34  /DQ45CB
[3.178428] RIP: 0010:[]  []
futex_wake+0x74/0x130
[3.178531] RSP: 0018:880231213da8  EFLAGS: 00010246
[3.178585] RAX:  RBX:  RCX: 006a3b48
[3.178643] RDX:  RSI: 1d796f0a RDI: 8222ec60
[3.178700] RBP: 880231213e08 R08: cbc14f19 R09: 
[3.178758] R10:  R11: 0202 R12: 8222ec60
[3.178816] R13:  R14: 8222ec68 R15: 
[3.178873] FS:  7f5baf639700() GS:88023bc0()
knlGS:
[3.178958] CS:  0010 DS:  ES:  CR0: 80050033
[3.179013] CR2:  CR3: 0002311f7000 CR4: 000407f0
[3.179071] DR0:  DR1:  DR2: 
[3.179128] DR3:  DR6: 0ff0 DR7: 0400
[3.179185] Process modprobe (pid: 617, threadinfo 880231212000, task
880231e26540)
[3.179270] Stack:
[3.179318]  8802311f3a10 00010001 07f0 
7fff80ed6000
[3.179472]  8802311e8340 082c 880231213df8 

[3.179626]   0001 7fff80ed682c 
0001
[3.179780] Call Trace:
[3.179829]  [] do_futex+0x100/0xab0
[3.179884]  [] ? __do_page_fault+0x244/0x4e0
[3.179940]  [] ? mntput+0x21/0x30
[3.179994]  [] ? __fput+0x16b/0x240
[3.180071]  [] sys_futex+0x88/0x180
[3.180126]  [] ? do_page_fault+0x9/0x10
[3.180183]  [] system_call_fastpath+0x16/0x1b
[3.180238] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9
ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18
48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[3.180892] RIP  [] futex_wake+0x74/0x130
[3.180892]  RSP 
[3.180892] CR2: 
[3.181699] ---[ end trace 7213d911e494c10c ]---

This is obviously happening while booting and udev is loading *some* module, but
I have no idea which module is affected as such.

Luckily, my module list is quite concise:
Module  

Oops (NULL ptr deref) while loading some module

2013-07-14 Thread Mihai Moldovan
Hi all,

I'm seeing the following oopses when booting up my kernel:

[3.173479] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[3.173602] IP: [810d2f54] futex_wake+0x74/0x130
[3.173679] PGD 231d65067 PUD 231d64067 PMD 0
[3.173783] Oops:  [#1] SMP
[3.173870] Modules linked in:
[3.173936] CPU 0
[3.173959] Pid: 615, comm: modprobe Not tainted 3.9.6-OSS4.2-dirty
#34  /DQ45CB
[3.174091] RIP: 0010:[810d2f54]  [810d2f54]
futex_wake+0x74/0x130
[3.174195] RSP: 0018:8802311dbda8  EFLAGS: 00010246
[3.174249] RAX:  RBX:  RCX: 7f125139
[3.174306] RDX:  RSI: 3c28288f RDI: 8222ee70
[3.174363] RBP: 8802311dbe08 R08: efa13b63 R09: 
[3.174420] R10:  R11: 0202 R12: 8222ee70
[3.174477] R13:  R14: 8222ee78 R15: 
[3.174535] FS:  7ff44c2a3700() GS:88023bc0()
knlGS:
[3.174620] CS:  0010 DS:  ES:  CR0: 80050033
[3.174676] CR2:  CR3: 000231d61000 CR4: 000407f0
[3.174734] DR0:  DR1:  DR2: 
[3.174791] DR3:  DR6: 0ff0 DR7: 0400
[3.174849] Process modprobe (pid: 615, threadinfo 8802311da000, task
880231e272c0)
[3.174935] Stack:
[3.174984]  880231d62a10 00010001 07f8 
7fff78d0a000
[3.175139]  8802311e8000 091c 8802311dbdf8 

[3.175293]   0001 7fff78d0a91c 
0001
[3.175447] Call Trace:
[3.175499]  [810d4d40] do_futex+0x100/0xab0
[3.17]  [819772d4] ? __do_page_fault+0x244/0x4e0
[3.175611]  [811806f1] ? mntput+0x21/0x30
[3.175666]  [81164c7b] ? __fput+0x16b/0x240
[3.175721]  [810d5778] sys_futex+0x88/0x180
[3.175775]  [81977579] ? do_page_fault+0x9/0x10
[3.175830]  [8197a252] system_call_fastpath+0x16/0x1b
[3.175886] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9
ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18
48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[3.176678] RIP  [810d2f54] futex_wake+0x74/0x130
[3.176678]  RSP 8802311dbda8
[3.176678] CR2: 
[3.177366] ---[ end trace 7213d911e494c10b ]---
[3.177823] BUG: unable to handle kernel NULL pointer dereference
at   (null)
[3.177944] IP: [810d2f54] futex_wake+0x74/0x130
[3.178017] PGD 2311f4067 PUD 2311f5067 PMD 0
[3.178122] Oops:  [#2] SMP
[3.178207] Modules linked in:
[3.178274] CPU 0
[3.178296] Pid: 617, comm: modprobe Tainted: G  D 
3.9.6-OSS4.2-dirty #34  /DQ45CB
[3.178428] RIP: 0010:[810d2f54]  [810d2f54]
futex_wake+0x74/0x130
[3.178531] RSP: 0018:880231213da8  EFLAGS: 00010246
[3.178585] RAX:  RBX:  RCX: 006a3b48
[3.178643] RDX:  RSI: 1d796f0a RDI: 8222ec60
[3.178700] RBP: 880231213e08 R08: cbc14f19 R09: 
[3.178758] R10:  R11: 0202 R12: 8222ec60
[3.178816] R13:  R14: 8222ec68 R15: 
[3.178873] FS:  7f5baf639700() GS:88023bc0()
knlGS:
[3.178958] CS:  0010 DS:  ES:  CR0: 80050033
[3.179013] CR2:  CR3: 0002311f7000 CR4: 000407f0
[3.179071] DR0:  DR1:  DR2: 
[3.179128] DR3:  DR6: 0ff0 DR7: 0400
[3.179185] Process modprobe (pid: 617, threadinfo 880231212000, task
880231e26540)
[3.179270] Stack:
[3.179318]  8802311f3a10 00010001 07f0 
7fff80ed6000
[3.179472]  8802311e8340 082c 880231213df8 

[3.179626]   0001 7fff80ed682c 
0001
[3.179780] Call Trace:
[3.179829]  [810d4d40] do_futex+0x100/0xab0
[3.179884]  [819772d4] ? __do_page_fault+0x244/0x4e0
[3.179940]  [811806f1] ? mntput+0x21/0x30
[3.179994]  [81164c7b] ? __fput+0x16b/0x240
[3.180071]  [810d5778] sys_futex+0x88/0x180
[3.180126]  [81977579] ? do_page_fault+0x9/0x10
[3.180183]  [8197a252] system_call_fastpath+0x16/0x1b
[3.180238] Code: ff ff 85 c0 41 89 c7 0f 85 b0 00 00 00 48 8d 7d b8 e8 61 f9
ff ff 49 89 c4 48 89 c7 e8 46 0d 8a 00 49 8b 44 24 08 4d 8d 74 24 08 <48> 8b 18
48 8d 78 e8 48 83 eb 18 49 39 c6 75 23 eb 6a 66 2e 0f
[

Re: Oops (NULL ptr deref) while loading some module

2013-07-14 Thread Mihai Moldovan
* On 15.07.2013 01:54 AM, Mihai Moldovan wrote:
> This is obviously happening while booting and udev is loading *some* module,
> but I have no idea which module is affected as such.

Quick correction: actually, at that time udev hasn't even started. udev is being
started by my initramfs one good second later, so at the time of those Oopses,
the root fs wasn't even mounted yet. Maybe the initramfs, but I'm not too sure
either.

[4.769188] dracut: dracut-029
[4.789227] systemd-udevd[1984]: starting version 204

What is the kernel trying to modprobe? Off what location, exactly?

It can't be /, as that isn't even mounted yet.

The initramfs? Maybe, but this has NO modules packed up whatsoever. I just
double-checked.

root@valery/tmp/foo# ls lib/modules/3.10.1-OSS4.2-dirty
modules.alias  modules.alias.bin  modules.builtin  modules.builtin.bin 
modules.dep  modules.dep.bin  modules.devname  modules.order  modules.softdep 
modules.symbols  modules.symbols.bin

The initramfs is solely used for assembling the RAID arrays when booting and
does not include any modules.

I just upgraded to 3.10.1 and am still seeing this.

Interesting issue, isn't it? :)



Mihai



smime.p7s
Description: S/MIME Cryptographic Signature


Re: [ 092/128] iommu/intel: disable DMAR for g4x integrated gfx

2013-02-03 Thread Mihai Moldovan
* On 03.02.2013 03:48 PM, Ben Hutchings wrote:
> [...]
> +static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
> +{
Shouldn't __devinit be used here too, like for quirk_iommu_rwbf?

It probably doesn't matter too much, especially on platforms with Intel IOMMU,
but... it makes the code coherent.

Best regards,


Mihai




Panic during interrupt handling while terminating hostapd

2013-01-27 Thread Mihai Moldovan
Hi,

I've found yet another problem with (at least) 3.7.4 and 3.8-rc4.

When terminating hostapd via SIGINT, this bug and panic came up:


BUG: unable to handle kernel paging request at 001d8000
IP: [-ADDRESS] kmem_cache_alloc+0x43/0xb0
PGD 21c3db067 PUD 0
Oops:  [#1] SMP
Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211
kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill
CPU 2
Pid: 6972, comm: modprobe Tainted: GW3.7.4-OSS4.2
#3  /DQ45CB
RIP: 0010:[-ADDRESS]  [-ADDRESS] kmem_cache_alloc+0x43/0xb0
RSP: 0018:-ADDRESS  EFLAGS: 00010206
RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS
RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS
RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS
R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS
FS:  -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS
CS:  0010 DS:  ES:  CR0: -ADDRESS
CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS
DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS
DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS
Process modprobe (pid: 6972, threadinfo -ADDRESS, task -ADDRESS)
Stack:
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
Call Trace:
 [-ADDRESS] __d_alloc+0x2f/0x180
 [-ADDRESS] d_alloc+0x13/0x70
 [-ADDRESS] lookup_dcache+0xa3/0xd0
 [-ADDRESS] ? path_get+0x26/0x40
 [-ADDRESS] lookup_open+0x54/0x1c0
 [-ADDRESS] do_last+0x319/0x830
 [-ADDRESS] path_openat+0xae/0x4c0
 [-ADDRESS] ? handle_mm_fault+0x210/0x2d0
 [-ADDRESS] do_filp_open+0x3d/0xa0
 [-ADDRESS] ? __alloc_fd+0x45/0x120
 [-ADDRESS] do_sys_open+0xf9/0x1e0
 [-ADDRESS] sys_openat+0xf/0x20
 [-ADDRESS] system_call_fastpath+0x16/0x1b
Code: 5d e0 4c 89 65 e8 49 8b 4d 00 65 48 03 0c 25 28 cd 00 00 48 8b 51 08 4c 8b
21 4d 85 e4 74 62 49 63 45 20 48 8d 4a 01 49 8b 7d 00 <49> 8b 1c
 04 4c 89 e0 65 48 0f c7 0f 0f 94 c0 84 c0 74 c8 49 63
RIP  [-ADDRESS] kmem_cache_alloc+0x43/0xb0
 RSP -ADDRESS
CR2: -ADDRESS
general protection fault:  [#2] SMP
Modules linked in: xt_conntrack xt_dscp i915 ath9k drm_kms_helper mac80211
kvm_intel video ath9k_common ath9k_hw kvm e1000e ath backlight cfg80211 rfkill
CPU 2
Pid: 0, comm: swapper/2 Tainted: G  D W3.7.4-OSS4.2 #3 
/DQ45CB
RIP: 0010[-ADDRESS]  [-ADDRESS] 
rcu_do_batch.isra.37+0x131/0x290
RSP: 0018:-ADDRESS  EFLAGS: 00010212
RAX: -ADDRESS RBX: -ADDRESS RCX: -ADDRESS
RDX: -ADDRESS RSI: -ADDRESS RDI: -ADDRESS
RBP: -ADDRESS R08: -ADDRESS R09: -ADDRESS
R10: -ADDRESS R11: -ADDRESS R12: -ADDRESS
R13: -ADDRESS R14: -ADDRESS R15: -ADDRESS
FS:  -ADDRESS() GS:-ADDRESS() knlGS:-ADDRESS
CS:  0010 DS:  ES:  CR0: -ADDRESS
CR2: -ADDRESS CR3: -ADDRESS CR4: -ADDRESS
DR0: -ADDRESS DR1: -ADDRESS DR2: -ADDRESS
DR3: -ADDRESS DR6: -ADDRESS DR7: -ADDRESS
Process swapper/2 (pid: 0, threadinfo -ADDRESS, task -ADDRESS)
Stack:
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
 -ADDRESS -ADDRESS -ADDRESS -ADDRESS
Call Trace:
 <IRQ>
 [-ADDRESS] ? tick_program_event+0x1f/0x30
 [-ADDRESS] __rcu_process_callbacks+0xaa/0x140
 [-ADDRESS] rcu_process_callbacks+0x48/0x70
 [-ADDRESS] __do_softirq+0xa8/0x150
 [-ADDRESS] call_softirq+0x1c/0x30
 [-ADDRESS] do_softirq+0x4d/0x80
 [-ADDRESS] irq_exit+0x8e/0xb0
 [-ADDRESS] do_IRQ+0x5e/0xd0
 [-ADDRESS] common_interrupt+0x67/0x67
 <EOI>
 [-ADDRESS] ? acpi_idle_enter_simple+0xbd/0xf4
 [-ADDRESS] ? acpi_idle_enter_simple+0xb8/0xf4
 [-ADDRESS] acpi_idle_enter_bm+0xe1/0x24b
 [-ADDRESS] ? menu_select+0xe4/0x300
 [-ADDRESS] cpuidle_enter+0x19/0x20
 [-ADDRESS] cpuidle_idle_call+0x8b/0xf0
 [-ADDRESS] cpu_idle+0xbf/0x110
 [-ADDRESS] start_secondary+0xb3/0xb5
Code: b8 8b 92 ac 01 00 00 85 d2 75 2f 4d 85 ff 74 2a 4c 89 ff 48 8b 57 08 4c 8b
3f 48 81 fa ff 0f 00 00 41 0f 18 0f 76 ab 48 89 45 a8 <ff> d2 48
 8b 45 a8 eb b4 0f 1f 80 00 00 00 00 48 89 c1 9c 41 5d
RIP [-ADDRESS] rcu_do_batch.isra.37+0x131/0x290
RSP -ADDRESS
Kernel panic - not syncing: 

Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-22 Thread Mihai Moldovan
* On 21.01.2013 07:11 PM, Mihai Moldovan wrote:
> I'm also currently testing a kernel without the Intel IOMMU feature
> [CONFIG_INTEL_IOMMU=n, but CONFIG_IOMMU_SUPPORT=y]. [...] At least
> not seeing USB and PCI(e) issues. I'll leave the box running for some
> more [time] [...]

No freezes for >22h, seems to be fine.


> [...] and will afterwards disable IOMMU as a whole to see if I hit
> USB and PCI(e) issues again with that combination.

The system seems to run stably with CONFIG_IOMMU_SUPPORT=n set, too. This is
expected.
However: unlike during earlier tests when I disabled IOMMU and Intel IOMMU via
kernel/boot parameters, I am not seeing any DMA mapping errors.

There seems to be a difference between disabling IOMMU/Intel IOMMU statically in
the kernel compared to disabling it via kernel parameter. Is this another bug?

I've attached both kernel ring buffer logs (minus the timings, for easier
diffing):

  [*] kern-new-iommu_off.log.bz2 disables IOMMU and Intel IOMMU via boot
      parameter
  [*] kern-iommu_static_off.log.bz2 has CONFIG_IOMMU_SUPPORT=n set and any IOMMU
      support statically disabled (and consequently also DMAR)



Mihai



kern-new-iommu_off.log.bz2
Description: BZip2 compressed data


kern-iommu_static_off.log.bz2
Description: BZip2 compressed data



Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-21 Thread Mihai Moldovan
* On 20.01.2013 11:49 PM, Daniel Vetter wrote:
> Thanks for testing, I've just submitted the patch for review. It
> should included in a -fixes tree soon and the get backported to
> stable kernels.

Thanks. :)


> Please let me know when this works solidly for you, so that I can
> put it into a real patch and also submit it for inclusion.

No freeze for >24h, I guess we can conclude the quirk does indeed
fix the random freeze issue as well. :)
I'm all for inclusion.


I'm also currently testing a kernel without the Intel IOMMU feature. This seems
to work, too, but also disables Intel TXT and VT-d...
At least I'm not seeing USB and PCI(e) issues. I'll leave the box running for
some more time and will afterwards disable IOMMU as a whole to see if I hit USB
and PCI(e) issues again with that combination.

Best regards,



Mihai

[resending to include all previous CC's]




Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-20 Thread Mihai Moldovan
Hi Daniel,

the patch does work, i.e., it turns off DMAR for the graphics card and
alleviates the freezes when loading i915/kms.

However, I'm still seeing random machine freezes with it (consistent with the
behavior I've experienced with intel_iommu=igfx_off).

The patch + forcing RWBF is working, too. Interestingly, this version didn't
randomly freeze yet, after more than 5 hours of uptime! I'll leave the box
running until tomorrow to make sure I did stick around long enough.

All those tested kernels were able to handle USB and PCI(e) devices.

I still have to test turning off IOMMU in general and Intel IOMMU specifically.
Will probably do this tomorrow.

Thank you so far! :)



Mihai








Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-19 Thread Mihai Moldovan
* On 19.01.2013 05:13 PM, Mihai Moldovan wrote:
> * On 19.01.2013 02:27 PM, Daniel Vetter wrote:
>> You have a gen4.5 chipset which is known to be utterly broken for
>> IOMMU+intel gpu.
> Nice description for what I'm seeing. ;)
>
> After some more hours of uptime I'm inclined to say, that "intel_iommu=off
> iommu=off" fixes my random freezes as well.
> Alas, the USB and PCI(e) problems are still around, but I could test
> recompiling 3.7.2 with Intel IOMMU turned off completely in the kernel config.
> Interestingly, my 3.0.2 kernel which worked fine for so long doesn't even
> *have* support for VT-d/Intel IOMMU. This could explain why I wasn't bit by
> those problems on all previous versions.
>
>
>> [...] and we've never added the proper
>> quirks. See https://bugzilla.kernel.org/show_bug.cgi?id=51921 for a
>> proposed patch to fix this (i.e. automatically set
>> intel_iommu=igfx_off for affected platfroms). Testing highly welcome.
> From a quick glance, I don't think this patch will work as-is, my PCI ID
> 2e12 is missing.
> [...]

Which of course will work, as 2e10 is my DRAM controller as reported by lspci,
sorry.

But, shouldn't the
"DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2eXX, quirk_iommu_rwbf);"
calls be rather
" DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_g4x_gfx);"
?

The current patch errors out for me while compiling, as quirk_iommu_rwbf is not
yet defined at that point.
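
A minimal standalone sketch of why the ordering matters, assuming the error is
a plain use-before-declaration. Nothing here is kernel code: the sketch_* names
and the SKETCH_DECLARE_FIXUP macro are made up for illustration and only mimic
the shape of a fixup table entry. 0x8086 is the Intel PCI vendor ID and 0x2e10
is the device ID mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a PCI device and a fixup table entry. */
struct sketch_dev {
	unsigned short vendor, device;
	int dmar_gfx_off;
};

typedef void (*sketch_hook)(struct sketch_dev *);

struct sketch_fixup {
	unsigned short vendor, device;
	sketch_hook hook;
};

/* Hypothetical macro: creates one fixup entry that references a hook. */
#define SKETCH_DECLARE_FIXUP(name, v, d, fn) \
	static const struct sketch_fixup name = { (v), (d), (fn) }

/*
 * The hook has to be declared before the macro references it. Dropping this
 * prototype makes the entry below fail to compile, because the identifier
 * is unknown at that point.
 */
static void sketch_quirk_g4x_gfx(struct sketch_dev *dev);

SKETCH_DECLARE_FIXUP(sketch_fixup_2e10, 0x8086, 0x2e10, sketch_quirk_g4x_gfx);

/* The definition may follow later in the file. */
static void sketch_quirk_g4x_gfx(struct sketch_dev *dev)
{
	dev->dmar_gfx_off = 1;
}

/* Run the hook only if vendor and device ID match the entry. */
static void sketch_apply(const struct sketch_fixup *f, struct sketch_dev *dev)
{
	if (f->vendor == dev->vendor && f->device == dev->device)
		f->hook(dev);
}
```

Either a prototype like the one above or moving the entry below the function
definition would resolve this kind of error.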




Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-19 Thread Mihai Moldovan
* On 19.01.2013 02:27 PM, Daniel Vetter wrote:
> You have a gen4.5 chipset which is known to be utterly broken for
> IOMMU+intel gpu.

Nice description for what I'm seeing. ;)

After some more hours of uptime I'm inclined to say, that "intel_iommu=off
iommu=off" fixes my random freezes as well.
Alas, the USB and PCI(e) problems are still around, but I could test recompiling
3.7.2 with Intel IOMMU turned off completely in the kernel config.
Interestingly, my 3.0.2 kernel which worked fine for so long doesn't even *have*
support for VT-d/Intel IOMMU. This could explain why I wasn't bit by those
problems on all previous versions.


> [...] and we've never added the proper
> quirks. See https://bugzilla.kernel.org/show_bug.cgi?id=51921 for a
> proposed patch to fix this (i.e. automatically set
> intel_iommu=igfx_off for affected platfroms). Testing highly welcome.

From a quick glance, I don't think this patch will work as-is, my PCI ID 2e12 is
missing.
I'll add it to the relevant section.

But even if it worked, I'd still have the "box freezes randomly" issue (mostly
within 5 to 60 minutes of uptime). :(
The only way to get rid of this is disabling Intel IOMMU as a whole via kernel
parameters intel_iommu=off iommu=off.

Anyway, I'll give it a try.

Best regards,



Mihai




Re: i915-related and general system freezes with specific kernel config // IOMMU

2013-01-18 Thread Mihai Moldovan
* On 19.01.2013 12:48 AM, Mihai Moldovan wrote:
> Testing further, I rebooted using iommu=off and intel_iommu=off. So far, I had
> no random crashes, but the system uptime of REPLACEME minutes is too
> small to draw conclusions yet.

And by "REPLACEME", I meant 50 (minutes). That's embarrassing, sorry.




Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load

2012-08-13 Thread Mihai Moldovan
Had another look at the code and would like to apologize for the confusion...

* On 13.08.2012 05:27 PM, Mihai Moldovan wrote:
> Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled
> "disabled" and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled "reserved",
> which neither should be touched).

Wrong. The array is defined as
struct intel_gmbus gmbus[GMBUS_NUM_PORTS];
and thus runs from index 0 to GMBUS_NUM_PORTS-1, with no reserved or disabled
ports anymore. I totally overlooked the definition, sorry.
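
To illustrate the corrected indexing, here is a minimal standalone sketch (not
the actual i915 code). All sketch_* names are made up for illustration; the
only things taken from the discussion are that the array has one slot per
valid port and that valid port numbers run from 1 to 6, so a port maps to its
array slot by subtracting 1.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants modelled on the discussion: valid ports are 1..6. */
enum {
	SKETCH_PORT_FIRST = 1,
	SKETCH_PORT_LAST  = 6,
	SKETCH_NUM_PORTS  = SKETCH_PORT_LAST - SKETCH_PORT_FIRST + 1,
};

struct sketch_gmbus {
	unsigned int port;	/* 1-based hardware port number */
};

struct sketch_priv {
	struct sketch_gmbus gmbus[SKETCH_NUM_PORTS];	/* indices 0..5 */
};

/* Port 0 ("disabled") and port 7 ("reserved") are rejected. */
static bool sketch_port_valid(unsigned int port)
{
	return port >= SKETCH_PORT_FIRST && port <= SKETCH_PORT_LAST;
}

/* Setup walks the array by index; slot i belongs to port i + 1. */
static void sketch_setup(struct sketch_priv *priv)
{
	for (size_t i = 0; i < SKETCH_NUM_PORTS; i++)
		priv->gmbus[i].port = (unsigned int)i + SKETCH_PORT_FIRST;
}

/* Lookup goes the other way: -1 maps the port number to its array slot. */
static struct sketch_gmbus *sketch_get(struct sketch_priv *priv,
				       unsigned int port)
{
	if (!sketch_port_valid(port))
		return NULL;
	return &priv->gmbus[port - SKETCH_PORT_FIRST];
}
```

With this layout, indexing the array by port number directly (without the -1)
would run one element past the end for the last port.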

Ignore the rest of my comments and the patch, as they are based on false
assumptions (gmbus still containing the disabled and reserved ports.)

Instead, I'd like to ACK Jani's patch. The module can now be loaded fine,
there's no null ptr dereference anymore and only some gmbus warnings show up,
though this time only one message per port, so basically it's falling back to
bit banging on all gmbus ports as it should:

[   14.722454] i915 :00:02.0: setting latency timer to 64
[   14.796032] [drm] GMBUS [i915 gmbus ssc] timed out, falling back to bit
banging on pin 1
[   15.044039] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit
banging on pin 3
[   15.420067] [drm] GMBUS [i915 gmbus dpd] timed out, falling back to bit
banging on pin 6
[   15.548121] i915 :00:02.0: irq 55 for MSI/MSI-X
[   15.842123] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on minor 0

Best regards,


Mihai




Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load

2012-08-13 Thread Mihai Moldovan
* On 13.08.2012 05:09 PM, Daniel Vetter wrote:
> On Mon, Aug 13, 2012 at 05:03:24PM +0200, Mihai Moldovan wrote:
>> Hi Jani,
>>
>> The reason sounds sane to me, but while looking through the code, I have
>> seen a few other problems, too.
>>
>> To my understanding, we should use port for dev_priv->gmbus[], not the pin
>> mapping (which is only used for gmbus_ports[]).
>> Don't forget to add the +1 for pin -> port mapping to the error case.
>>
>> Also, intel_gmbus_get_adapter is already accepting a port value (I made sure
>> to look at the calls in other files too), so don't map the port back to a pin.
>>
>> Keep the same in mind for the intel_teardown_gmbus "destructor".
>>
>> The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0,
>> which is known as "disabled" and shouldn't be used (previously has_gpio was
>> set to false for those ports to not do any transfer on those ports.)
>>
>> I may be wrong, could you review this and maybe add it to your patch?
> This seems to essentially undo
>
> commit 2ed06c93a1fce057808894d73167aae03c76deaf
> Author: Daniel Kurtz 
> Date:   Wed Mar 28 02:36:15 2012 +0800
>
> drm/i915/intel_i2c: gmbus disabled and reserved ports are invalid
>
> Note that port numbers start at 1, whereas the array is 0-index based. So
> you patch here would blow up if you don't extend the dev_priv->gmbus
> array.

Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled
"disabled" and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled "reserved", which
neither should be touched).
Thus, in effect, it starts with 1 and ends with 6, but the current code does not
take that into account, instead accessing elements from 0 onwards:

The code currently would access *dev_priv->gmbus[0] in the first iteration,
which is labeled as "disabled" and shouldn't be touched. Instead, we should do a
pin->port mapping and access *dev_priv->gmbus[1, 2, 3 ... 6] instead (with
*dev_priv->gmbus[7] left out, as it's marked as "reserved" and again shouldn't
be touched.)

However, accessing gmbus_ports[0] is fine, and we can then copy
gmbus_ports[0].name to *dev_priv->gmbus[1]->adapter.name
            ^ pin                       ^ port

Blowing up seems impossible too, as GMBUS_NUM_PORTS is #defined as END_PORT -
BEGIN_PORT + 1 which will evaluate to 6 and be the last index used.

Best regards,


Mihai




Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load

2012-08-13 Thread Mihai Moldovan
Hi Jani,

* On 13.08.2012 04:33 PM, Jani Nikula wrote:
> Hi Mihai, could you test the following patch to see if it fixes the problem,
> please?
>
> BR,
> Jani.
>
>
> Jani Nikula (1):
>   drm/i915: ensure i2c adapter is all set before adding it
>
>  drivers/gpu/drm/i915/intel_i2c.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>

The reason sounds sane to me, but while looking through the code, I have seen a
few other problems, too.

To my understanding, we should use port for dev_priv->gmbus[], not the pin
mapping (which is only used for gmbus_ports[]).
Don't forget to add the +1 for pin -> port mapping to the error case.

Also, intel_gmbus_get_adapter is already accepting a port value (I made sure to
look at the calls in other files too), so don't map the port back to a pin.

Keep the same in mind for the intel_teardown_gmbus "destructor".

The current code adds the gmbus algorithm (gmbus_xfer) to gmbus port 0, which is
known as "disabled" and shouldn't be used (previously has_gpio was set to false
for those ports to not do any transfer on those ports.)

I may be wrong, could you review this and maybe add it to your patch?


diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index 1991a44..b725993 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -472,8 +474,8 @@ int intel_setup_gmbus(struct drm_device *dev)
 	mutex_init(&dev_priv->gmbus_mutex);

 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
 		u32 port = i + 1; /* +1 to map gmbus index to pin pair */
+		struct intel_gmbus *bus = &dev_priv->gmbus[port];

 		bus->adapter.owner = THIS_MODULE;
 		bus->adapter.class = I2C_CLASS_DDC;
@@ -506,7 +508,7 @@ int intel_setup_gmbus(struct drm_device *dev)

 err:
 	while (--i) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 	return ret;
@@ -516,9 +518,8 @@ struct i2c_adapter *intel_gmbus_get_adapter(struct drm_i915_private *dev_priv,
 					    unsigned port)
 {
 	WARN_ON(!intel_gmbus_is_port_valid(port));
-	/* -1 to map pin pair to gmbus index */
 	return (intel_gmbus_is_port_valid(port)) ?
-		&dev_priv->gmbus[port - 1].adapter : NULL;
+		&dev_priv->gmbus[port].adapter : NULL;
 }

 void intel_gmbus_set_speed(struct i2c_adapter *adapter, int speed)
@@ -543,8 +544,9 @@ void intel_teardown_gmbus(struct drm_device *dev)
 	if (dev_priv->gmbus == NULL)
 		return;

+	/* +1 to map gmbus index to pin pair */
 	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
-		struct intel_gmbus *bus = &dev_priv->gmbus[i];
+		struct intel_gmbus *bus = &dev_priv->gmbus[i + 1];
 		i2c_del_adapter(&bus->adapter);
 	}
 }







Re: [PATCH 0/1] hopefully fix null pointer dereference on i915 load

2012-08-13 Thread Mihai Moldovan
Had another look at the code and would like to apologize for the confusion...

* On 13.08.2012 05:27 PM, Mihai Moldovan wrote:
> Uhm, no, quite on the contrary. gmbus starts at 0 (with idx 0 being labeled
> "disabled" and idx ((GMBUS_NUM_PORTS == 6) + 1) being labeled "reserved",
> neither of which should be touched).

Wrong. The array is declared as
struct intel_gmbus gmbus[GMBUS_NUM_PORTS];
so its indices run from 0 to GMBUS_NUM_PORTS-1 and it no longer contains the
"reserved" or "disabled" ports. I totally overlooked the definition, sorry.
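
A minimal sketch of that layout, for illustration only: the array has one slot
per valid port, so ports 1..6 map to indices 0..5 via -1, and indices map back
to ports via +1. The names PORT_FIRST, PORT_LAST, struct gmbus_slot,
port_is_valid(), port_to_slot() and setup_slots() are made up for this sketch
and are not the driver's identifiers; only GMBUS_NUM_PORTS, its value of 6 and
the -1/+1 mapping are taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values: valid ports are 1..6; 0 is "disabled", 7 is "reserved". */
#define PORT_FIRST      1
#define PORT_LAST       6
#define GMBUS_NUM_PORTS (PORT_LAST - PORT_FIRST + 1)	/* == 6 */

struct gmbus_slot {
	unsigned port;	/* port (pin pair) served by this slot */
};

/* One slot per valid port only, so the valid indices are 0..5. */
static struct gmbus_slot gmbus[GMBUS_NUM_PORTS];

static int port_is_valid(unsigned port)
{
	return port >= PORT_FIRST && port <= PORT_LAST;
}

/* -1 to map a port (pin pair) to its array index; NULL for ports 0 and 7. */
static struct gmbus_slot *port_to_slot(unsigned port)
{
	return port_is_valid(port) ? &gmbus[port - 1] : NULL;
}

/* +1 to map an array index back to its port, as a setup loop would. */
static void setup_slots(void)
{
	size_t i;

	for (i = 0; i < GMBUS_NUM_PORTS; i++) {
		gmbus[i].port = (unsigned)i + 1;
		/* The two mappings must round-trip. */
		assert(port_to_slot(gmbus[i].port) == &gmbus[i]);
	}
}
```

With this layout, indexing the array with the port number itself would use
index 6 for port 6, one past the last valid slot, which matches Daniel's remark
that the array would have to be extended.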

Ignore the rest of my comments and the patch, as they are based on false
assumptions (gmbus still containing the "disabled" and "reserved" ports.)

Instead, I'd like to ACK Jani's patch. The module can now be loaded fine,
there's no null ptr dereference anymore and only some gmbus warnings show up,
though this time only one message per port, so basically it's falling back to
bit banging on all gmbus ports as it should:

[   14.722454] i915 :00:02.0: setting latency timer to 64
[   14.796032] [drm] GMBUS [i915 gmbus ssc] timed out, falling back to bit banging on pin 1
[   15.044039] [drm] GMBUS [i915 gmbus panel] timed out, falling back to bit banging on pin 3
[   15.420067] [drm] GMBUS [i915 gmbus dpd] timed out, falling back to bit banging on pin 6
[   15.548121] i915 :00:02.0: irq 55 for MSI/MSI-X
[   15.842123] [drm] Initialized i915 1.6.0 20080730 for :00:02.0 on minor 0
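
The "one message per port" behaviour in this log is what one would expect if
each port remembers that the hardware transfer timed out and uses bit banging
from then on. A rough sketch of that pattern follows. It is not the i915
implementation: struct port, hw_xfer(), bitbang_xfer() and port_xfer() are
hypothetical names, and the hardware path is stubbed so that it always times
out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct port {
	const char *name;
	int pin;
	int force_bit;	/* set once the hardware path has timed out */
	int warnings;	/* number of fallback messages printed so far */
};

/* Stub for the hardware transfer; in this sketch it always times out. */
static int hw_xfer(struct port *p)
{
	(void)p;
	return -ETIMEDOUT;
}

/* Stub for the bit-banging transfer; in this sketch it always succeeds. */
static int bitbang_xfer(struct port *p)
{
	(void)p;
	return 0;
}

static int port_xfer(struct port *p)
{
	int ret;

	assert(p != NULL);

	/* Once a port has fallen back, skip the hardware path entirely. */
	if (p->force_bit)
		return bitbang_xfer(p);

	ret = hw_xfer(p);
	if (ret != -ETIMEDOUT)
		return ret;

	/* Warn once, then stay on the fallback path for this port. */
	printf("GMBUS [%s] timed out, falling back to bit banging on pin %d\n",
	       p->name, p->pin);
	p->warnings++;
	p->force_bit = 1;
	return bitbang_xfer(p);
}
```

Calling port_xfer() twice on the same port prints the message only on the first
call, because the second call takes the force_bit shortcut.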

Best regards,


Mihai





Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 07:44 PM, Mihai Moldovan wrote:
> Hm, OK.
>
> Well, I'm done now.
>
> bisect log:
>
> git bisect start
> # good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
> git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
> # bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
> git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
> # good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of
> git://oss.sgi.com/xfs/xfs
> git bisect good 49d99a2f9c4d033cc3965958a1397b1fad573dd3
> # good: [813a95e5b4fa936bbde10ef89188932745dcd7f4] Merge tag 'pinctrl' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> git bisect good 813a95e5b4fa936bbde10ef89188932745dcd7f4
> # bad: [9978306e31a8f89bd81fbc4c49fd9aefb1d30d10] Merge branch 'for-linus' of
> git://oss.sgi.com/xfs/xfs
> git bisect bad 9978306e31a8f89bd81fbc4c49fd9aefb1d30d10
> # good: [927ad551031798d4cba49766549600bbb33872d7] Merge tag
> 'ktest-v3.5-spelling' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
> git bisect good 927ad551031798d4cba49766549600bbb33872d7
> # good: [2c01e7bc46f10e9190818437e564f7e0db875ae9] Merge branch 'for-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
> git bisect good 2c01e7bc46f10e9190818437e564f7e0db875ae9
> # bad: [5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a] drm/nva3/pm: make pll->pll
> mode work
> git bisect bad 5f54d29ee9dace1e2ef4e8c9873ad4dd7a06d11a
> # bad: [8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8] drm/i915: Unconditionally
> initialise the interrupt workers
> git bisect bad 8b2e326dc7c5aa6952c88656d04d0d81fd85a6f8
> # bad: [f637fde434c9e3687798730c7ddd367e93666013] drm/i915: inline
> enable/disable_irq into ring->get/put_irq
> git bisect bad f637fde434c9e3687798730c7ddd367e93666013
> # bad: [23e3f9b37e7368ee8530ba99907508363feebc14] drm/i915: check for disabled
> interrupts on ValleyView
> git bisect bad 23e3f9b37e7368ee8530ba99907508363feebc14
> # good: [8489731c9bd22c27ab17a2190cd7444604abf95f] drm/i915: move clflushing
> into shmem_pread
> git bisect good 8489731c9bd22c27ab17a2190cd7444604abf95f
> # good: [3bd7d90938f1fe77de5991dc4b727843c4980b2a] drm/i915/intel_i2c:
> refactor using intel_gmbus_get_adapter
> git bisect good 3bd7d90938f1fe77de5991dc4b727843c4980b2a
> # bad: [57f350b6722f9569f407872f6ead56e2d221d98a] drm/i915: add DPIO support
> git bisect bad 57f350b6722f9569f407872f6ead56e2d221d98a
> # bad: [93e537a10f2c8c0f2e74409b6cb473fc221758fa] drm/i915: split LVDS update
> code out of i9xx_crtc_mode_set
> git bisect bad 93e537a10f2c8c0f2e74409b6cb473fc221758fa
> # bad: [f2c9677be3158c31ba19f527e2be0f7a519e19d1] drm/i915/intel_i2c: allocate
> gmbus array as part of drm_i915_private
> git bisect bad f2c9677be3158c31ba19f527e2be0f7a519e19d1
> # bad: [2ed06c93a1fce057808894d73167aae03c76deaf] drm/i915/intel_i2c: gmbus
> disabled and reserved ports are invalid
> git bisect bad 2ed06c93a1fce057808894d73167aae03c76deaf

Just to be safe, I also tested git HEAD (3.6.0-rc1-00209-gf62bf17), no dice
either.
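
The log above follows the usual bisection scheme: keep one known-good and one
known-bad commit, test the midpoint, and move whichever bound the result
matches. A small sketch of the idea on a linear history, assuming commits are
numbered 0..n-1, the lower bound tests good, the upper bound tests bad and
everything from the first bad commit onward stays bad. first_bad() and the
example predicate are illustrative names, and real git bisect walks the commit
graph rather than an array index.

```c
#include <assert.h>

/* Returns nonzero if the build at this commit index shows the bug. */
typedef int (*is_bad_fn)(int commit);

/*
 * Find the first bad commit in (good, bad], given that `good` tests good
 * and `bad` tests bad. Each step halves the remaining range.
 */
static int first_bad(int good, int bad, is_bad_fn is_bad)
{
	assert(good < bad);

	while (bad - good > 1) {
		int mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* like "git bisect bad" */
		else
			good = mid;	/* like "git bisect good" */
	}
	return bad;
}

/* Example predicate: pretend the regression was introduced at commit 11. */
static int bad_from_11(int commit)
{
	return commit >= 11;
}
```

The number of test builds grows only with the logarithm of the number of
commits in the range, which is why a full release-to-release range can be
narrowed down to a single commit in a manageable number of steps.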

Best regards,


Mihai





Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 06:39 PM, Daniel Vetter wrote:
> On Fri, Aug 10, 2012 at 6:05 PM, Mihai Moldovan  wrote:
>> * On 10.08.2012 12:10 PM, Daniel Vetter wrote:
>>> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan  wrote:
>>>> Hi Daniel, hi list
>>>>
>>>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working
>>>> fine), my box is crashing when loading the i915 driver (mode-setting
>>>> enabled.)
>>>>
>>>> The current version I'm testing with is 3.5.0.
>>>>
>>>> I was able to get the BUG output (please forgive any errors/flips in the
>>>> output, I have had to transcribe the messages from the screen/images),
>>>> however, I'm not able to find out what's wrong.
>>>>
>>>> If I see it correctly, there's a null pointer dereference in a printk
>>>> called from inside gmbus_xfer. The only printk calls I can see in
>>>> drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the
>>>> DRM_DEBUG_KMS() and DRM_INFO() macros.
>>>> Neither call looks wrong to me, I even tried to swap adapter->name with
>>>> bus->adapter.name and make *sure* i < num is true, but haven't had any
>>>> success.
>>>>
>>>> I'd really like to see this bug fixed, as it's preventing me from updating
>>>> the kernel for over a year now.
>>>>
>>>> Also, while 3.0.2 works, it *does* spew error/warning messages related to
>>>> gmbus and I've had corrupted VTs in the past (albeit after a long uptime
>>>> with multiple X restarting and DVI cable unplugging/reattaching events), so
>>>> maybe there's a lot more broken than "expected".
>>> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0,
>>> since exactly this issue might happen. We've re-enabled gmbus again on
>>> 3.5 after having fixed this bug. Are you sure that this is plain 3.2
>>> you're running?
>> Sorry, I messed up the version numbers. Started bisecting yesterday and
>> noticed that 3.0 up to 3.2 still work "fine" (see below). Instead, I've had
>> another problem with 3.2 (complete lockup after the kernel is running for a
>> few minutes, but I have no idea where this issue is coming from. Seems to be
>> happening with 3.2.0 only, so... *shrug*)
>>
>> 3.0.2   => working, gmbus warnings as posted.
>> 3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
>> 3.2-rc2 to rc4  => working, NO gmbus warnings, but render errors (see below)
>> --- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
>> --- (restarted bisecting with 3.2 to 3.5) ---
>> 3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors
>> (see below)
>> 3.4.0-07487 => working, gmbus warnings, hang errors (see below)
>> ...
>>
>> I've done more steps, but have not yet finished bisecting, so stay tuned.
>> All those render errors look like that:
>>
>> [drm] capturing error event; look for more information in
>> /debug/dri/0/i915_error_state
>> render error detected, EIR: 0x0010
>>   IPEIR: 0x
>>   IPEHR: 0x0200
>>   INSTDONE: 0x
>>   INSTPS: 0x8001e025
>>   INSTDONE1: 0xbfbb
>>   ACTHD: 0x00a4203c
>> page table error
>>   PGTBL_ER: 0x0010
>> [drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking
>>
>> I'll finish bisecting (and hope that my guess was right concerning the
>> variant I wasn't able to build) and will post the bisect log when done.
>>
>> Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been
>> enabled as I'm pretty sure I always saw those errors when booting (just
>> confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34,
>> 2.6.36.1, 3.1-..., 3.2-... though.
> Yeah, we've enabled gmbus a few times and then disabled it again due
> to bugs. Also, the usual debug message says gmbus even when gmbus
> isn't on ... yeah, slightly confusing, but that should be fixed, too.

Hm, OK.

Well, I'm done now.

bisect log:

git bisect start
# good: [805a6af8dba5dfdd35ec35dc52ec0122400b2610] Linux 3.2
git bisect good 805a6af8dba5dfdd35ec35dc52ec0122400b2610
# bad: [28a33cbc24e4256c143dce96c7d93bf423229f92] Linux 3.5
git bisect bad 28a33cbc24e4256c143dce96c7d93bf423229f92
# good: [49d99a2f9c4d033cc3965958a1397b1fad573dd3] Merge branch 'for-linus' of

Re: null pointer dereference while loading i915

2012-08-10 Thread Mihai Moldovan
* On 10.08.2012 12:10 PM, Daniel Vetter wrote:
> On Wed, Aug 8, 2012 at 6:50 AM, Mihai Moldovan  wrote:
>> Hi Daniel, hi list
>>
>> ever since version 3.2.0 (maybe even earlier, but 3.0.2 is still working
>> fine), my box is crashing when loading the i915 driver (mode-setting
>> enabled.)
>>
>> The current version I'm testing with is 3.5.0.
>>
>> I was able to get the BUG output (please forgive any errors/flips in the
>> output, I have had to transcribe the messages from the screen/images),
>> however, I'm not able to find out what's wrong.
>>
>> If I see it correctly, there's a null pointer dereference in a printk called
>> from inside gmbus_xfer. The only printk calls I can see in
>> drivers/gpu/drm/i915/intel_i2c.c gmbus_xfer() however are issued by the
>> DRM_DEBUG_KMS() and DRM_INFO() macros.
>> Neither call looks wrong to me, I even tried to swap adapter->name with
>> bus->adapter.name and make *sure* i < num is true, but haven't had any
>> success.
>>
>> I'd really like to see this bug fixed, as it's preventing me from updating
>> the kernel for over a year now.
>>
>> Also, while 3.0.2 works, it *does* spew error/warning messages related to
>> gmbus and I've had corrupted VTs in the past (albeit after a long uptime
>> with multiple X restarting and DVI cable unplugging/reattaching events), so
>> maybe there's a lot more broken than "expected".
>
> Hm, this is rather strange. gmbus should not be enabled on 3.2 nor 3.0,
> since exactly this issue might happen. We've re-enabled gmbus again on
> 3.5 after having fixed this bug. Are you sure that this is plain 3.2
> you're running?

Sorry, I messed up the version numbers. Started bisecting yesterday and noticed
that 3.0 up to 3.2 still work "fine" (see below). Instead, I've had another
problem with 3.2 (complete lockup after the kernel is running for a few
minutes, but I have no idea where this issue is coming from. Seems to be
happening with 3.2.0 only, so... *shrug*)

3.0.2   => working, gmbus warnings as posted.
3.1-09933/07170 => working, NO gmbus warnings, but render errors (see below)
3.2-rc2 to rc4  => working, NO gmbus warnings, but render errors (see below)
--- (stopped bisecting 3.0 to 3.2 as this was pointless) ---
--- (restarted bisecting with 3.2 to 3.5) ---
3.3.0-06109 => working, gmbus warnings just like with 3.0, render errors
(see below)
3.4.0-07487 => working, gmbus warnings, hang errors (see below)
...

I've done more steps, but have not yet finished bisecting, so stay tuned.
All those render errors look like that:

[drm] capturing error event; look for more information in
/debug/dri/0/i915_error_state
render error detected, EIR: 0x0010
  IPEIR: 0x
  IPEHR: 0x0200
  INSTDONE: 0x
  INSTPS: 0x8001e025
  INSTDONE1: 0xbfbb
  ACTHD: 0x00a4203c
page table error
  PGTBL_ER: 0x0010
[drm:i915_report_and_clear_eir] *ERROR* EIR stuck: 0x0010, masking

I'll finish bisecting (and hope that my guess was right concerning the
variant I wasn't able to build) and will post the bisect log when done.

Meanwhile: at least for 3.0.2 and even older versions, gmbus must have been
enabled as I'm pretty sure I always saw those errors when booting (just
confirmed via logs for 3.0.0, 2.6.38.6, 2.6.39). Doesn't come up with 2.6.34,
2.6.36.1, 3.1-..., 3.2-... though.

Best regards,


Mihai






null pointer dereference while loading i915

2012-08-07 Thread Mihai Moldovan
Hi Daniel, hi list,

ever since version 3.2.0 (maybe even earlier, but 3.0.2 still works fine),
my box has been crashing when loading the i915 driver (mode-setting enabled).

The current version I'm testing with is 3.5.0.

I was able to capture the BUG output (please forgive any errors/flips in the
output; I had to transcribe the messages from the screen/images). However, I
have not been able to find out what's wrong.

If I read it correctly, there's a null pointer dereference in a printk called
from inside gmbus_xfer(). However, the only printk calls I can see in
gmbus_xfer() in drivers/gpu/drm/i915/intel_i2c.c are issued by the
DRM_DEBUG_KMS() and DRM_INFO() macros.
Neither call looks wrong to me. I even tried swapping adapter->name with
bus->adapter.name and making *sure* i < num is true, but without any success.

I'd really like to see this bug fixed, as it has been preventing me from
updating the kernel for over a year now.

Also, while 3.0.2 works, it *does* spew error/warning messages related to gmbus,
and I've had corrupted VTs in the past (albeit after a long uptime with multiple
X restarts and DVI cable unplug/reattach events), so maybe there's a lot more
broken than "expected".

PCI-IDs:

00:02.0 VGA compatible controller [0300]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e12] (rev 03) (prog-if 00 [VGA controller])
        Subsystem: Intel Corporation Device [8086:1003]
00:02.1 Display controller [0380]: Intel Corporation 4 Series Chipset Integrated Graphics Controller [8086:2e13] (rev 03)
        Subsystem: Intel Corporation Device [8086:1003]

Messages are attached.

Any help is appreciated, thanks. :)

Best regards,


Mihai


i915_kernel_BUG_gmbus_nullptrderef.txt.bz2
Description: BZip2 compressed data


i915_3.0.2_warning_messages.txt.bz2
Description: BZip2 compressed data



