Re: [ANNOUNCE] v5.9.1-rt18

2020-10-27 Thread Fernando Lopez-Lezcano


On 10/27/20 1:22 AM, Sebastian Andrzej Siewior wrote:

On 2020-10-26 23:53:20 [-0700], Fernando Lopez-Lezcano wrote:

Maybe I'm doing something wrong but I get a compilation error (see below)
when trying to do a debug build (building rpm packages for Fedora). 5.9.1 +
rt19...

Builds fine otherwise...


If you could remove CONFIG_TEST_LOCKUP then it should work. I will think
of something.


Thanks much, I should have figured this out for myself :-( Just too 
busy. The compilation process went ahead (not finished yet), let me know 
if there is a proper patch. No hurry...


Thanks!
-- Fernando
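
The workaround suggested in this message (dropping CONFIG_TEST_LOCKUP from the debug configuration) could be scripted along these lines. This is only a sketch under stated assumptions: the helper name `disable_option` and the two-line sample configuration are hypothetical and are not taken from the thread. The one convention it relies on is that Kconfig records a disabled option as `# CONFIG_FOO is not set`.

```python
import re


def disable_option(config_text: str, option: str) -> str:
    """Return config_text with CONFIG_<option> marked as not set.

    Hypothetical helper for illustration only: it rewrites text and does
    not resolve Kconfig dependencies.
    """
    name = option if option.startswith("CONFIG_") else "CONFIG_" + option
    disabled = f"# {name} is not set"
    # Match either an enabled line (NAME=...) or an already disabled line.
    pattern = re.compile(
        rf"^(?:{re.escape(name)}=.*|# {re.escape(name)} is not set)$"
    )

    lines = config_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = disabled
            found = True
    if not found:
        # The option was absent: append it so the intent is explicit.
        lines.append(disabled)
    return "\n".join(lines) + "\n"


# Sample configuration text (assumed, not from the reported build).
sample = "CONFIG_PREEMPT_RT=y\nCONFIG_TEST_LOCKUP=m\n"
print(disable_option(sample, "TEST_LOCKUP"), end="")
```

After editing a real .config this way, running `make olddefconfig` lets Kconfig re-resolve any dependent options. The kernel tree's own `scripts/config --disable TEST_LOCKUP` should achieve the same result without a custom helper.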

Re: [ANNOUNCE] v5.9.1-rt18

2020-10-27 Thread Fernando Lopez-Lezcano


On 10/21/20 6:14 AM, Sebastian Andrzej Siewior wrote:

On 2020-10-21 14:53:27 [+0200], To Thomas Gleixner wrote:

Dear RT folks!

I'm pleased to announce the v5.9.1-rt18 patch set.


Maybe I'm doing something wrong but I get a compilation error (see 
below) when trying to do a debug build (building rpm packages for 
Fedora). 5.9.1 + rt19...


Builds fine otherwise...
Thanks,
-- Fernando



+ make -s 'HOSTCFLAGS=-O2 -g -pipe -Wall -Werror=format-security 
-Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions 
-fstack-protector-strong -grecord-gcc-switches 
-specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 
-specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -fcommon -m64 
-mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection 
-fcf-protection' 'HOSTLDFLAGS=-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now 
-specs=/usr/lib/rpm/redhat/redhat-hardened-ld' ARCH=x86_64 KCFLAGS= 
WITH_GCOV=0 -j4 modules

BUILDSTDERR: In file included from <command-line>:
BUILDSTDERR: lib/test_lockup.c: In function 'test_lockup_init':
BUILDSTDERR: lib/test_lockup.c:484:31: error: 'spinlock_t' {aka 'struct 
spinlock'} has no member named 'rlock'; did you mean 'lock'?

BUILDSTDERR:   484 |  offsetof(spinlock_t, rlock.magic),
BUILDSTDERR:   |   ^
BUILDSTDERR: ././include/linux/compiler_types.h:135:57: note: in 
definition of macro '__compiler_offsetof'
BUILDSTDERR:   135 | #define __compiler_offsetof(a, b) 
__builtin_offsetof(a, b)
BUILDSTDERR:   | 
 ^
BUILDSTDERR: lib/test_lockup.c:484:10: note: in expansion of macro 
'offsetof'

BUILDSTDERR:   484 |  offsetof(spinlock_t, rlock.magic),
BUILDSTDERR:   |  ^~~~
BUILDSTDERR: ././include/linux/compiler_types.h:135:35: error: 
'rwlock_t' {aka 'struct rt_rw_lock'} has no member named 'magic'
BUILDSTDERR:   135 | #define __compiler_offsetof(a, b) 
__builtin_offsetof(a, b)

BUILDSTDERR:   |   ^~
BUILDSTDERR: ./include/linux/stddef.h:17:32: note: in expansion of macro 
'__compiler_offsetof'
BUILDSTDERR:17 | #define offsetof(TYPE, MEMBER) 
__compiler_offsetof(TYPE, MEMBER)

BUILDSTDERR:   |^~~
BUILDSTDERR: lib/test_lockup.c:487:10: note: in expansion of macro 
'offsetof'

BUILDSTDERR:   487 |  offsetof(rwlock_t, magic),
BUILDSTDERR:   |  ^~~~
BUILDSTDERR: lib/test_lockup.c:488:10: error: 'RWLOCK_MAGIC' undeclared 
(first use in this function); did you mean 'STACK_MAGIC'?

BUILDSTDERR:   488 |  RWLOCK_MAGIC) ||
BUILDSTDERR:   |  ^~~~
BUILDSTDERR:   |  STACK_MAGIC
BUILDSTDERR: lib/test_lockup.c:488:10: note: each undeclared identifier 
is reported only once for each function it appears in

BUILDSTDERR: In file included from <command-line>:
BUILDSTDERR: ././include/linux/compiler_types.h:135:35: error: 'struct 
mutex' has no member named 'wait_lock'
BUILDSTDERR:   135 | #define __compiler_offsetof(a, b) 
__builtin_offsetof(a, b)

BUILDSTDERR:   |   ^~
BUILDSTDERR: ./include/linux/stddef.h:17:32: note: in expansion of macro 
'__compiler_offsetof'
BUILDSTDERR:17 | #define offsetof(TYPE, MEMBER) 
__compiler_offsetof(TYPE, MEMBER)

BUILDSTDERR:   |^~~
BUILDSTDERR: lib/test_lockup.c:490:10: note: in expansion of macro 
'offsetof'

BUILDSTDERR:   490 |  offsetof(struct mutex, wait_lock.rlock.magic),
BUILDSTDERR:   |  ^~~~
BUILDSTDERR: ././include/linux/compiler_types.h:135:35: error: 'struct 
rw_semaphore' has no member named 'wait_lock'
BUILDSTDERR:   135 | #define __compiler_offsetof(a, b) 
__builtin_offsetof(a, b)

BUILDSTDERR:   |   ^~
BUILDSTDERR: ./include/linux/stddef.h:17:32: note: in expansion of macro 
'__compiler_offsetof'
BUILDSTDERR:17 | #define offsetof(TYPE, MEMBER) 
__compiler_offsetof(TYPE, MEMBER)

BUILDSTDERR:   |^~~
BUILDSTDERR: lib/test_lockup.c:493:10: note: in expansion of macro 
'offsetof'
BUILDSTDERR:   493 |  offsetof(struct rw_semaphore, 
wait_lock.magic),

BUILDSTDERR:   |  ^~~~
BUILDSTDERR: make[1]: *** [scripts/Makefile.build:283: 
lib/test_lockup.o] Error 1

BUILDSTDERR: make: *** [Makefile:1784: lib] Error 2
BUILDSTDERR: make: *** Waiting for unfinished jobs





Changes since v5.9.1-rt17:

   - Update the migrate-disable series by Peter Zijlstra to v3. Include
 also fixes discussed in the thread.

   - UP builds did not boot since the replacement of the migrate-disable
 code. Reported by Christian Egger. Fixed as a part of v3 by Peter
 Zijlstra.

   - Rebase the printk code on top of the ring buffer designed for
 printk which was merged in the v5.10 merge window. Patches by John
 Ogness.

Known issues
  - It has been pointed out that due to changes to the printk

Re: [ANNOUNCE] v4.13.10-rt3 (possible recursive locking warning)

2017-11-03 Thread Fernando Lopez-Lezcano


On 10/27/2017 03:27 PM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v4.13.10-rt3 patch set.


Thanks!! Wonderful!
I'm seeing this (old Lenovo T510 running Fedora 26):


[   54.942022] 
[   54.942023] WARNING: possible recursive locking detected
[   54.942026] 4.13.10-200.rt3.1.fc26.ccrma.x86_64+rt #1 Not tainted
[   54.942026] 
[   54.942028] csd-sound/1392 is trying to acquire lock:
[   54.942029]  (>wait_lock){-.}, at: [] 
rt_spin_lock_slowunlock+0x4d/0xa0

[   54.942038]
   but task is already holding lock:
[   54.942039]  (>wait_lock){-.}, at: [] 
futex_lock_pi+0x269/0x4b0

[   54.942044]
   other info that might help us debug this:
[   54.942045]  Possible unsafe locking scenario:

[   54.942045]CPU0
[   54.942045]
[   54.942046]   lock(>wait_lock);
[   54.942046]   lock(>wait_lock);
[   54.942047]
*** DEADLOCK ***

[   54.942047]  May be due to missing lock nesting notation

[   54.942048] 1 lock held by csd-sound/1392:
[   54.942049]  #0:  (>wait_lock){-.}, at: 
[] futex_lock_pi+0x269/0x4b0

[   54.942051]
   stack backtrace:
[   54.942053] CPU: 2 PID: 1392 Comm: csd-sound Not tainted 
4.13.10-200.rt3.1.fc26.ccrma.x86_64+rt #1
[   54.942054] Hardware name: LENOVO 4313CTO/4313CTO, BIOS 6MET64WW 
(1.27 ) 07/15/2010

[   54.942055] Call Trace:
[   54.942059]  dump_stack+0x8e/0xd6
[   54.942065]  __lock_acquire+0x72f/0x13b0
[   54.942071]  ? sched_clock+0x9/0x10
[   54.942074]  ? futex_lock_pi+0x269/0x4b0
[   54.942076]  lock_acquire+0xa3/0x250
[   54.942077]  ? lock_acquire+0xa3/0x250
[   54.942079]  ? rt_spin_lock_slowunlock+0x4d/0xa0
[   54.942080]  ? reacquire_held_locks+0xf8/0x180
[   54.942083]  _raw_spin_lock_irqsave+0x4d/0x90
[   54.942084]  ? rt_spin_lock_slowunlock+0x4d/0xa0
[   54.942085]  rt_spin_lock_slowunlock+0x4d/0xa0
[   54.942087]  rt_spin_unlock+0x2a/0x40
[   54.942089]  futex_lock_pi+0x277/0x4b0
[   54.942090]  ? futex_wait_queue_me+0x100/0x170
[   54.942092]  ? futex_wait+0x227/0x250
[   54.942096]  do_futex+0x304/0xc20
[   54.942099]  ? wake_up_new_task+0x1ec/0x370
[   54.942102]  ? _do_fork+0x176/0x750
[   54.942104]  ? up_read+0x2a/0x30
[   54.942106]  SyS_futex+0x13b/0x180
[   54.942110]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[   54.942113]  entry_SYSCALL_64_fastpath+0x1f/0xbe
[   54.942116] RIP: 0033:0x7fe500f2d7b2
[   54.942116] RSP: 002b:7ffd13017110 EFLAGS: 0246 ORIG_RAX: 
00ca
[   54.942117] RAX: ffda RBX: 7fe4e7df7700 RCX: 
7fe500f2d7b2
[   54.942118] RDX: 0001 RSI: 0086 RDI: 
557e090dd3f0
[   54.942119] RBP: 7ffd13017280 R08:  R09: 
0001
[   54.942119] R10:  R11: 0246 R12: 

[   54.942120] R13: 7ffd13017210 R14: 7fe4e7df79c0 R15: 




Best,
-- Fernando



Changes since v4.13.10-rt2:

  - A dcache related live lock could occur. The writer could get
preempted within the critical section and the reader would spin to
see the update completed. This update would never complete if the
writer was preempted by a reader with a higher priority. Reported by
Oleg Karfich.

  - The tpm_tis driver can cause latency spikes (~400us) after multiple
writes to the chip are followed by a read operation. This read causes
a flush of all the cached writes to the chip and is blocking the CPU
until the operation completes. Reported and patched by Haris
Okanovic.

  - The upgrade to v4.13-RT broke the zram driver. Patched by Mike
Galbraith.

  - Tom Zanussi's "tracing: Inter-event (e.g. latency) support" patchset
has been updated to v3.

  - The static SRCU notifier wasn't compiling with SRCU_TINY. Reported
by kbuild test robot.

Re: [ANNOUNCE] 4.1.3-rt3 - xmit queue timeout, oops, rcu stalls

2015-08-06 Thread Fernando Lopez-Lezcano


On 07/25/2015 03:32 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v4.1.3-rt3 patch set.

...

I've had a few hangs with nothing left behind to debug... but today I 
find this:


(NOTE: I'm attaching a file with the details, I don't know if my mailer 
will mangle these lines)



Aug  5 10:46:18 localhost kernel: [ 2343.673560] WARNING: CPU: 3 PID: 43 
at net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
Aug  5 10:46:18 localhost kernel: [ 2343.673561] NETDEV WATCHDOG: eth1 
(e1000e): transmit queue 0 timed out



and then:


Aug  5 10:46:18 localhost kernel: [ 2343.673679] e1000e :04:00.0 
eth1: Reset adapter unexpectedly
Aug  5 10:46:30 localhost kernel: [ 2355.706987] ata5.00: exception 
Emask 0x40 SAct 0x0 SErr 0x80800 action 0x6 frozen
Aug  5 10:46:30 localhost kernel: [ 2355.706990] ata5: SError: { HostInt 
10B8B }
Aug  5 10:46:30 localhost kernel: [ 2355.707003] ata5.00: cmd 
a0/00:00:00:08:00/00:00:00:00:00/a0 tag 0 pio 16392 in
Aug  5 10:46:30 localhost kernel: [ 2355.707003]  Get event 
status notification 4a 01 00 00 10 00 00 00 08 00res 
40/00:03:00:00:00/00:00:00:00:00/a0 Emask 0x44 (timeout)

Aug  5 10:46:30 localhost kernel: [ 2355.707005] ata5.00: status: { DRDY }
Aug  5 10:46:30 localhost kernel: [ 2355.707007] ata5: hard resetting link


same one but later in the log:


Aug  5 10:46:18 localhost kernel: WARNING: CPU: 3 PID: 43 at 
net/sched/sch_generic.c:303 dev_watchdog+0x26f/0x280()
Aug  5 10:46:18 localhost kernel: NETDEV WATCHDOG: eth1 (e1000e): 
transmit queue 0 timed out



Things apparently keep working and then:


Aug  5 11:58:36 localhost kernel: [ 6678.122596] Network Receive[2409]: 
segfault at 28 ip 003c4c293ca9 sp 7fb6f64dbb58 error 6 in 
libc-2.18.so[3c4c20+1b4000]
Aug  5 11:58:36 localhost kernel: Network Receive[2409]: segfault at 28 
ip 003c4c293ca9 sp 7fb6f64dbb58 error 6 in 
libc-2.18.so[3c4c20+1b4000]
Aug  5 11:58:36 localhost kernel: timekeeping watchdog: Marking 
clocksource 'tsc' as unstable, because the skew is too large:
Aug  5 11:58:36 localhost kernel: 	'hpet' wd_now: 47ebf654 wd_last: 
c0debfe6 mask: 
Aug  5 11:58:36 localhost kernel: 	'tsc' cs_now: 154f6e564f7d cs_last: 
7784d315c59 mask: 

Aug  5 11:58:36 localhost systemd: Starting dnf makecache...
Aug  5 11:58:36 localhost kernel: [ 6678.123233] timekeeping watchdog: 
Marking clocksource 'tsc' as unstable, because the skew is too large:
Aug  5 11:58:36 localhost kernel: [ 6678.123237] 	'hpet' wd_now: 
47ebf654 wd_last: c0debfe6 mask: 
Aug  5 11:58:36 localhost kernel: [ 6678.123238] 	'tsc' cs_now: 
154f6e564f7d cs_last: 7784d315c59 mask: 
Aug  5 11:58:36 localhost kernel: [ 6678.146207] Switched to clocksource 
hpet

Aug  5 11:58:36 localhost kernel: Switched to clocksource hpet
Aug  5 11:58:36 localhost kernel: [ 6678.150087] BUG: unable to handle 
kernel NULL pointer dereference at 0ea0
Aug  5 11:58:36 localhost kernel: [ 6678.150097] IP: 
[] nfs40_discover_server_trunking+0x5e/0x110 [nfsv4]
Aug  5 11:58:36 localhost kernel: [ 6678.150098] PGD 7f3c83067 PUD 
7f46fb067 PMD 0
Aug  5 11:58:36 localhost kernel: [ 6678.150099] Oops:  [#1] PREEMPT 
SMP



And eventually (later) get a ton of these:


Aug  5 11:59:36 localhost kernel: [ 6738.107181] INFO: rcu_preempt 
detected stalls on CPUs/tasks: {} (detected by 3, t=60002 jiffies, 
g=37092, c=37091, q=0)
Aug  5 11:59:36 localhost kernel: [ 6738.107183] All QSes seen, last 
rcu_preempt kthread activity 1 (4301410925-4301410924), 
jiffies_till_next_fqs=3, root ->qsmask 0x0



So something is left in a not good state...

-- Fernando


messages.gz
Description: GNU Zip compressed data

Re: [ANNOUNCE] 4.0.4-rt1

2015-06-09 Thread Fernando Lopez-Lezcano


On 06/09/2015 03:05 PM, Pavel Vasilyev wrote:

09.06.2015 19:45, Fernando Lopez-Lezcano wrote:


This is still happening, about once a day. John Dulaney helped me set up a
crash kernel dump (thanks!) so now I have a kernel core dump for this
one,


Asus, Fedora, CGROUPS, iptables, snd_ac97, radeon, raid1, kvm - is this a
realtime system? :D


:-P

Yup. I have been using rt for many years - and packaging it - for (very) 
low latency sound processing. Linux + rt + jackd + rtirq + threaded irqs 
+ jack clients, everything with the right priorities. Runs very nicely 
unless you hit an issue like the one I'm asking about[*]. Usually 
running snd_hdspm with RME hardware when in concert situations (this one 
with Asus mobo is my - quite old by now - desktop at work, but I'm also 
having the same problem in my Lenovo laptop).


-- Fernando

[*] for example a whole concert for a 24.8 3D sound system with a remote 
Ethernet-driven D/A and running all the time with 64 frame x 2 buffers 
at 48 kHz.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [ANNOUNCE] 4.0.4-rt1

2015-06-09 Thread Fernando Lopez-Lezcano


On 05/28/2015 06:56 PM, Fernando Lopez-Lezcano wrote:

Oh well. Second time the machine hangs in two days in the same way
(otherwise very stable running 3.18.x-rty)

(this is a bumblebee + bbswitch graphics laptop - argh, if I had known
better...)


May 28 18:49:21 localhost kernel: [ cut here ]
May 28 18:49:21 localhost kernel: kernel BUG at mm/memcontrol.c:5848!
May 28 18:49:21 localhost kernel: invalid opcode:  [#1] PREEMPT SMP

...




This is still happening, about once a day. John Dulaney helped me set up a 
crash kernel dump (thanks!) so now I have a kernel core dump for this 
one. I'll try to post more details tomorrow. Let me know if there is 
anything in particular I should look at.


I'm attaching another backtrace on a different machine (older desktop), 
slightly different but same end result.


-- Fernando
Jun  9 03:49:19 localhost kernel: [ cut here ]
Jun  9 03:49:19 localhost kernel: kernel BUG at mm/memcontrol.c:5848!
Jun  9 03:49:19 localhost kernel: invalid opcode:  [#1] PREEMPT SMP 
Jun  9 03:49:19 localhost kernel: Modules linked in: bnep bluetooth rfkill fuse 
tun act_police cls_basic cls_flow cls_fw cls_u32 sch_tbf sch_prio sch_htb 
sch_hfsc sch_ingress sch_sfq xt_CHECKSUM ipt_rpfilter xt_statistic xt_CT 
nf_log_ipv4 nf_log_common xt_LOG xt_connlimit xt_realm xt_addrtype xt_comment 
xt_recent xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 ipt_ECN ipt_CLUSTERIP 
ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp nf_nat_sip 
nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda 
ts_kmp nf_conntrack_amanda nf_conntrack_sane ebtable_nat nf_conntrack_tftp 
ebtables nf_conntrack_sip nf_conntrack_proto_udplite nf_conntrack_proto_sctp 
nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink 
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc 
nf_conntrack_h323
Jun  9 03:49:19 localhost kernel: nf_conntrack_ftp xt_TPROXY xt_time xt_TCPMSS 
xt_tcpmss xt_sctp xt_policy xt_pkttype xt_physdev br_netfilter bridge stp llc 
xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_mark xt_mac xt_limit 
xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_connmark 
xt_CLASSIFY xt_AUDIT xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle nfnetlink nfsv3 nfs_acl 
auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc fscache ip6t_REJECT 
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack 
ip6table_filter ip6_tables w83627ehf hwmon_vid iTCO_wdt iTCO_vendor_support 
gpio_ich raid1 coretemp kvm_intel kvm hid_logitech_hidpp serio_raw snd_ice1712 
snd_cs8427 snd_i2c snd_ice17xx_ak4xxx snd_ak4xxx_adda snd_mpu401_uart
Jun  9 03:49:19 localhost kernel: snd_rawmidi snd_ac97_codec ac97_bus snd_seq 
snd_seq_device snd_pcm snd_timer snd lpc_ich soundcore i2c_i801 mfd_core shpchp 
acpi_cpufreq binfmt_misc hid_logitech_dj ata_generic firewire_ohci radeon 
pata_acpi firewire_core crc_itu_t sata_sil24 i2c_algo_bit drm_kms_helper sky2 
r8169 pata_marvell mii ttm drm
Jun  9 03:49:19 localhost kernel: CPU: 2 PID: 10424 Comm: prelink Not tainted 
4.0.4-100.rt1.3.fc20.ccrma.x86_64+rt #1
Jun  9 03:49:19 localhost kernel: Hardware name: System manufacturer 
P5K/EPU/P5K/EPU, BIOS 060406/19/2008
Jun  9 03:49:19 localhost kernel: task: 8802225e1520 ti: 8801256c8000 
task.ti: 8801256c8000
Jun  9 03:49:19 localhost kernel: RIP: 0010:[]  
[] mem_cgroup_swapout+0x102/0x110
Jun  9 03:49:19 localhost kernel: RSP: 0018:8801256cb288  EFLAGS: 00010202
Jun  9 03:49:19 localhost kernel: RAX: 0246 RBX: ea0008be7ec0 
RCX: 
Jun  9 03:49:19 localhost kernel: RDX:  RSI: 8801256cb248 
RDI: 820236d0
Jun  9 03:49:19 localhost kernel: RBP: 8801256cb298 R08: ea0008be7ee0 
R09: 8801256cb498
Jun  9 03:49:19 localhost kernel: R10: 8801256cbfd8 R11: 0002 
R12: 880227013800
Jun  9 03:49:19 localhost kernel: R13: 81c652b8 R14: 0001 
R15: 81c652a0
Jun  9 03:49:19 localhost kernel: FS:  01ad8900(0063) 
GS:88022fd0() knlGS:
Jun  9 03:49:19 localhost kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Jun  9 03:49:19 localhost kernel: CR2: 02131878 CR3: 00014a4f3000 
CR4: 07e0
Jun  9 03:49:19 localhost kernel: Stack:
Jun  9 03:49:19 localhost kernel: ea0008be7ec0 000c2a0d 
8801256cb2d8 811b60c7
Jun  9 03:49:19 localhost kernel:  8801256cb738 
ea0008be7ec0 8801256cb4b0
Jun  9 03:49:19 localhost kernel: ea0008be7ee0 81c652a0 
8801256cb418 811b924f
Jun  9 03:49:19 localhost kernel: Call Trace:
Jun  9 03:49:19 localhost kernel: [] 
__remove_mapping+0x107/0x180
Jun  9 03:49:19 localhost kernel: [] 
shrink_page_list+0x7df/0xb50
Jun  9 03:49:19 localhost kernel

Re: [ANNOUNCE] 4.0.4-rt1

2015-05-28 Thread Fernando Lopez-Lezcano


On 05/26/2015 12:41 PM, Fernando Lopez-Lezcano wrote:

On 05/26/2015 08:43 AM, Clark Williams wrote:

On Tue, 26 May 2015 11:19:24 -0400
Steven Rostedt  wrote:


On Tue, 26 May 2015 08:48:02 -0500
Clark Williams  wrote:



Change the WARN_ON to WARN_ON_NORT


Do we have a WARN_ON_NORT? I see a WARN_ON_NONRT, but not a
WARN_ON_NORT. Does this compile?

-- Steve


Sigh. Of course not. Reupdated patch (and yes this one compiles):


Thanks! Seems to have fixed the problem (of course!)
So far so good and nothing weird in the output of dmesg


Oh well. Second time the machine hangs in two days in the same way 
(otherwise very stable running 3.18.x-rty)


(this is a bumblebee + bbswitch graphics laptop - argh, if I had known 
better...)



May 28 18:49:21 localhost kernel: [ cut here ]
May 28 18:49:21 localhost kernel: kernel BUG at mm/memcontrol.c:5848!
May 28 18:49:21 localhost kernel: invalid opcode:  [#1] PREEMPT SMP
May 28 18:49:21 localhost kernel: Modules linked in: ccm rfcomm fuse 
xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun 
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT 
nf_reje\
ct_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc 
ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_fi\
lter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep 
bbswitch(OE) vfat fat iTCO_wdt iTCO_vendor_support arc4 intel\
_rapl iosf_mbi coretemp kvm_intel kvm uvcvideo crct10dif_pclmul 
videobuf2_vmalloc crc32_pclmul crc32c_intel videobuf2_core 
videobuf2_memops ghash_clmulni_intel v4l2_common iwlmvm videodev media 
mac80211 \

btusb bluetooth iwlwifi
May 28 18:49:21 localhost kernel: snd_hda_codec_realtek 
snd_hda_codec_hdmi lpc_ich snd_hda_codec_generic cfg80211 mfd_core 
i2c_i801 snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep 
snd_seq snd_se\
q_device snd_pcm thinkpad_acpi snd_timer rfkill tpm_tis snd mei_me 
ie31200_edac tpm mei edac_core soundcore shpchp nfsd auth_rpcgss nfs_acl 
lockd grace sunrpc serio_raw i915 sdhci_pci i2c_algo_bit sdhci \

e1000e drm_kms_helper mmc_core ptp drm pps_core wmi video
May 28 18:49:21 localhost kernel: CPU: 4 PID: 134 Comm: kswapd0 Tainted: 
G   OE   4.0.4-201.rt1.3.fc21.ccrma.x86_64+rt #1
May 28 18:49:21 localhost kernel: Hardware name: LENOVO 
20BGCTO1WW/20BGCTO1WW, BIOS GNET65WW (2.13 ) 06/20/2014
May 28 18:49:21 localhost kernel: task: 88046650e9a0 ti: 
88046657c000 task.ti: 88046657c000
May 28 18:49:21 localhost kernel: RIP: 0010:[] 
[] mem_cgroup_swapout+0x118/0x120
May 28 18:49:21 localhost kernel: RSP: 0018:88046657f998  EFLAGS: 
00010202
May 28 18:49:21 localhost kernel: RAX: 0246 RBX: 
ea0011150e80 RCX: 
May 28 18:49:21 localhost kernel: RDX: 0006a980 RSI: 
0001 RDI: 88046d413800
May 28 18:49:21 localhost kernel: RBP: 88046657f9a8 R08: 
81c68700 R09: 88046657fba8
May 28 18:49:21 localhost kernel: R10: 000a R11: 
88046657ffd8 R12: 88046d413800
May 28 18:49:21 localhost kernel: R13: 81c68718 R14: 
0001 R15: 88046657faa8
May 28 18:49:21 localhost kernel: FS:  () 
GS:88046da0() knlGS:
May 28 18:49:21 localhost kernel: CS:  0010 DS:  ES:  CR0: 
80050033
May 28 18:49:21 localhost kernel: CR2: 7efd07eb8000 CR3: 
01c0e000 CR4: 001407e0

May 28 18:49:21 localhost kernel: Stack:
May 28 18:49:21 localhost kernel: ea0011150e80 00196218 
88046657f9e8 811b8d8f
May 28 18:49:21 localhost kernel:  88046657fe48 
ea0011150e80 88046657fbc0
May 28 18:49:21 localhost kernel: ea0011150ea0 88046657faa8 
88046657fb28 811bba3f

May 28 18:49:21 localhost kernel: Call Trace:
May 28 18:49:21 localhost kernel: [] 
__remove_mapping+0x12f/0x1a0
May 28 18:49:21 localhost kernel: [] 
shrink_page_list+0x5ef/0xc30
May 28 18:49:21 localhost kernel: [] 
shrink_inactive_list+0x1e9/0x630
May 28 18:49:21 localhost kernel: [] 
shrink_lruvec+0x62c/0x830
May 28 18:49:21 localhost kernel: [] ? 
__switch_to+0x150/0x610
May 28 18:49:21 localhost kernel: [] 
shrink_zone+0xf4/0x2d0

May 28 18:49:21 localhost kernel: [] kswapd+0x587/0xa80
May 28 18:49:21 localhost kernel: [] ? 
mem_cgroup_shrink_node_zone+0x1f0/0x1f0

May 28 18:49:21 localhost kernel: [] kthread+0xca/0xe0
May 28 18:49:21 localhost kernel: [] ? 
kthread_worker_fn+0x180/0x180
May 28 18:49:21 localhost kernel: [] 
ret_from_fork+0x58/0x90
May 28 18:49:21 localhost kernel: [] ? 
kthread_worker_fn+0x180/0x180
May 28 18:49:21 localhost kernel: Code: a6 81 48 89 df e8 a9 ce fb ff 0f 
0b 0f 1f 80 00 00 00 00 48 c7 c6 f3 b7 a6 81 48 89 df e8 91 ce fb ff 0f 
0b 0f 1f 80 00 00 00 00 <0f> 0b 66 0f 1f 44 00 00

Re: [ANNOUNCE] 4.0.4-rt1

2015-05-26 Thread Fernando Lopez-Lezcano


On 05/26/2015 08:43 AM, Clark Williams wrote:

On Tue, 26 May 2015 11:19:24 -0400
Steven Rostedt  wrote:


On Tue, 26 May 2015 08:48:02 -0500
Clark Williams  wrote:



Change the WARN_ON to WARN_ON_NORT


Do we have a WARN_ON_NORT? I see a WARN_ON_NONRT, but not a
WARN_ON_NORT. Does this compile?

-- Steve


Sigh. Of course not. Reupdated patch (and yes this one compiles):


Thanks! Seems to have fixed the problem (of course!)
So far so good and nothing weird in the output of dmesg
-- Fernando




From: Clark Williams 
Date: Thu, 21 May 2015 12:51:53 -0500
Subject: [PATCH] [rt] i915: bogus warning from i915 when running on PREEMPT_RT

The i915 driver has a 'WARN_ON(!in_interrupt())' in the display
handler, which whines constantly on the RT kernel (since the interrupt
is actually handled in a threaded handler and not actual interrupt
context).

Change the WARN_ON to WARN_ON_NONRT

Signed-off-by: Clark Williams 
---
  drivers/gpu/drm/i915/intel_display.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f75173c20f47..30b1d16caa0d 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9745,7 +9745,7 @@ void intel_check_page_flip(struct drm_device *dev, int pipe)
struct drm_crtc *crtc = dev_priv->pipe_to_crtc_mapping[pipe];
struct intel_crtc *intel_crtc = to_intel_crtc(crtc);

-   WARN_ON(!in_interrupt());
+   WARN_ON_NONRT(!in_interrupt());

if (crtc == NULL)
return;
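
To illustrate the idea behind the patch above: a check that only makes sense in hard interrupt context is kept on non-RT builds and compiled out on RT, where the handler runs in a thread. The following is a userspace sketch only. `DEMO_PREEMPT_RT`, `warn_count` and `check_page_flip()` are made up for this example, and the macro bodies are assumptions about how such a switch could look, not a copy of the RT patch set.

```c
#include <stdio.h>

int warn_count; /* number of warnings emitted so far */

/* Userspace stand-in for the kernel's WARN_ON(): log and count. */
#define WARN_ON(cond)                                    \
	do {                                             \
		if (cond) {                              \
			warn_count++;                    \
			fprintf(stderr, "WARN_ON(%s)\n", #cond); \
		}                                        \
	} while (0)

/* Assumption: DEMO_PREEMPT_RT is a made-up switch for this sketch. */
#ifdef DEMO_PREEMPT_RT
#define WARN_ON_NONRT(cond) do { } while (0)  /* check dropped on RT */
#else
#define WARN_ON_NONRT(cond) WARN_ON(cond)     /* unchanged elsewhere */
#endif

/* Hypothetical handler: on RT it would run in an IRQ thread, so the
 * "am I in interrupt context?" check is expected to fail there. */
void check_page_flip(int in_interrupt)
{
	(void)in_interrupt; /* unused when the check is compiled out */
	WARN_ON_NONRT(!in_interrupt);
}
```

Built without `-DDEMO_PREEMPT_RT`, calling `check_page_flip(0)` logs one warning; built with it, the same call stays silent, which is the behaviour the patch is after.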




Re: [ANNOUNCE] 4.0.4-rt1

2015-05-24 Thread Fernando Lopez-Lezcano


On 05/19/2015 02:39 PM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v4.0.4-rt1 patch set.


Great!!


Changes since v3.18.13-rt10

- Rebase to v4.0.

- David Hildenbrand's series of decouple of preempt_disable from
   pagefault_disable is part of the series.

While doing the v4.0 I stumbled upon a few things. Therefore I plan to
reorder the -RT queue and merge patches where possible. Also I intend to
drop PREEMPT_RTB and PREEMPT_RT_BASE unless there is need for it…


...

I had to do this to get it to build (looks like it is not rt specific, 
probably just a typo in mainline):



--- linux-4.0/sound/soc/intel/sst/sst.c~ 2015-04-12 15:12:50.0 -0700
+++ linux-4.0/sound/soc/intel/sst/sst.c 2015-05-23 21:51:46.0 -0700
@@ -368,8 +368,8 @@
 * initialize by FW or driver when firmware is loaded
 */
spin_lock_irqsave(&ctx->ipc_spin_lock, irq_flags);
-   sst_shim_write64(shim, SST_IMRX, shim_regs->imrx),
-   sst_shim_write64(shim, SST_CSR, shim_regs->csr),
+   sst_shim_write64(shim, SST_IMRX, shim_regs->imrx);
+   sst_shim_write64(shim, SST_CSR, shim_regs->csr);
spin_unlock_irqrestore(&ctx->ipc_spin_lock, irq_flags);
 }
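
A plausible reason the trailing commas break the build (not confirmed anywhere in this thread): with `,` instead of `;` the two writes and the unlock become one comma expression, which only compiles if the unlock is itself an expression. If the unlock is a statement-style `do { ... } while (0)` macro, the compiler rejects it. The sketch below uses made-up names (`demo_lock`, `demo_unlock`, `demo_write`) and shows the corrected form.

```c
#include <stdio.h>

int lock_depth; /* stand-in for the lock state */
int writes;     /* counts register writes */

/* Assumption: statement-style macros, as some kernel configurations
 * define their lock helpers. */
#define demo_lock()   do { lock_depth++; } while (0)
#define demo_unlock() do { lock_depth--; } while (0)

/* Hypothetical register write; only counts calls. */
void demo_write(int reg, int val)
{
	(void)reg;
	(void)val;
	writes++;
}

void save_regs(void)
{
	demo_lock();
	/* Each call is its own statement. Ending these two lines with ','
	 * would glue them to demo_unlock() as one comma expression, and
	 * "expr, do { ... } while (0);" is not valid C. */
	demo_write(0, 1);
	demo_write(1, 2);
	demo_unlock();
}
```

When the unlock helper is a real function, the comma version happens to compile, which would explain how such a typo can go unnoticed in some configurations.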



On a desktop with an i7-3770k it seems to run fine (but I have not had 
time to test for latency problems).


On my laptop, a Lenovo W540, I get this continuously - so it is not 
really usable at this point:



May 24 13:51:41 localhost kernel: [ cut here ]
May 24 13:51:41 localhost kernel: WARNING: CPU: 5 PID: 361 at 
drivers/gpu/drm/i915/intel_display.c:9748 
intel_check_page_flip+0xaa/0xf0 [i915]()

May 24 13:51:41 localhost kernel: WARN_ON(!in_interrupt())
May 24 13:51:41 localhost kernel: Modules linked in:
May 24 13:51:41 localhost kernel: rfcomm fuse ccm xt_CHECKSUM 
ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns 
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJ\
ECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp 
llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mang\
le ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
iptable_mangle iptable_security\
 iptable_raw bnep bbswitch(OE) vfat fat iTCO_wdt iTCO_vendor_support 
arc4 intel_rapl iosf_mbi coretemp kvm_intel kvm uvcvideo 
crct10dif_pclmul videobuf2_vmalloc videobuf\
2_core crc32_pclmul crc32c_intel videobuf2_memops ghash_clmulni_intel 
v4l2_common videodev media iwlmvm btusb serio_raw mac80211 bluetooth 
snd_hda_codec_realtek
May 24 13:51:41 localhost kernel: snd_hda_codec_hdmi 
snd_hda_codec_generic iwlwifi snd_hda_intel sdhci_pci snd_hda_controller 
cfg80211 sdhci snd_hda_codec mmc_core snd_h\
wdep lpc_ich snd_seq mei_me i2c_i801 mfd_core snd_seq_device mei snd_pcm 
thinkpad_acpi snd_timer ie31200_edac snd shpchp edac_core soundcore 
tpm_tis rfkill tpm nfsd auth\
_rpcgss nfs_acl lockd grace sunrpc i915 i2c_algo_bit e1000e 
drm_kms_helper ptp drm pps_core wmi video
May 24 13:51:41 localhost kernel: CPU: 5 PID: 361 Comm: irq/30-i915 
Tainted: GW  OE   4.0.4-201.rt1.2.fc21.ccrma.x86_64+rt #1
May 24 13:51:41 localhost kernel: Hardware name: LENOVO 
20BGCTO1WW/20BGCTO1WW, BIOS GNET65WW (2.13 ) 06/20/2014
May 24 13:51:41 localhost kernel:  1f35af7b 
8804651afc78 8179c0b9
May 24 13:51:41 localhost kernel:  8804651afcd0 
8804651afcb8 8109ee1a
May 24 13:51:41 localhost kernel: 8804651afcb8 88046638c000 
880469dd7800 0001

May 24 13:51:41 localhost kernel: Call Trace:
May 24 13:51:41 localhost kernel: [] dump_stack+0x4c/0x81
May 24 13:51:41 localhost kernel: [] 
warn_slowpath_common+0x8a/0xe0
May 24 13:51:41 localhost kernel: [] 
warn_slowpath_fmt+0x55/0x70
May 24 13:51:41 localhost kernel: [] 
intel_check_page_flip+0xaa/0xf0 [i915]
May 24 13:51:41 localhost kernel: [] 
ironlake_irq_handler+0x2e8/0x1000 [i915]
May 24 13:51:41 localhost kernel: [] ? 
__switch_to+0x150/0x610
May 24 13:51:41 localhost kernel: [] ? 
irq_thread_fn+0x50/0x50
May 24 13:51:41 localhost kernel: [] 
irq_forced_thread_fn+0x27/0x80
May 24 13:51:41 localhost kernel: [] 
irq_thread+0x12f/0x180
May 24 13:51:41 localhost kernel: [] ? 
wake_threads_waitq+0x30/0x30
May 24 13:51:41 localhost kernel: [] ? 
irq_thread_check_affinity+0x90/0x90

May 24 13:51:41 localhost kernel: [] kthread+0xca/0xe0
May 24 13:51:41 localhost kernel: [] ? 
kthread_worker_fn+0x180/0x180
May 24 13:51:41 localhost kernel: [] 
ret_from_fork+0x58/0x90
May 24 13:51:41 localhost kernel: [] ? 
kthread_worker_fn+0x180/0x180

May 24 13:51:41 localhost kernel: ---[ end trace 05fe ]---
May 24 13:51:41 localhost kernel: [ cut here ]


Any patches I could try to fix this?
-- Fernando


Re: [ANNOUNCE] 3.14-rt1

2014-05-15 Thread Fernando Lopez-Lezcano


On 05/02/2014 04:37 AM, Sebastian Andrzej Siewior wrote:

* Fernando Lopez-Lezcano | 2014-04-26 11:29:04 [-0700]:


Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I
have seen something similar in 3.12.x-r):


Yes, you did: https://lkml.org/lkml/2014/3/7/163
You did not test the patch I've sent. Care to do so?


I did patch my kernel and (I think) I did not see the problem again. I 
did get some very occasional hangs that seemed to be video related but 
I think I could not see what had caused them.



Apr 26 11:16:11 localhost kernel: [   96.323248] [ cut
here ]
Apr 26 11:16:11 localhost kernel: [   96.323262] WARNING: CPU: 0 PID:
2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Apr 26 11:16:11 localhost kernel: [   96.323264] list_del corruption.
prev->next should be 8802101196a0, but was 0001
Apr 26 11:16:11 localhost kernel: [   96.323266] Modules linked in:


and please send backtrace information properly formatted. This is
terrible hard to read.


Sorry about that, I will attach files in the future.

I re-patched 3.14.3-rt5 with a slightly tweaked version of your patch. 
Will see what happens and report back.

-- Fernando

Re: [ANNOUNCE] 3.14-rt1

2014-04-26 Thread Fernando Lopez-Lezcano


On 04/11/2014 11:57 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.14-rt1 patch set.

Changes since v3.12.15-rt25
- I dropped the sparc64 patches I had in the queue. They did not apply
   cleanly, the code in v3.14 changed in the MMU area. Here is where I
   remembered that it was not working perfectly either.


Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I have 
seen something similar in 3.12.x-r):


Apr 26 11:16:11 localhost kernel: [   96.323248] [ cut here 
]
Apr 26 11:16:11 localhost kernel: [   96.323262] WARNING: CPU: 0 PID: 
2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Apr 26 11:16:11 localhost kernel: [   96.323264] list_del corruption. 
prev->next should be 8802101196a0, but was 0001
Apr 26 11:16:11 localhost kernel: [   96.323266] Modules linked in: fuse 
ipt_MASQUERADE xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack 
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
ip6table_mangle ip6table_security ip6table_raw rfcomm ip6table_filter 
bnep ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iTCO_wdt 
iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel uvcvideo videobuf2_vmalloc microcode 
videobuf2_memops snd_hda_codec_hdmi videobuf2_core videodev media 
serio_raw btusb bluetooth intel_ips i2c_i801 6lowpan_iphc 
snd_hda_codec_conexant snd_hda_codec_generic arc4 iwldvm mac80211 
iwlwifi lpc_ich sdhci_pci mfd_core sdhci cfg80211 mmc_core snd_hda_intel 
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer 
ptp mei_me pps_core mei shpchp thinkpad_acpi snd ppdev soundcore rfkill 
parport_pc parport acpi_cpufreq uinput firewire_ohci nouveau 
firewire_core crc_itu_t i2c_algo_bit drm_kms_helper ttm drm mxm_wmi 
i2c_core wmi video
Apr 26 11:16:11 localhost kernel: [   96.323331] CPU: 0 PID: 2051 Comm: 
cinnamon Not tainted 3.14.1-200.rt1.1.fc19.ccrma.x86_64+rt #1
Apr 26 11:16:11 localhost kernel: [   96.323332] Hardware name: LENOVO 
4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Apr 26 11:16:11 localhost kernel: [   96.323334]   
8a5c11dc 8800ae715a88 81707fca
Apr 26 11:16:11 localhost kernel: [   96.323336]  8800ae715ad0 
8800ae715ac0 8108d03d 8802101196a0
Apr 26 11:16:11 localhost kernel: [   96.323337]  880210119b50 
880210119b50 880210119b40 88021a615648

Apr 26 11:16:11 localhost kernel: [   96.323338] Call Trace:
Apr 26 11:16:11 localhost kernel: [   96.323345]  [] 
dump_stack+0x4d/0x82
Apr 26 11:16:11 localhost kernel: [   96.323351]  [] 
warn_slowpath_common+0x7d/0xc0
Apr 26 11:16:11 localhost kernel: [   96.323352]  [] 
warn_slowpath_fmt+0x5c/0x80
Apr 26 11:16:11 localhost kernel: [   96.323354]  [] 
__list_del_entry+0xa1/0xd0
Apr 26 11:16:11 localhost kernel: [   96.323355]  [] 
list_del+0xd/0x30
Apr 26 11:16:11 localhost kernel: [   96.323393]  [] 
nouveau_fence_signal+0x53/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323414]  [] 
nouveau_fence_update+0x48/0xa0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323435]  [] 
nouveau_fence_sync+0x45/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323456]  [] 
validate_list+0xd8/0x2e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323478]  [] 
nouveau_gem_ioctl_pushbuf+0xaa3/0x13e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323500]  [] 
drm_ioctl+0x4f2/0x620 [drm]
Apr 26 11:16:11 localhost kernel: [   96.323506]  [] ? 
migrate_enable+0x94/0x1c0
Apr 26 11:16:11 localhost kernel: [   96.323527]  [] 
nouveau_drm_ioctl+0x4e/0x90 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323530]  [] 
do_vfs_ioctl+0x2e0/0x4c0
Apr 26 11:16:11 localhost kernel: [   96.323533]  [] ? 
file_has_perm+0xa6/0xb0
Apr 26 11:16:11 localhost kernel: [   96.323535]  [] 
SyS_ioctl+0x81/0xa0
Apr 26 11:16:11 localhost kernel: [   96.323538]  [] 
system_call_fastpath+0x16/0x1b
Apr 26 11:16:11 localhost kernel: [   96.323569] ---[ end trace 
0002 ]---


-- Fernando
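
For readers unfamiliar with the "list_del corruption" message in the log above: list debugging checks, before unlinking an entry, that both neighbours still point back at it. The following is a simplified userspace sketch of that kind of check, not the kernel's lib/list_debug.c; `list_del_checked()` and its return values are made up for illustration.

```c
#include <stdio.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty circular list. */
void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry if both neighbours still point at it.
 * Returns 0 on success, -1 (list untouched) on a mismatch.
 * The entry's own pointers are left stale on purpose, so a second
 * deletion of the same entry is caught by the checks. */
int list_del_checked(struct list_head *entry)
{
	if (entry->prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)entry->prev->next);
		return -1;
	}
	if (entry->next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)entry->next->prev);
		return -1;
	}
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	return 0;
}
```

Deleting the same entry twice, or deleting it while another context is modifying the list, are typical ways to trip such a check. The sketch does not say which of these, if any, happened in the nouveau fence code here.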

Re: [ANNOUNCE] 3.14-rt1

2014-04-26 Thread Fernando Lopez-Lezcano


On 04/11/2014 11:57 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.14-rt1 patch set.

Changes since v3.12.15-rt25
- I dropped the sparc64 patches I had in the queue. They did not apply
   cleanly, the code in v3.14 changed in the MMU area. Here is where I
   remembered that it was not working perfectly either.


Saw this a moment ago (3.14.1 + rt1, Fedora 19 laptop - I think I have 
seen something similar in 3.12.x-r):


Apr 26 11:16:11 localhost kernel: [   96.323248] [ cut here 
]
Apr 26 11:16:11 localhost kernel: [   96.323262] WARNING: CPU: 0 PID: 
2051 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0()
Apr 26 11:16:11 localhost kernel: [   96.323264] list_del corruption. 
prev->next should be 8802101196a0, but was 0001
Apr 26 11:16:11 localhost kernel: [   96.323266] Modules linked in: fuse 
ipt_MASQUERADE xt_CHECKSUM tun ip6t_rpfilter ip6t_REJECT xt_conntrack 
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
ip6table_mangle ip6table_security ip6table_raw rfcomm ip6table_filter 
bnep ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iTCO_wdt 
iTCO_vendor_support coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul 
crc32c_intel ghash_clmulni_intel uvcvideo videobuf2_vmalloc microcode 
videobuf2_memops snd_hda_codec_hdmi videobuf2_core videodev media 
serio_raw btusb bluetooth intel_ips i2c_i801 6lowpan_iphc 
snd_hda_codec_conexant snd_hda_codec_generic arc4 iwldvm mac80211 
iwlwifi lpc_ich sdhci_pci mfd_core sdhci cfg80211 mmc_core snd_hda_intel 
snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm e1000e snd_timer 
ptp mei_me pps_core mei shpchp thinkpad_acpi snd ppdev soundcore rfkill 
parport_pc parport acpi_cpufreq uinput firewire_ohci nouveau 
firewire_core crc_itu_t i2c_algo_bit drm_kms_helper ttm drm mxm_wmi 
i2c_core wmi video
Apr 26 11:16:11 localhost kernel: [   96.323331] CPU: 0 PID: 2051 Comm: 
cinnamon Not tainted 3.14.1-200.rt1.1.fc19.ccrma.x86_64+rt #1
Apr 26 11:16:11 localhost kernel: [   96.323332] Hardware name: LENOVO 
4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Apr 26 11:16:11 localhost kernel: [   96.323334]   
8a5c11dc 8800ae715a88 81707fca
Apr 26 11:16:11 localhost kernel: [   96.323336]  8800ae715ad0 
8800ae715ac0 8108d03d 8802101196a0
Apr 26 11:16:11 localhost kernel: [   96.323337]  880210119b50 
880210119b50 880210119b40 88021a615648

Apr 26 11:16:11 localhost kernel: [   96.323338] Call Trace:
Apr 26 11:16:11 localhost kernel: [   96.323345]  [81707fca] 
dump_stack+0x4d/0x82
Apr 26 11:16:11 localhost kernel: [   96.323351]  [8108d03d] 
warn_slowpath_common+0x7d/0xc0
Apr 26 11:16:11 localhost kernel: [   96.323352]  [8108d0dc] 
warn_slowpath_fmt+0x5c/0x80
Apr 26 11:16:11 localhost kernel: [   96.323354]  [8137c551] 
__list_del_entry+0xa1/0xd0
Apr 26 11:16:11 localhost kernel: [   96.323355]  [8137c58d] 
list_del+0xd/0x30
Apr 26 11:16:11 localhost kernel: [   96.323393]  [a0135593] 
nouveau_fence_signal+0x53/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323414]  [a0135678] 
nouveau_fence_update+0x48/0xa0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323435]  [a0135f85] 
nouveau_fence_sync+0x45/0x80 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323456]  [a013aea8] 
validate_list+0xd8/0x2e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323478]  [a013c3d3] 
nouveau_gem_ioctl_pushbuf+0xaa3/0x13e0 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323500]  [a002ad02] 
drm_ioctl+0x4f2/0x620 [drm]
Apr 26 11:16:11 localhost kernel: [   96.323506]  [810c1af4] ? 
migrate_enable+0x94/0x1c0
Apr 26 11:16:11 localhost kernel: [   96.323527]  [a0132cfe] 
nouveau_drm_ioctl+0x4e/0x90 [nouveau]
Apr 26 11:16:11 localhost kernel: [   96.323530]  [81203480] 
do_vfs_ioctl+0x2e0/0x4c0
Apr 26 11:16:11 localhost kernel: [   96.323533]  [812fd8d6] ? 
file_has_perm+0xa6/0xb0
Apr 26 11:16:11 localhost kernel: [   96.323535]  [812036e1] 
SyS_ioctl+0x81/0xa0
Apr 26 11:16:11 localhost kernel: [   96.323538]  [81716769] 
system_call_fastpath+0x16/0x1b
Apr 26 11:16:11 localhost kernel: [   96.323569] ---[ end trace 
0002 ]---
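
The "list_del corruption" warning in the trace above is the kind of message a checked list delete prints when the previous node no longer points back at the entry being removed. What follows is a minimal user-space sketch of such a check, not the kernel's lib/list_debug.c: `struct list_node`, `list_init`, `list_add_after` and `list_del_checked` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified circular doubly linked list node (illustrative, not struct list_head). */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty circular list is a head that points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link entry directly after head. */
static void list_add_after(struct list_node *entry, struct list_node *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Unlink entry only if both neighbours still point back at it.
 * Returns true when the entry was unlinked, false when the links
 * are inconsistent (the list is left untouched in that case).
 */
static bool list_del_checked(struct list_node *entry)
{
    struct list_node *prev = entry->prev;
    struct list_node *next = entry->next;

    if (prev->next != entry) {
        fprintf(stderr,
                "list_del corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)prev->next);
        return false;
    }
    if (next->prev != entry) {
        fprintf(stderr,
                "list_del corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)next->prev);
        return false;
    }

    next->prev = prev;
    prev->next = next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

A check like this only reports that the links were already inconsistent when the delete ran; it says nothing about which earlier operation damaged them, which is why the call trace leading to the delete is the useful part of the report.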


-- Fernando
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 3.12.9-rt13: BUG: soft lockup

2014-02-14 Thread Fernando Lopez-Lezcano

On 02/14/2014 02:43 AM, Thomas Gleixner wrote:

On Thu, 13 Feb 2014, Fernando Lopez-Lezcano wrote:

On 02/13/2014 03:55 PM, Thomas Gleixner wrote:

On Thu, 13 Feb 2014, Fernando Lopez-Lezcano wrote:

On 02/13/2014 02:25 PM, Thomas Gleixner wrote:

On Wed, 12 Feb 2014, Fernando Lopez-Lezcano wrote:

[771508.546449] RIP: 0010:[810dc60a] [810dc60a]
smp_call_function_many+0x2ca/0x330

Can you decode the exact location inside of smp_call_function_many via
addr2line please ?

# addr2line -e
/usr/lib/debug/lib/modules/3.12.9-301.rt13.1.fc20.ccrma.x86_64+rt/vmlinux
810dc60e
/usr/src/debug/kernel-3.12.fc20.ccrma/linux-3.12.9-301.rt13.1.fc20.ccrma.x86_64/kernel/smp.c:108

So it's stuck in csd_lock_wait(), which means that the csd of the
target cpu is not free.
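
To illustrate what "stuck in csd_lock_wait()" means here: the sender spins until the target clears a lock bit on the call data, so if the target never clears it, the sender never makes progress. The following is a user-space sketch of that flag-spin pattern using C11 atomics, with a spin limit added so it can terminate. The struct, the function names and the bounded loop are assumptions for illustration, not the code in kernel/smp.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: a single "locked" bit, named after the kernel's CSD_FLAG_LOCK. */
#define CSD_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the per-call data a sender hands to a target CPU. */
struct csd_sketch {
    atomic_uint flags;
};

/* Sender marks the call data as in use. */
static void csd_lock_sketch(struct csd_sketch *csd)
{
    atomic_fetch_or_explicit(&csd->flags, CSD_FLAG_LOCK, memory_order_acquire);
}

/* Target clears the bit once it has run the requested function. */
static void csd_unlock_sketch(struct csd_sketch *csd)
{
    atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK, memory_order_release);
}

/*
 * Poll the lock bit up to max_spins times.
 * Returns true if the bit was seen clear, false if it was still set
 * after max_spins polls. An unbounded version of this loop is what
 * shows up as a soft lockup when the bit is never cleared.
 */
static bool csd_lock_wait_bounded(struct csd_sketch *csd, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        unsigned int f = atomic_load_explicit(&csd->flags, memory_order_acquire);
        if (!(f & CSD_FLAG_LOCK))
            return true;
    }
    return false;
}
```

In this sketch a `false` return corresponds to the situation described above: the waiter is healthy, and the interesting question is why the other side never released the data.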

Is the machine completely dead or can you still retrieve information
from it?

After migrating to fc20/3.12.x-rtyy I started experiencing freezes in
some workstations. This coincided with one of our students running high
cpu load multi-core computations in them (he had been doing that before
under 3.10.x-rtyy with no problems). In the morning I would find
workstations unresponsive and catatonic. Probably his software was still
eating up cpu as the machines were warm (ie: still under load). No pings
back or keyboard/mouse/display response.

This was the only time I could get information from a machine while it
was in the process of freezing up - but this might have been a different
issue. I was ssh'd in and that terminal became unresponsive. I managed
to ssh in again and looked at the logs. The machine was not completely
frozen but it eventually became completely catatonic. For all I know
this might be different from the locked machines syndrome as it left
traces in the logs (I could forward you all the log entries if you want).

I could try to boot one of the machines into 3.12.x-rtyy, replicate the
conditions and wait. What should I look for if I can catch this in the act?

-- Fernando

Re: 3.12.9-rt13: BUG: soft lockup

2014-02-13 Thread Fernando Lopez-Lezcano


On 02/13/2014 02:25 PM, Thomas Gleixner wrote:

On Wed, 12 Feb 2014, Fernando Lopez-Lezcano wrote:

[771508.546449] RIP: 0010:[810dc60a]  [810dc60a]
smp_call_function_many+0x2ca/0x330


Can you decode the exact location inside of smp_call_function_many via
addr2line please ?


Hope this is useful (adding 0x2ce/0x330 as offsets does not make any 
difference, don't know if it should)...
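
On the offsets: in an entry of the form `symbol+0xOFF/0xLEN`, the first number is the distance of the instruction from the start of the function and the second is the function's length, so the bracketed address already includes the offset and nothing more needs to be added before passing it to addr2line. The arithmetic can be checked with a small sketch, using the addresses exactly as they appear in this thread. The helper names are hypothetical.

```c
#include <stdint.h>

/* Start of the function, given the reported address and its "+0xOFF" offset. */
static uint64_t symbol_base(uint64_t ip, uint64_t offset)
{
    return ip - offset;
}

/* Address of an instruction, given the function start and an offset into it. */
static uint64_t symbol_addr(uint64_t base, uint64_t offset)
{
    return base + offset;
}
```

With the values from the two reports, `symbol_base(0x810dc60a, 0x2ca)` gives `0x810dc340`, and `symbol_addr(0x810dc340, 0x2ce)` gives `0x810dc60e`, the address used with addr2line. Both reports therefore point into the same function, four bytes apart.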


# grep smp_call_function /var/log/messages|tail -1
Feb 12 14:18:21 cmn27 kernel: [771840.224419] RIP: 
0010:[810dc60e]  [810dc60e] 
smp_call_function_many+0x2ce/0x330
# addr2line -e 
/usr/lib/debug/lib/modules/3.12.10-300.rt15.1.fc20.ccrma.x86_64+rt/vmlinux 
 810dc60e

/usr/src/debug/kernel-3.12.fc20.ccrma/linux-3.12.10-300.rt15.1.fc20.ccrma.x86_64/kernel/rtmutex.c:1295

This is the only time I was able to catch some logs of the problem (if 
it is the same). I had to revert to 3.10.27-rt25 for the time being and 
that seems to be holding up well so far.


-- Fernando

Re: 3.12.9-rt13: BUG: soft lockup

2014-02-13 Thread Fernando Lopez-Lezcano


On 02/13/2014 03:55 PM, Thomas Gleixner wrote:

On Thu, 13 Feb 2014, Fernando Lopez-Lezcano wrote:


On 02/13/2014 02:25 PM, Thomas Gleixner wrote:

On Wed, 12 Feb 2014, Fernando Lopez-Lezcano wrote:

[771508.546449] RIP: 0010:[810dc60a]  [810dc60a]
smp_call_function_many+0x2ca/0x330


Can you decode the exact location inside of smp_call_function_many via
addr2line please ?


Hope this is useful (adding 0x2ce/0x330 as offsets does not make any
difference, don't know if it should)...

# grep smp_call_function /var/log/messages|tail -1
Feb 12 14:18:21 cmn27 kernel: [771840.224419] RIP: 0010:[810dc60e]
[810dc60e] smp_call_function_many+0x2ce/0x330
# addr2line -e
/usr/lib/debug/lib/modules/3.12.10-300.rt15.1.fc20.ccrma.x86_64+rt/vmlinux
810dc60e
/usr/src/debug/kernel-3.12.fc20.ccrma/linux-3.12.10-300.rt15.1.fc20.ccrma.x86_64/kernel/rtmutex.c:1295


I can't see how the kernel decoder thinks it's smp_call_function_many
but addr2line looks at rtmutex.c

That doesn't make any sense at all. Version mismatch?


Indeed, sorry for the mixup... here I go again, hopefully this one will 
make sense:


# addr2line -e 
/usr/lib/debug/lib/modules/3.12.9-301.rt13.1.fc20.ccrma.x86_64+rt/vmlinux 810dc60e

/usr/src/debug/kernel-3.12.fc20.ccrma/linux-3.12.9-301.rt13.1.fc20.ccrma.x86_64/kernel/smp.c:108

-- Fernando

3.12.9-rt13: BUG: soft lockup

2014-02-12 Thread Fernando Lopez-Lezcano


Hi all,

I'm seeing these BUGs with 3.12.9-rt13 and finally caught the messages. 
I was getting frozen machines with no traces left behind; this could 
possibly be it (see below - I have to retest with rt15).


-- Fernando


[771508.546420] BUG: soft lockup - CPU#5 stuck for 23s! [SweepSinVsUsm:1421]
[771508.546431] Modules linked in: bnep bluetooth fuse tun act_police 
cls_basic cls_flow cls_fw cls_u32 sch_fq_codel sch_tbf sch_prio sch_htb 
sch_hfsc sch_ingress sch_sfq nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache bridge stp llc xt_CHECKSUM 
ipt_rpfilter xt_statistic xt_CT xt_LOG xt_connlimit xt_realm xt_addrtype 
xt_comment xt_recent xt_nat ipt_ULOG ipt_MASQUERADE ipt_ECN 
ipt_CLUSTERIP ipt_ah xt_set ip_set nf_nat_tftp nf_nat_snmp_basic 
nf_conntrack_snmp nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc 
nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda 
nf_conntrack_sane nf_conntrack_tftp nf_conntrack_sip 
nf_conntrack_proto_udplite nf_conntrack_proto_sctp nf_conntrack_pptp 
nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns 
nf_conntrack_broadcast
[771508.546441]  nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp 
xt_TPROXY xt_time xt_TCPMSS xt_tcpmss xt_sctp xt_policy xt_pkttype 
xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport 
xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit 
xt_DSCP xt_dscp xt_dccp xt_connmark ebtable_nat xt_CLASSIFY ebtables 
xt_AUDIT xt_state iptable_raw iptable_nat nf_nat_ipv4 nf_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 iptable_mangle nfnetlink ip6t_REJECT 
nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack 
ip6table_filter ip6_tables hwmon_vid iTCO_wdt iTCO_vendor_support 
gpio_ich eeepc_wmi asus_wmi sparse_keymap rfkill snd_ice1712 snd_cs8427 
x86_pkg_temp_thermal snd_i2c snd_ice17xx_ak4xxx snd_ak4xxx_adda coretemp 
snd_mpu401_uart snd_rawmidi snd_ac97_codec kvm ac97_bus crct10dif_pclmul
[771508.546446]  crc32_pclmul snd_seq crc32c_intel snd_seq_device 
ghash_clmulni_intel snd_pcm microcode snd_page_alloc snd_timer snd 
serio_raw soundcore r8169 i2c_i801 lpc_ich mfd_core mii mei_me mei 
shpchp binfmt_misc usb_storage nouveau i2c_algo_bit drm_kms_helper ttm 
drm i2c_core mxm_wmi video wmi
[771508.546447] CPU: 5 PID: 1421 Comm: SweepSinVsUsm Tainted: GW 
   3.12.9-301.rt13.1.fc20.ccrma.x86_64+rt #1
[771508.546447] Hardware name: System manufacturer System Product 
Name/P8Z77-M, BIOS 1406 07/19/2012
[771508.546447] task: 8804af8b8000 ti: 8807ab41 task.ti: 
8807ab41
[771508.546449] RIP: 0010:[810dc60a]  [810dc60a] 
smp_call_function_many+0x2ca/0x330

[771508.546449] RSP: 0018:8807ab411cf0  EFLAGS: 0202
[771508.546450] RAX: 0001 RBX:  RCX: 
8807fe2e9918
[771508.546450] RDX: 0001 RSI: 0400 RDI: 

[771508.546450] RBP: 8807ab411d50 R08: 8807fe4e6108 R09: 
0010
[771508.546451] R10: 8807fe4e6108 R11: 0246 R12: 

[771508.546451] R13:  R14: 00df R15: 
810432e4
[771508.546452] FS:  7faf34514700() GS:8807fe48() 
knlGS:

[771508.546452] CS:  0010 DS:  ES:  CR0: 80050033
[771508.546453] CR2: 7faebb93d000 CR3: 0006bafd8000 CR4: 
001407e0

[771508.546453] Stack:
[771508.546454]  8807fe4e6188 0001 000660c0 
8807ab411d60
[771508.546455]  8105adb0 8807fe5660c0 0202 
880077268480
[771508.546456]  880077268780 7faebb93f000 7faebafea000 
880077268480

[771508.546456] Call Trace:
[771508.546458]  [8105adb0] ? leave_mm+0x80/0x80
[771508.546459]  [8105af07] native_flush_tlb_others+0x37/0x40
[771508.546460]  [8105b084] flush_tlb_mm_range+0xb4/0x280
[771508.546461]  [81177173] tlb_flush_mmu.part.50+0x33/0x90
[771508.546462]  [81177d15] tlb_finish_mmu+0x55/0x60
[771508.546463]  [8117a072] zap_page_range+0x112/0x150
[771508.546465]  [81176bc1] SyS_madvise+0x381/0x7b0
[771508.546466]  [81696169] system_call_fastpath+0x16/0x1b
[771508.546475] Code: 4d ea 24 00 3b 05 9f e5 c2 00 89 c2 0f 8d c3 fd ff 
ff 48 98 49 8b 4d 00 48 03 0c c5 e0 64 d0 81 f6 41 20 01 74 cb 0f 1f 00 
f3 90 f6 41 20 01 75 f8 eb be 0f b6 4d ac 48 8b 55 b8 44 89 ef 48 8b


3.10.20-rt17, BUG and Oops

2013-11-30 Thread Fernando Lopez-Lezcano


Hi all,

Just got this on 3.10.20-rt17, ThinkPad T510 running Fedora 19 (I think 
it has happened a few times before). The machine is not completely dead, 
the mouse pointer moves around but otherwise display updates and 
keyboard response are nil.


-- Fernando



Nov 29 23:17:52 localhost kernel: [50532.638944] BUG: unable to handle 
kernel NULL pointer dereference at 02c7
Nov 29 23:17:52 localhost kernel: [50532.638951] IP: 
[81361e9a] advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.638953] PGD 1db141067 PUD 
228703067 PMD 0
Nov 29 23:17:52 localhost kernel: [50532.638955] Oops:  [#1] PREEMPT 
SMP
Nov 29 23:17:52 localhost kernel: [50532.638983] Modules linked in: 
snd_hrtimer snd_seq_midi snd_seq_midi_event snd_seq_dummy snd_hdsp 
snd_rawmidi fuse xt_CHECKSUM tun nf_conntrack_netbios_ns 
nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack 
ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables 
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 
ip6table_mangle ip6table_security ip6table_raw ip6table_filter 
ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 
nf_nat nf_conntrack iptable_mangle iptable_security rfcomm iptable_raw 
bnep iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi acpi_cpufreq mperf 
coretemp kvm_intel kvm crc32_pclmul uvcvideo crc32c_intel 
ghash_clmulni_intel videobuf2_vmalloc videobuf2_memops videobuf2_core 
microcode videodev media serio_raw snd_hda_codec_conexant intel_ips 
btusb i2c_i801 bluetooth arc4 iwldvm mac80211 snd_hda_intel 
snd_hda_codec iwlwifi snd_hwdep sdhci_pci snd_seq sdhci snd_seq_device 
cfg80211 mmc_core snd_pcm lpc_ich mfd_core e1000e snd_page_alloc ptp 
mei_me snd_timer pps_core mei thinkpad_acpi snd soundcore rfkill shpchp 
uinput nouveau i2c_algo_bit firewire_ohci drm_kms_helper firewire_core 
crc_itu_t ttm drm i2c_core mxm_wmi video wmi
Nov 29 23:17:52 localhost kernel: [50532.639006] CPU: 0 PID: 45 Comm: 
irq/9-acpi Not tainted 3.10.20-200.rt17.1.fc19.ccrma.x86_64.rt #1
Nov 29 23:17:52 localhost kernel: [50532.639007] Hardware name: LENOVO 
4313CTO/4313CTO, BIOS 6MET64WW (1.27 ) 07/15/2010
Nov 29 23:17:52 localhost kernel: [50532.639008] task: 880229bc8000 
ti: 880229bac000 task.ti: 880229bac000
Nov 29 23:17:52 localhost kernel: [50532.639011] RIP: 
0010:[81361e9a]  [81361e9a] 
advance_transaction+0x60/0x121
Nov 29 23:17:52 localhost kernel: [50532.639012] RSP: 
0018:880229badd50  EFLAGS: 00010246
Nov 29 23:17:52 localhost kernel: [50532.639013] RAX: 0081 
RBX: 88011339f990 RCX: 0082
Nov 29 23:17:52 localhost kernel: [50532.639013] RDX: 0246 
RSI: 0001 RDI: 880229b78eb0
Nov 29 23:17:52 localhost kernel: [50532.639014] RBP: 880229badd70 
R08:  R09: 
Nov 29 23:17:52 localhost kernel: [50532.639015] R10: 0001 
R11: 102a R12: 880229b78e00
Nov 29 23:17:52 localhost kernel: [50532.639016] R13: 0001 
R14: 880229b78eb0 R15: 880229b78d36
Nov 29 23:17:52 localhost kernel: [50532.639017] FS: 
() GS:88023bc0() knlGS:
Nov 29 23:17:52 localhost kernel: [50532.639018] CS:  0010 DS:  ES: 
 CR0: 8005003b
Nov 29 23:17:52 localhost kernel: [50532.639019] CR2: 02c7 
CR3: 0001e8198000 CR4: 07f0
Nov 29 23:17:52 localhost kernel: [50532.639020] DR0:  
DR1:  DR2: 
Nov 29 23:17:52 localhost kernel: [50532.639021] DR3:  
DR6: 0ff0 DR7: 0400

Nov 29 23:17:52 localhost kernel: [50532.639021] Stack:
Nov 29 23:17:52 localhost kernel: [50532.639024]  880229b78e00 
0001 88022983b000 0001
Nov 29 23:17:52 localhost kernel: [50532.639026]  880229badd90 
8136258e 880229bc0198 0011
Nov 29 23:17:52 localhost kernel: [50532.639028]  880229baddb8 
8136c3a3 880229b8a660 

Nov 29 23:17:52 localhost kernel: [50532.639028] Call Trace:
Nov 29 23:17:52 localhost kernel: [50532.639032]  [8136258e] 
acpi_ec_gpe_handler+0x48/0xc9
Nov 29 23:17:52 localhost kernel: [50532.639036]  [8136c3a3] 
acpi_ev_gpe_dispatch+0xb6/0x126
Nov 29 23:17:52 localhost kernel: [50532.639037]  [8136c4d3] 
acpi_ev_gpe_detect+0xc0/0x111
Nov 29 23:17:52 localhost kernel: [50532.639043]  [810f46b0] ? 
irq_thread_fn+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.639044]  [8136e3cf] 
acpi_ev_sci_xrupt_handler+0x1f/0x25
Nov 29 23:17:52 localhost kernel: [50532.639048]  [8135b12f] 
acpi_irq+0x16/0x31
Nov 29 23:17:52 localhost kernel: [50532.639050]  [810f46d3] 
irq_forced_thread_fn+0x23/0x70
Nov 29 23:17:52 localhost kernel: [50532.639051]  [810f4c7f] 
irq_thread+0x10f/0x150
Nov 29 23:17:52 localhost kernel: [50532.639053]  [] ? 
wake_threads_waitq+0x50/0x50
Nov 29 23:17:52 localhost kernel: [50532.639054]  [] ? 
irq_thread_check_affinity+0x90/0x90
Nov 29 23:17:52 localhost kernel: [50532.639058]  []

Re: [ANNOUNCE] 3.10.9-rt5

2013-08-23 Thread Fernando Lopez-Lezcano


On 08/23/2013 10:56 AM, Sebastian Andrzej Siewior wrote:

* Fernando Lopez-Lezcano | 2013-08-23 10:18:08 [-0700]:


Please post a patch when/if you have it so I can retry the build...
Thanks for taking a look at this!


Does this fix your trouble?


Yes, it does, thanks! Builds, installs and boots the x86_64 kernel (I 
did not test the i686 build, I don't have a 32-bit machine to test).


-- Fernando



diff --git a/drivers/misc/hwlat_detector.c b/drivers/misc/hwlat_detector.c
index 0bfa40d..6f61d5f 100644
--- a/drivers/misc/hwlat_detector.c
+++ b/drivers/misc/hwlat_detector.c
@@ -220,7 +220,7 @@ static struct sample *buffer_get_sample(struct sample *sample)
 #else
 #define time_type u64
 #define time_get() trace_clock_local()
-#define time_to_us(x)  ((x) / 1000)
+#define time_to_us(x)  div_u64(x, 1000)
 #define time_sub(a, b) ((a) - (b))
 #define init_time(a, b) a = b
 #define time_u64(a) a
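
Some background on the one-line change in the patch above: on a 32-bit build, a plain `/` with a 64-bit dividend makes the compiler emit a call to a library helper (`__udivdi3`), and the kernel does not link against the library that provides it, hence the "undefined" error at modpost time. The kernel's `div_u64()` avoids that call. The sketch below is a user-space illustration only: it shows that a 64-by-32 division can be carried out with shifts, compares and subtractions, without the `/` operator. It is not how the kernel implements `div_u64()`, and the `_sketch` names are hypothetical.

```c
#include <stdint.h>

/*
 * Divide a 64-bit value by a 32-bit value using restoring (shift-subtract)
 * long division. The divisor must be non-zero. The remainder always stays
 * below the divisor, so shifting it left by one bit cannot overflow 64 bits.
 */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
    uint64_t quotient = 0;
    uint64_t remainder = 0;

    for (int bit = 63; bit >= 0; bit--) {
        /* Bring down the next bit of the dividend. */
        remainder = (remainder << 1) | ((dividend >> bit) & 1u);
        if (remainder >= divisor) {
            remainder -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }
    return quotient;
}

/* Nanoseconds to microseconds, mirroring the role of time_to_us() in the patch. */
static uint64_t time_to_us_sketch(uint64_t ns)
{
    return div_u64_sketch(ns, 1000);
}
```

For example, `time_to_us_sketch(1234567)` yields 1234. Real code in the kernel should call `div_u64()` (or `do_div()`) rather than hand-rolling a loop like this.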


Re: [ANNOUNCE] 3.10.9-rt5

2013-08-23 Thread Fernando Lopez-Lezcano


On 08/23/2013 12:08 AM, Sebastian Andrzej Siewior wrote:

On 08/23/2013 07:50 AM, Fernando Lopez-Lezcano wrote:

On 08/22/2013 11:21 AM, Sebastian Andrzej Siewior wrote:

- hwlat improvements by Steven

Known issues:

...
Trying to build I get (in make modules):

ERROR: "__udivdi3" [drivers/misc/hwlat_detector.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2


looks like someone forgot to use do_div() instead of /, which fails on 32bit
if used on a 64bit dividend.

Will fix later.


Please post a patch when/if you have it so I can retry the build...
Thanks for taking a look at this!
-- Fernando

Re: [ANNOUNCE] 3.10.9-rt5

2013-08-22 Thread Fernando Lopez-Lezcano


On 08/22/2013 11:21 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.10.9-rt5 patch set.


Thanks!,


Changes since v3.10.9-rt4
- swait fixes from Steven. It fixed the issues with CONFIG_RCU_NOCB_CPU
   where the system suddenly froze and RCU wasn't doing its job anymore
- hwlat improvements by Steven

Known issues:

...
Trying to build I get (in make modules):

ERROR: "__udivdi3" [drivers/misc/hwlat_detector.ko] undefined!
make[1]: *** [__modpost] Error 1
make: *** [modules] Error 2

(find attached the final configuration used for building)
-- Fernando


build.log.bz2
Description: application/bzip

Re: [ANNOUNCE] 3.10.6-rt3

2013-08-20 Thread Fernando Lopez-Lezcano


On 08/19/2013 05:29 PM, Steven Rostedt wrote:

On Mon, 19 Aug 2013 10:23:44 -0700
Fernando Lopez-Lezcano wrote:


The problem is that bcache is using new semaphore functions which it
just introduced which rt does not know about. The comment above their
definition says that it is wrong to use them and completion is the
right way to do it.
So my question is, why don't we use completion but this nasty hack?


I think I'm going to send them an email about that.


In the meanwhile, any hope of a patch to be able to compile and test
with my current configuration?


Can you boot without enabling CONFIG_BCACHE?


Just to confirm that the kernel builds, installs and boots fine without 
this option...

-- Fernando

Re: [ANNOUNCE] 3.10.6-rt3

2013-08-19 Thread Fernando Lopez-Lezcano


On 08/19/2013 05:29 PM, Steven Rostedt wrote:

On Mon, 19 Aug 2013 10:23:44 -0700
Fernando Lopez-Lezcano wrote:



The problem is that bcache is using new semaphore functions which it
just introduced which rt does not know about. The comment above their
definition says that it is wrong to use them and completion is the
right way to do it.
So my question is, why don't we use completion but this nasty hack?


I think I'm going to send them an email about that.



In the meanwhile, any hope of a patch to be able to compile and test
with my current configuration?


Can you boot without enabling CONFIG_BCACHE?


I'm pretty sure I'll be able to do that. No real need in my personal 
case AFAICT.


I'll try that next - it is just that I try very hard to keep the 
configuration of my rt kernels as close as possible to the defaults that 
Fedora uses (they get distributed as part of Planet CCRMA and there is 
no telling what use cases they will hit - it would be confusing to 
have something that works on Fedora kernels and does not on equivalent 
RT patched kernels).


Thanks for the heads up!,
-- Fernando

Re: [ANNOUNCE] 3.10.6-rt3

2013-08-19 Thread Fernando Lopez-Lezcano


On 08/16/2013 12:01 AM, Sebastian Andrzej Siewior wrote:

On 08/15/2013 09:22 PM, Steven Rostedt wrote:

On Thu, 15 Aug 2013 11:42:55 -0700
Fernando Lopez-Lezcano wrote:


On 08/12/2013 09:34 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.10.6-rt3 patch set.


I'm getting this when trying to build:

drivers/md/bcache/request.c: In function 'cached_dev_write_complete':
drivers/md/bcache/request.c:1008:2: error: implicit declaration of function 'up_read_non_owner' [-Werror=implicit-function-declaration]
  up_read_non_owner(&dc->writeback_lock);
  ^
drivers/md/bcache/request.c: In function 'request_write':
drivers/md/bcache/request.c:1034:2: error: implicit declaration of function 'down_read_non_owner' [-Werror=implicit-function-declaration]
  down_read_non_owner(&dc->writeback_lock);
  ^
cc1: some warnings being treated as errors



Can you send us your config.


The problem is that bcache is using new semaphore functions which it
just introduced which rt does not know about. The comment above their
definition says that it is wrong to use them and completion is the
right way to do it.
So my question is, why don't we use completion but this nasty hack?
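
To make the "completion" alternative concrete: the *_non_owner calls let one context take a lock and a different context release it, whereas a completion has no owner at all, so any context may signal it. The following is a hypothetical userspace analogue built on pthreads (all sketch_* names are assumptions for illustration; this is not the kernel's struct completion, nor the fix bcache eventually used).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a kernel completion. */
struct sketch_completion {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    done;   /* number of pending "complete" events */
};

static void sketch_init_completion(struct sketch_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = 0;
}

/* May be called from any thread; there is no notion of an owner. */
static void sketch_complete(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Blocks until a matching sketch_complete() has happened. */
static void sketch_wait_for_completion(struct sketch_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->done == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}
```

In the write path discussed above, the submitter would wait and the I/O completion handler would signal, with neither side needing to "own" anything.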


In the meanwhile, any hope of a patch to be able to compile and test 
with my current configuration?

Thanks,
-- Fernando

Re: [ANNOUNCE] 3.10.6-rt3

2013-08-15 Thread Fernando Lopez-Lezcano


On 08/12/2013 09:34 AM, Sebastian Andrzej Siewior wrote:

Dear RT folks!

I'm pleased to announce the v3.10.6-rt3 patch set.


I'm getting this when trying to build:

drivers/md/bcache/request.c: In function 'cached_dev_write_complete':
drivers/md/bcache/request.c:1008:2: error: implicit declaration of function 'up_read_non_owner' [-Werror=implicit-function-declaration]
  up_read_non_owner(&dc->writeback_lock);
  ^
drivers/md/bcache/request.c: In function 'request_write':
drivers/md/bcache/request.c:1034:2: error: implicit declaration of function 'down_read_non_owner' [-Werror=implicit-function-declaration]
  down_read_non_owner(&dc->writeback_lock);
  ^
cc1: some warnings being treated as errors

Does not look like *_read_non_owner exist in rwsem_rt.h...
-- Fernando



Changes since v3.10.6-rt2
- the queue can be imported with git quiltimport
- powerpc compiles again. Thanks to Paul Gortmaker for the patch.
- added three patches from v3.8 which fell off the wagon on their way to
   3.10. One of them enables RT-FULL on ARM :)
- removed all cpsw patches from the queue. They made it upstream. My
   nfsboot setup seems not to work, let's look at this later.
- make arm/spear compile. Thanks to Felipe Balbi for the patch.
- Add a patch from Corey Minyard to no longer use deprecated
   CONFIG_NO_HZ.
- add the one patch which I added to the last 3.8-rt to get list_bl
   working again on !SMP && !DEBUG_SPINLOCK
- Spell "preemptible" properly in "Preemptible Kernel (Basic RT)" menu
   item. Thanks to Uwe Kleine-König for the patch.
- a patch from John Kacur to avoid a warning in the hpsa.
- a patch for the ppc5200 where the compiler thinks a variable isn't
   initialized and stops compiling due to -Werror

Known issues:

   - SLAB support not working

   - The cpsw network driver shows some issues.

   - ARM & PPC don't fall apart once booted. More testing doesn't
     hurt.

   - bcache with CONFIG_DEBUG_LOCK_ALLOC enabled does not compile.

...

Re: [ANNOUNCE] 3.6.6-rt17

2012-11-15 Thread Fernando Lopez-Lezcano


On 11/15/2012 10:11 AM, Thomas Gleixner wrote:

On Wed, 14 Nov 2012, Fernando Lopez-Lezcano wrote:


On 11/12/2012 01:28 PM, Thomas Gleixner wrote:

Dear RT Folks,

I'm pleased to announce the 3.6.6-rt17 release. 3.6.6-rt16 is just an
unannounced update release to 3.6.6.


Got this:


net/nfc/llcp/llcp.c: In function 'nfc_llcp_register_device':
net/nfc/llcp/llcp.c:1185:24: error: expected expression before '{' token
net/nfc/llcp/llcp.c:1186:35: error: expected expression before '{' token


when building with CONFIG_NFC / CONFIG_NFC_LLCP (builds fine when those are
not set)


Grrr. Damned ignorants.

Does that fix it for you ?


Yes, thanks! I had to tweak the patch but it does make the whole thing 
compile.

-- Fernando



Subject: nfc: Use proper lock init functions
From: Thomas Gleixner
Date: Thu, 15 Nov 2012 19:03:20 +0100

Grmbl. Why insist people on using static initializers if there are
proper init functions? Just because they can?

Signed-off-by: Thomas Gleixner
---
  net/nfc/llcp/llcp.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-stable/net/nfc/llcp/llcp.c
===
--- linux-stable.orig/net/nfc/llcp/llcp.c
+++ linux-stable/net/nfc/llcp/llcp.c
@@ -1182,8 +1182,8 @@ int nfc_llcp_register_device(struct nfc_
goto err_rx_wq;
}

-   local->sockets.lock = __RW_LOCK_UNLOCKED(local->sockets.lock);
-   local->connecting_sockets.lock = __RW_LOCK_UNLOCKED(local->connecting_sockets.lock);
+   rwlock_init(&local->sockets.lock);
+   rwlock_init(&local->connecting_sockets.lock);

nfc_llcp_build_gb(local);
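
The compile error reported earlier in this thread ("expected expression before '{' token") is what C produces when a bare brace initializer is used on the right-hand side of an assignment. Presumably the RT variant of __RW_LOCK_UNLOCKED expands to such a brace list, which would explain why the static initializer breaks there while the init function does not. The sketch below reproduces the pattern with hypothetical types and names (sketch_rwlock, SKETCH_RW_LOCK_UNLOCKED, and so on); it is not the kernel's rwlock_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock type and initializers, for illustration only. */
struct sketch_rwlock {
    int readers;
    int writer;
};

/* A bare brace initializer: valid only in a declaration. */
#define SKETCH_RW_LOCK_UNLOCKED { 0, 0 }

/* Fine: the braces initialize an object that is being declared. */
static struct sketch_rwlock sketch_static_lock = SKETCH_RW_LOCK_UNLOCKED;

/* Works at run time on an object that already exists. */
static void sketch_rwlock_init(struct sketch_rwlock *lock)
{
    assert(lock != NULL);
    lock->readers = 0;
    lock->writer = 0;
}

struct sketch_local {
    struct sketch_rwlock sockets_lock;
};

static void sketch_register(struct sketch_local *local)
{
    /*
     * local->sockets_lock = SKETCH_RW_LOCK_UNLOCKED;
     * would not compile: "expected expression before '{' token".
     */
    sketch_rwlock_init(&local->sockets_lock);
}
```

The init function is the portable choice because it does not depend on how the initializer macro happens to be spelled in a given kernel configuration.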



Re: [ANNOUNCE] 3.6.6-rt17

2012-11-14 Thread Fernando Lopez-Lezcano


On 11/12/2012 01:28 PM, Thomas Gleixner wrote:

Dear RT Folks,

I'm pleased to announce the 3.6.6-rt17 release. 3.6.6-rt16 is just an
unannounced update release to 3.6.6.


Got this:


net/nfc/llcp/llcp.c: In function 'nfc_llcp_register_device':
net/nfc/llcp/llcp.c:1185:24: error: expected expression before '{' token
net/nfc/llcp/llcp.c:1186:35: error: expected expression before '{' token


when building with CONFIG_NFC / CONFIG_NFC_LLCP (builds fine when those 
are not set)

-- Fernando



Changes since 3.6.6-rt16:

* Finally make the NOHZ softirq pending detection work with the new
  softirq scheme.

* Remove the WARN_ON from __raise_softirq_irqoff(). I got the
  information I want for now.


Re: 2.6.24-rt1: timing problems (was [git pull] x86/hrtimer/acpi fixes)

2008-01-28 Thread Fernando Lopez-Lezcano

On Mon, 2008-01-28 at 10:26 -0800, Fernando Lopez-Lezcano wrote:
> On Sun, 2008-01-27 at 05:46 +0100, Mike Galbraith wrote:
> > On Sat, 2008-01-26 at 17:59 -0800, Fernando Lopez-Lezcano wrote:
> > 
> > > Hi Ingo... back to testing. 
> > > History:
> > > 
> > > 2.6.23.x + rt has not been very usable for audio applications. 
> > > 2.6.24-rt1: same so far. 
> > > 
> > > Why: Jack keeps printing "delayed..." messages and has xruns which means
> > > that somehow the timing is delayed more than what jack would think
> > > reasonable. As in the case with an old timing bug, the problem
> > > disappears when booting the kernel with idle=poll. Other users of Planet
> > > CCRMA are able to replicate the behavior, which goes away with idle=poll
> > > or booting the machine with only one core. As a workaround I have been
> > > packaging 2.6.22.x but now I'm not able to use that as the old rt14
> > > patch, suitably tweaked, results in a non-working kernel.
> > > 
> > > So it looks like, again, timing is getting skewed when the jack process
> > > jumps between cpus and thus jack sees timing jumps that are just not
> > > happening.
> > > 
> > > This is with a build based on 2.6.24 using as a base the latest Fedora
> > > rawhide source package plus 2.6.24-rt1. 
> > 
> > Do you have a simple testcase?  (one which doesn't entail installing
> > ccrma and becoming an audiophile)
> 
> No, I don't at this point. 
> I'll see if I can cook something simple today... (naively thinking that
> some short C code could test for the clock being actually monotonic
> across cpus). 

Sorry, no luck so far in writing something simple that will fail. I
tried testing for the results from repeated calls to clock_gettime (what
jack uses for timing by default) to actually be monotonic, while a
script uses taskset to force a cpu switch and of course got no errors. 
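
A test of the kind described in this paragraph could look roughly like the sketch below. It assumes the clock in question is CLOCK_MONOTONIC (the message only says clock_gettime), and the sketch_* names are hypothetical; this is not the program that was actually run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC as nanoseconds; returns -1 if the call fails. */
static int64_t sketch_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Count how many consecutive readings go backwards in time. */
static long sketch_count_backward_steps(long iterations)
{
    long backwards = 0;
    int64_t prev = sketch_now_ns();
    long i;

    for (i = 0; i < iterations; i++) {
        int64_t cur = sketch_now_ns();

        if (cur < prev)
            backwards++;
        prev = cur;
    }
    return backwards;
}
```

Running such a loop while an external script changes the process's CPU affinity with taskset would exercise the cross-CPU case; a check like this only catches backward steps, not forward jumps or skew, which may be one reason it reported no errors.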

2.6.24-rt1 with idle=poll works fine, without it I get multiple problems
with the jack internal timing, or at least that is what it seems to me from
the symptoms. 

-- Fernando



Re: 2.6.24-rt1: timing problems (was [git pull] x86/hrtimer/acpi fixes)

2008-01-28 Thread Fernando Lopez-Lezcano

On Sun, 2008-01-27 at 05:46 +0100, Mike Galbraith wrote:
> On Sat, 2008-01-26 at 17:59 -0800, Fernando Lopez-Lezcano wrote:
> 
> > Hi Ingo... back to testing. 
> > History:
> > 
> > 2.6.23.x + rt has not been very usable for audio applications. 
> > 2.6.24-rt1: same so far. 
> > 
> > Why: Jack keeps printing "delayed..." messages and has xruns which means
> > that somehow the timing is delayed more than what jack would think
> > reasonable. As in the case with an old timing bug, the problem
> > disappears when booting the kernel with idle=poll. Other users of Planet
> > CCRMA are able to replicate the behavior, which goes away with idle=poll
> > or booting the machine with only one core. As a workaround I have been
> > packaging 2.6.22.x but now I'm not able to use that as the old rt14
> > patch, suitably tweaked, results in a non-working kernel.
> > 
> > So it looks like, again, timing is getting skewed when the jack process
> > jumps between cpus and thus jack sees timing jumps that are just not
> > happening.
> > 
> > This is with a build based on 2.6.24 using as a base the latest Fedora
> > rawhide source package plus 2.6.24-rt1. 
> 
> Do you have a simple testcase?  (one which doesn't entail installing
> ccrma and becoming an audiophile)

No, I don't at this point. 
I'll see if I can cook something simple today... (naively thinking that
some short C code could test for the clock being actually monotonic
across cpus). 

-- Fernando



Re: 2.6.24-rt1: timing problems (was [git pull] x86/hrtimer/acpi fixes)

2008-01-26 Thread Fernando Lopez-Lezcano

On Sat, 2007-12-08 at 10:17 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > On Fri, 2007-12-07 at 20:59 +0100, Ingo Molnar wrote:
> > > * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > 
> > > > > Nope, it doesn't, still getting "delay" and "xrun" messages galore.
> > > > 
> > > > Attached: configuration and dmesg output booting with idle=poll, 
> > > > reconfirmed that that makes the delay and xrun messages go away.
> > > 
> > > could you try the rolled up patch of various fixlets, on top of 
> > > current -git? (it might even apply to -rc4) It includes some more 
> > > stuff beyond the ones in the pull request. (still being 
> > > tested/reviewed)
> > 
> > I'll try but it will take me a while to figure git and do a package 
> > build of it...
> 
> if you want to try a vanilla kernel package then pick up the kernel 
> package from Fedora rawhide - this fixlet should show up there within a 
> couple of days, Dave Jones is doing a really nice job of keeping up with 
> latest -git. (and the Fedora kernel has hrtimers and dynticks enabled.)

Hi Ingo... back to testing. 
History:

2.6.23.x + rt has not been very usable for audio applications. 
2.6.24-rt1: same so far. 

Why: Jack keeps printing "delayed..." messages and has xruns, which means
that somehow the timing is delayed more than what jack would think
reasonable. As in the case with an old timing bug, the problem
disappears when booting the kernel with idle=poll. Other users of Planet
CCRMA are able to replicate the behavior, which goes away with idle=poll
or booting the machine with only one core. As a workaround I have been
packaging 2.6.22.x but now I'm not able to use that, as the old rt14
patch, suitably tweaked, results in a non-working kernel.

So it looks like, again, timing is getting skewed when the jack process
jumps between cpus and thus jack sees timing jumps that are just not
happening.

This is with a build based on 2.6.24 using as a base the latest Fedora
rawhide source package plus 2.6.24-rt1. 

-- Fernando
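
Editor's sketch, not part of the original message: one way to look for the kind of timing jump described above is to take timestamps back to back and count the steps that go backwards or are implausibly large. The helper below is a hypothetical illustration in C. The name sketch_count_time_jumps and the max_step_ns threshold are assumptions made here; neither jack nor the kernel provides them, and this is not how jack computes its "delayed" messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count suspicious steps in a series of timestamps (in nanoseconds) that
 * were sampled back to back. A step counts as suspicious if time went
 * backwards or advanced by more than max_step_ns.
 *
 * Assumption: timestamps are non-negative, so the subtraction below
 * cannot overflow.
 */
static size_t sketch_count_time_jumps(const int64_t *samples, size_t n,
                                      int64_t max_step_ns)
{
    size_t jumps = 0;

    /* An empty series is allowed; a NULL pointer with n > 0 is not. */
    assert(samples != NULL || n == 0);

    for (size_t i = 1; i < n; i++) {
        int64_t delta = samples[i] - samples[i - 1];

        if (delta < 0 || delta > max_step_ns)
            jumps++;
    }
    return jumps;
}
```

The samples could be gathered with clock_gettime(CLOCK_MONOTONIC, ...) in a tight loop on a multi-core machine, once with a normal boot and once with idle=poll, and the two counts compared.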

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [git pull] x86/hrtimer/acpi fixes

2007-12-07 Thread Fernando Lopez-Lezcano

On Fri, 2007-12-07 at 20:59 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > > Nope, it doesn't; still getting "delay" and "xrun" messages galore.
> > 
> > Attached: configuration and dmesg output booting with idle=poll, 
> > reconfirmed that that makes the delay and xrun messages go away.
> 
> could you try the rolled up patch of various fixlets, on top of current
> -git? (it might even apply to -rc4) It includes some more stuff beyond 
> the ones in the pull request. (still being tested/reviewed)

I'll try but it will take me a while to figure git and do a package
build of it...

-- Fernando



Re: [git pull] x86/hrtimer/acpi fixes

2007-12-07 Thread Fernando Lopez-Lezcano

On Fri, 2007-12-07 at 19:59 +0100, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > Ingo, I was about to post about timer problems in 2.6.23.9+rt12 when I 
> > saw this. Would this be related / should I test / will this solve 
> > everything? :-)
> > 
> > What I'm seeing is jack "delays" that go away if I boot with 
> > "idle=poll", just like it was happening a long time ago. Smells like 
> > 'time of day' glitches when the process switches cpus (this is on a 
> > dual core intel laptop).
> 
> does it go away with hpet=disable as well? If yes then there could be a 
> relation. If not then it's something else and we need to debug it.

Nope, it doesn't; still getting "delay" and "xrun" messages galore.
-- Fernando



Re: [git pull] x86/hrtimer/acpi fixes

2007-12-07 Thread Fernando Lopez-Lezcano

On Fri, 2007-12-07 at 19:36 +0100, Ingo Molnar wrote:
> Linus, please pull the latest x86 git tree from:
> 
>git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
> 
> This contains 3 x86/hrtimer/hpet/ACPI fixes from Thomas: the ACPI fix 
> has been ACK-ed by Venki. Build and boot tested on various boxes.

Ingo, I was about to post about timer problems in 2.6.23.9+rt12 when I
saw this. Would this be related / should I test / will this solve
everything? :-)

What I'm seeing is jack "delays" that go away if I boot with
"idle=poll", just like it was happening a long time ago. Smells like
'time of day' glitches when the process switches cpus (this is on a dual
core intel laptop). 

Does not happen in 2.6.22.10 + rt9 - well, I do see very occasional
delay warnings there as well.

I also see occasional complete hangs but I don't have a way of knowing
what triggers that.

-- Fernando


> -->
> Thomas Gleixner (3):
>   hrtimers: avoid overflow for large relative timeouts
>   clockevents: warn once when program_event() is called with negative 
> expiry
>   ACPI: move timer broadcast before busmaster disable
> 
>  drivers/acpi/processor_idle.c |   19 ++-
>  kernel/hrtimer.c  |8 
>  kernel/time/clockevents.c |5 +
>  3 files changed, 27 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index b1fbee3..2fe34cc 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -531,6 +531,11 @@ static void acpi_processor_idle(void)
>  
>   case ACPI_STATE_C3:
>   /*
> +  * Must be done before busmaster disable as we might
> +  * need to access HPET !
> +  */
> + acpi_state_timer_broadcast(pr, cx, 1);
> + /*
>* disable bus master
>* bm_check implies we need ARB_DIS
>* !bm_check implies we need cache flush
> @@ -557,7 +562,6 @@ static void acpi_processor_idle(void)
>   /* Get start time (ticks) */
>   t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>   /* Invoke C3 */
> - acpi_state_timer_broadcast(pr, cx, 1);
>   /* Tell the scheduler that we are going deep-idle: */
>   sched_clock_idle_sleep_event();
>   acpi_cstate_enter(cx);
> @@ -1401,9 +1405,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device 
> *dev,
>   if (acpi_idle_suspend)
>   return(acpi_idle_enter_c1(dev, state));
>  
> - if (pr->flags.bm_check)
> - acpi_idle_update_bm_rld(pr, cx);
> -
>   local_irq_disable();
>   current_thread_info()->status &= ~TS_POLLING;
>   /*
> @@ -1418,13 +1419,21 @@ static int acpi_idle_enter_simple(struct 
> cpuidle_device *dev,
>   return 0;
>   }
>  
> + /*
> +  * Must be done before busmaster disable as we might need to
> +  * access HPET !
> +  */
> + acpi_state_timer_broadcast(pr, cx, 1);
> +
> + if (pr->flags.bm_check)
> + acpi_idle_update_bm_rld(pr, cx);
> +
>   if (cx->type == ACPI_STATE_C3)
>   ACPI_FLUSH_CPU_CACHE();
>  
>   t1 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>   /* Tell the scheduler that we are going deep-idle: */
>   sched_clock_idle_sleep_event();
> - acpi_state_timer_broadcast(pr, cx, 1);
>   acpi_idle_do_entry(cx);
>   t2 = inl(acpi_gbl_FADT.xpm_timer_block.address);
>  
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 22a2514..e65dd0b 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -850,6 +850,14 @@ hrtimer_start(struct hrtimer *timer, ktime_t tim, const 
> enum hrtimer_mode mode)
>  #ifdef CONFIG_TIME_LOW_RES
>   tim = ktime_add(tim, base->resolution);
>  #endif
> + /*
> +  * Careful here: User space might have asked for a
> +  * very long sleep, so the add above might result in a
> +  * negative number, which enqueues the timer in front
> +  * of the queue.
> +  */
> + if (tim.tv64 < 0)
> + tim.tv64 = KTIME_MAX;
>   }
>   timer->expires = tim;
>  
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index 822beeb..5fb139f 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -78,6 +78,11 @@ int clockevents_program_event(struct clock_event_device 
> *dev, ktime_t expires,
>   unsigned long long clc;
>   int64_t delta;
>  
> + if (unlikely(expires.tv64 < 0)) {
> + WARN_ON_ONCE(1);
> + return -ETIME;
> + }
> +
>   delta = ktime_to_ns(ktime_sub(expires, now));
>  
>   if (delta <= 0)
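
Editor's sketch, not part of the original message: the hrtimer hunk above guards against a relative timeout so long that the additions done before it wrap around to a negative expiry, which would put the timer at the front of the queue. The user-space C below illustrates that clamp under stated assumptions. SKETCH_KTIME_MAX and sketch_relative_expiry are hypothetical names standing in for the kernel's KTIME_MAX and the logic in hrtimer_start(); plain int64_t nanoseconds stand in for ktime_t.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for KTIME_MAX: the largest signed 64-bit nanosecond value. */
#define SKETCH_KTIME_MAX INT64_MAX

/*
 * Turn a relative timeout into an absolute expiry, both in nanoseconds.
 * The addition is done in unsigned arithmetic so that wraparound is well
 * defined in portable C. The conversion back to int64_t is
 * implementation-defined; on common compilers it wraps to a negative
 * value, which is then clamped to the maximum, mirroring the check the
 * patch adds.
 *
 * Assumption: both inputs are non-negative.
 */
static int64_t sketch_relative_expiry(int64_t now_ns, int64_t rel_ns)
{
    int64_t expiry;

    assert(now_ns >= 0 && rel_ns >= 0);

    expiry = (int64_t)((uint64_t)now_ns + (uint64_t)rel_ns);
    if (expiry < 0)
        expiry = SKETCH_KTIME_MAX;

    return expiry;
}
```

With this clamp a "sleep forever" request stays at the far end of the timeline instead of expiring immediately.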

2.6.23.9-rt12: BUGs

2007-11-28 Thread Fernando Lopez-Lezcano

> I'll try rt12...
> 
> Same problems in rt12, getting lots of "delay of xxx usecs exceeds
> estimated spare time of ; restart" in jackd (on my T61 Lenovo laptop
> running fc7). Does not happen with 2.6.22.10 + rt9. This is both with
> the internal snd-hda-intel card and a pcmcia rme hdsp multiface. 

While trying out 2.6.23.9-rt12 I got the three attached bugs. 
Also attached is the output of dmesg for a clean boot on the machine. 

Jack displays timing problems, similar to when there were timing 
issues with dual processor machines. Still investigating as time 
permits. 

-- Fernando

 apparently while suspending ---

Nov 27 20:06:01 localhost kernel: Stopping tasks ... done.
Nov 27 20:06:01 localhost kernel: Suspending console(s)
Nov 27 20:06:01 localhost kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Nov 27 20:06:01 localhost kernel: sd 0:0:0:0: [sda] Stopping disk
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :15:00.2 disabled
Nov 27 20:06:01 localhost kernel: eth%d: Going into suspend...
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :03:00.0 disabled
Nov 27 20:06:01 localhost pcscd: winscard_msg_srv.c:238:SHMProcessEventsContext() select returns with failure: Interrupted system call
Nov 27 20:06:01 localhost pcscd: winscard_svc.c:222:ContextThread() Error in SHMProcessEventsContext
Nov 27 20:06:01 localhost pcscd: winscard_msg_srv.c:238:SHMProcessEventsContext() select returns with failure: Interrupted system call
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1f.2 disabled
Nov 27 20:06:01 localhost pcscd: winscard_svc.c:222:ContextThread() Error in SHMProcessEventsContext
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.7 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.2 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.1 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1d.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1b.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.7 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.1 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:1a.0 disabled
Nov 27 20:06:01 localhost kernel: ACPI: PCI interrupt for device :00:19.0 disabled
Nov 27 20:06:01 localhost kernel: Disabling non-boot CPUs ...
Nov 27 20:06:01 localhost kernel: Breaking affinity for irq 218
Nov 27 20:06:01 localhost kernel: CPU 1 is now offline
Nov 27 20:06:01 localhost kernel: SMP alternatives: switching to UP code
Nov 27 20:06:01 localhost kernel: BUG: sleeping function called from invalid context pm-suspend(3740) at kernel/rtmutex.c:637
Nov 27 20:06:01 localhost gnome-power-manager: (nando) DBUS timed out, but recovering
Nov 27 20:06:01 localhost kernel: in_atomic():0 [], irqs_disabled():1
Nov 27 20:06:01 localhost kernel:  [<c062d88d>] __rt_spin_lock+0x21/0x3d
Nov 27 20:06:01 localhost kernel:  [<c0466b68>] free_pages_bulk+0x28/0x188
Nov 27 20:06:01 localhost kernel:  [<c0466d10>] __drain_pages+0x48/0x69
Nov 27 20:06:01 localhost kernel:  [<c0466d4f>] page_alloc_cpu_notify+0x1e/0x3d
Nov 27 20:06:01 localhost kernel:  [<c062faa4>] notifier_call_chain+0x2a/0x47
Nov 27 20:06:01 localhost kernel:  [<c0439520>] raw_notifier_call_chain+0x17/0x1a
Nov 27 20:06:01 localhost kernel:  [<c044a805>] _cpu_down+0x184/0x242
Nov 27 20:06:01 localhost kernel:  [<c044aa6c>] disable_nonboot_cpus+0x4e/0xd2
Nov 27 20:06:01 localhost kernel:  [<c0533915>] acpi_sleep_prepare+0x41/0x48
Nov 27 20:06:01 localhost kernel:  [<c044f213>] suspend_devices_and_enter+0x64/0x96
Nov 27 20:06:01 localhost kernel:  [<c044f360>] enter_state+0x11b/0x193
Nov 27 20:06:01 localhost kernel:  [<c044f466>] state_store+0x8e/0xa2
Nov 27 20:06:01 localhost kernel:  [<c044f3d8>] state_store+0x0/0xa2
Nov 27 20:06:01 localhost kernel:  [<c04bb067>] subsys_attr_store+0x27/0x2b
Nov 27 20:06:01 localhost kernel:  [<c04bb2a9>] sysfs_write_file+0xa6/0xd9
Nov 27 20:06:01 localhost kernel:  [<c04bb203>] sysfs_write_file+0x0/0xd9
Nov 27 20:06:01 localhost kernel:  [<c04826eb>] vfs_write+0xa8/0x15a
Nov 27 20:06:01 localhost gnome-power-manager: (nando) Resuming computer
Nov 27 20:06:01 localhost kernel:  [<c0482d14>] sys_write+0x41/0x67
Nov 27 20:06:01 localhost kernel:  [<c040514a>] syscall_call+0x7/0xb
Nov 27 20:06:01 localhost kernel:  [<c062>] xfrm_send_policy_notify+0x44f/0x4f4
Nov 27 20:06:01 localhost NetworkManager: <info>  Waking up from sleep.
Nov 27 20:06:01 localhost kernel:  ===
Nov 27 20:06:01 localhost NetworkManager: <info>  Deactivating device eth1.
Nov 27 20:06:01 localhost kernel: CPU1 is down
Nov 27 20:06:01 localhost NetworkManager: <info>  eth1: Device is fully-supported using driver 'e1000'.
Nov 27 20:06:01 localhost kernel: Intel machine check architecture supported.
Nov 27 20:06:01 localhost NetworkManager: <info>  nm_device_init(): waiting for device's worker thread to start
Nov 27 20:06:01 localhost kernel: Intel machine check reporting enabled on CPU#0.

Re: 2.6.22.14 + rt? vs 2.6.23.9-rt12

2007-11-27 Thread Fernando Lopez-Lezcano

On Tue, 2007-11-27 at 17:02 -0800, Fernando Lopez-Lezcano wrote:
> Hi Ingo... any hope of an updated realtime patch for 2.6.22.14? I'm
> having problems with 2.6.23.1 + rt11 (I spent the morning rediffing
> against 2.6.23.9 and just _now_ pressed reload in my browser and there it
> is..., rt12 for 2.6.23.9!, argh! :-) and wanted to compare with 2.6.22.x
> and the latest I managed to repatch and run successfully is 2.6.22.10. I
> did 2.6.22.14 in the afternoon but I obviously bungled it somewhere as
> the boot... takes... a... long... time... I can send my .14 patch off
> the list if you want/need it. 
> 
> [in my 2.6.23.1-rt11 tests I am getting "delayed..." messages from
> jackd, smells like a problem with internal timing in the kernel]
> 
> I'll try rt12...

Same problems in rt12, getting lots of "delay of xxx usecs exceeds
estimated spare time of ; restart" in jackd (on my T61 Lenovo laptop
running fc7). Does not happen with 2.6.22.10 + rt9. This is both with
the internal snd-hda-intel card and a pcmcia rme hdsp multiface. 

-- Fernando



2.6.22.14 + rt?

2007-11-27 Thread Fernando Lopez-Lezcano

Hi Ingo... any hope of an updated realtime patch for 2.6.22.14? I'm
having problems with 2.6.23.1 + rt11 (I spent the morning rediffing
against 2.6.23.9 and just _now_ pressed reload in my browser and there it
is..., rt12 for 2.6.23.9!, argh! :-) and wanted to compare with 2.6.22.x
and the latest I managed to repatch and run successfully is 2.6.22.10. I
did 2.6.22.14 in the afternoon but I obviously bungled it somewhere as
the boot... takes... a... long... time... I can send my .14 patch off
the list if you want/need it. 

[in my 2.6.23.1-rt11 tests I am getting "delayed..." messages from
jackd, smells like a problem with internal timing in the kernel]

I'll try rt12...
-- Fernando



Re: [PlanetCCRMA] Re: 2.6.22.6 + rt9: suspend/hibernate not working

2007-09-06 Thread Fernando Lopez-Lezcano

On Thu, 2007-09-06 at 12:55 -0700, Fernando Lopez-Lezcano wrote:
> On Thu, 2007-09-06 at 11:42 -0700, Fernando Lopez-Lezcano wrote:
> > On Tue, 2007-09-04 at 17:15 -0700, Daniel Walker wrote:
> > > On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> > > > Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> > > > I confirmed) that the latest rt kernel I released has broken suspend
> > > > (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> > > > configuration files are virtual clones as far as possible of the
> > > > standard Fedora kernel config files). 
> > > > 
> > > > I don't know where to start debugging this. When suspend is initiated it
> > > > freezes with a "Stopping tasks ... " message in the text console - a
> > > > hard power cycle is the only way to get the machine back to normal. 
> > > > 
> > > > kernel/power/process.c seems to contain that string in the
> > > > freeze_processes function so it looks like the freezer is not freezing
> > > > tasks as no "done" message is ever printed. 
> > > > 
> > > > What could we do to help?
> > > 
> > > If you have high resolution timers enabled you could try disabling it,
> > > and see if the problem persists .
> > 
> > The problem is still there ("Stopping tasks ... " and nothing
> > afterwards). 
> 
> Looks like it was a known problem (sorry about the noise), see:
>   http://lkml.org/lkml/2007/8/25/117
> 
> It does fix the problem here as well. 
> Ingo: is this still the right fix for 2.6.22.6 + rt9?

I'm seeing this while going into suspend:

...
Disabling non-boot CPUs ...
Breaking affinity for irq 218
CPU 1 is now offline
SMP alternatives: switching to UP code
BUG: sleeping function called from invalid context pm-suspend(3676) at
kernel/rtmutex.c:636
in_atomic():0 [], irqs_disabled():1
 [<c061c13d>] __rt_spin_lock+0x21/0x3d
 [<c0461b2c>] free_pages_bulk+0x28/0x188
 [<c042653d>] migration_call+0x3a5/0x3be
 [<c0461cd4>] __drain_pages+0x48/0x69
 [<c0461d0a>] page_alloc_cpu_notify+0x15/0x2b
 [<c061e083>] notifier_call_chain+0x2a/0x47
 [<c0435fe8>] raw_notifier_call_chain+0x17/0x1a
 [<c0446869>] _cpu_down+0x17a/0x238
 [<c042b4e5>] printk+0x1f/0x92
 [<c0446ad0>] disable_nonboot_cpus+0x4e/0xd2
 [<c044b673>] enter_state+0x116/0x1d6
 [<c044b831>] state_store+0xc9/0xe0
 [<c044b768>] state_store+0x0/0xe0
 [<c04b44cf>] subsys_attr_store+0x27/0x2b
 [<c04b45db>] sysfs_write_file+0x9a/0xbd
 [<c04b4541>] sysfs_write_file+0x0/0xbd
 [<c047bc97>] vfs_write+0xa8/0x15a
 [<c047c2c0>] sys_write+0x41/0x67
 [<c0404ef6>] syscall_call+0x7/0xb
 ===
CPU1 is down
PM: Entering mem sleep
thinkpad_acpi thinkpad_acpi: LATE suspend
...

I'm attaching the whole compressed dmesg output to put it in context.
-- Fernando



dmesg.1.bz2
Description: application/bzip

Re: 2.6.22.6 + rt9: suspend/hibernate not working

2007-09-06 Thread Fernando Lopez-Lezcano

On Thu, 2007-09-06 at 11:42 -0700, Fernando Lopez-Lezcano wrote:
> On Tue, 2007-09-04 at 17:15 -0700, Daniel Walker wrote:
> > On Tue, 2007-09-04 at 17:12 -0700, Fernando Lopez-Lezcano wrote:
> > > Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
> > > I confirmed) that the latest rt kernel I released has broken suspend
> > > (tested on fc6 & fc7, stock Fedora kernel works fine - the rt
> > > configuration files are virtual clones as far as possible of the
> > > standard Fedora kernel config files). 
> > > 
> > > I don't know where to start debugging this. When suspend is initiated it
> > > freezes with a "Stopping tasks ... " message in the text console - a
> > > hard power cycle is the only way to get the machine back to normal. 
> > > 
> > > kernel/power/process.c seems to contain that string in the
> > > freeze_processes function so it looks like the freezer is not freezing
> > > tasks as no "done" message is ever printed. 
> > > 
> > > What could we do to help?
> > 
> > If you have high resolution timers enabled you could try disabling it,
> > and see if the problem persists .
> 
> The problem is still there ("Stopping tasks ... " and nothing
> afterwards). 

Looks like it was a known problem (sorry about the noise), see:
  http://lkml.org/lkml/2007/8/25/117

It does fix the problem here as well. 
Ingo: is this still the right fix for 2.6.22.6 + rt9?

-- Fernando


2.6.22.6 + rt9: suspend/hibernate not working

2007-09-04 Thread Fernando Lopez-Lezcano

Hi Ingo... I'm getting reports from some of my Planet CCRMA users (which
I confirmed) that the latest rt kernel I released has broken suspend
(tested on fc6 & fc7, stock Fedora kernel works fine - the rt
configuration files are virtual clones as far as possible of the
standard Fedora kernel config files). 

I don't know where to start debugging this. When suspend is initiated it
freezes with a "Stopping tasks ... " message in the text console - a
hard power cycle is the only way to get the machine back to normal. 

kernel/power/process.c seems to contain that string in the
freeze_processes function so it looks like the freezer is not freezing
tasks as no "done" message is ever printed. 

What could we do to help?
-- Fernando


Re: [Fwd: [PlanetCCRMA] atl1 driver; sleeping function]

2007-07-31 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-31 at 10:51 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > Hi Ingo, I'm forwarding this report from a Planet CCRMA user, this is 
> > happening to him with 2.6.21.6-rt21...
> 
> thanks!

Thanks for the patch!
Looks like it fixed the problem Matt was having...
-- Fernando

 Forwarded Message 
From: Matt Barber
To: Fernando Lopez-Lezcano
Cc: [EMAIL PROTECTED]
Subject: Re: [PlanetCCRMA] atl1 driver; sleeping function
Date: Tue, 31 Jul 2007 22:50:28 -0400

The newly patched atl1 driver seems to be working fine.  I tried it
also in rt21.3 (that's the latest src.rpm in
http://ccrma.stanford.edu/planetccrma/mirror/all/linux/SRPMS/), and it
also worked fine -- I need kernel-rt-devel because I do use apps that
need nvidia drivers, and those are working fine in rt21.3 as well.  I
can keep you up to date if anything negative happens.

Thanks again,

Matt

> 
> > BUG: sleeping function called from invalid context IRQ-219(2243) at
> > kernel/rtmutex.c:613
> > in_atomic():0 [], irqs_disabled():1
> >  [<c0405f88>] dump_trace+0x64/0x105
> >  [<c0406041>] show_trace_log_lvl+0x18/0x2c
> >  [<c040664e>] show_trace+0xf/0x11
> >  [<c04066cf>] dump_stack+0x12/0x14
> >  [<c060511d>] __rt_spin_lock+0x21/0x3d
> >  [<f8a20e0c>] atl1_xmit_frame+0x66f/0x6c6 [atl1]
> >  [<c05a3d96>] dev_hard_start_xmit+0x1c6/0x225
> >  [<c05b29bd>] __qdisc_run+0xb7/0x1cf
> 
> could you try the patch below, does it fix the problem? The atl1 driver 
> uses raw irq flags in combination with a spinlock that is a sleeping 
> lock on -rt. (this is valid code on upstream, fortunately the -rt fix is 
> also a cleanup and a small code reduction enhancement on upstream, so 
> there's no problem pushing such fixes upstream.)
> 
>   Ingo
> 
> --->
> Subject: [patch] drivers/net/atl1/atl1_main.c: use spin_trylock_irqsave()
> From: Ingo Molnar <[EMAIL PROTECTED]>
> 
> use the simpler spin_trylock_irqsave() API to get the adapter lock.
> 
> [ this is also a fix for -rt where adapter->lock is a sleeping lock. ]
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> ---
>  drivers/net/atl1/atl1_main.c |4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> Index: linux-rt-rebase.q/drivers/net/atl1/atl1_main.c
> ===
> --- linux-rt-rebase.q.orig/drivers/net/atl1/atl1_main.c
> +++ linux-rt-rebase.q/drivers/net/atl1/atl1_main.c
> @@ -1704,10 +1704,8 @@ static int atl1_xmit_frame(struct sk_buf
>   }
>   }
>  
> - local_irq_save(flags);
> - if (!spin_trylock(&adapter->lock)) {
> + if (!spin_trylock_irqsave(&adapter->lock, flags)) {
>   /* Can't get lock - tell upper layer to requeue */
> - local_irq_restore(flags);
>   dev_printk(KERN_DEBUG, &adapter->pdev->dev, "tx locked\n");
>   return NETDEV_TX_LOCKED;
>   }
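
For anyone following the patch above, here is a minimal user-space sketch in C of the pattern it switches to. It is not kernel code: every model_* name is a hypothetical stand-in invented for illustration, and the real primitives are the kernel's local_irq_save(), spin_trylock() and spin_trylock_irqsave(). The sketch only models the control flow, under the assumption that the combined helper saves the interrupt state, tries the lock, and restores the state itself when the lock is busy, so the caller needs no separate restore on the failure path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static bool model_irqs_off;                        /* models "interrupts disabled" */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;  /* models adapter->lock */

/* Disable "interrupts" and return the previous state. */
static bool model_irq_save(void)
{
    bool old = model_irqs_off;
    model_irqs_off = true;
    return old;
}

/* Put the "interrupt" state back to what model_irq_save() returned. */
static void model_irq_restore(bool old)
{
    model_irqs_off = old;
}

/* Combined helper: on failure the saved state is restored before returning. */
static bool model_trylock_irqsave(atomic_flag *lock, bool *flags)
{
    *flags = model_irq_save();
    if (atomic_flag_test_and_set(lock)) {   /* true means the lock was already held */
        model_irq_restore(*flags);
        return false;
    }
    return true;
}

/* Release the lock, then restore the saved "interrupt" state. */
static void model_unlock_irqrestore(atomic_flag *lock, bool flags)
{
    atomic_flag_clear(lock);
    model_irq_restore(flags);
}

/* Mirrors the shape of the patched transmit path: 0 on success, 1 if busy. */
static int model_xmit(void)
{
    bool flags;

    if (!model_trylock_irqsave(&model_lock, &flags))
        return 1;                /* caller would requeue; state already restored */
    assert(model_irqs_off);      /* critical section runs with "interrupts" off */
    model_unlock_irqrestore(&model_lock, flags);
    return 0;
}
```

Because the save, the trylock and the restore live in one helper, what happens to the interrupt state is decided by the lock implementation rather than by the driver. That is presumably what makes the same call usable on -rt, where adapter->lock is a sleeping lock.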

[Fwd: [PlanetCCRMA] atl1 driver; sleeping function]

2007-07-30 Thread Fernando Lopez-Lezcano

Hi Ingo, I'm forwarding this report from a Planet CCRMA user, this is
happening to him with 2.6.21.6-rt21...

-- Fernando

 Forwarded Message 
From: Matt Barber
To: [EMAIL PROTECTED]
Subject: [PlanetCCRMA] atl1 driver; sleeping function
Date: Mon, 30 Jul 2007 06:09:58 -0400

Hello,

I'm getting a set of BUG messages in my dmesg with the newest ccrma
kernel.  This is a new box, so I haven't tried the older ccrma
kernels, but the bugs aren't there with Fedora stock.  They look like
this (probably at least a hundred more by now):

BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [<c0405f88>] dump_trace+0x64/0x105
 [<c0406041>] show_trace_log_lvl+0x18/0x2c
 [<c040664e>] show_trace+0xf/0x11
 [<c04066cf>] dump_stack+0x12/0x14
 [<c060511d>] __rt_spin_lock+0x21/0x3d
 [<f8a20e0c>] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [<c05a3d96>] dev_hard_start_xmit+0x1c6/0x225
 [<c05b29bd>] __qdisc_run+0xb7/0x1cf
 [<c05a5661>] dev_queue_xmit+0x14a/0x239
 [<c05c4a40>] ip_output+0x207/0x243
 [<c05c41ea>] ip_queue_xmit+0x3b2/0x402
 [<c05d26d7>] tcp_transmit_skb+0x6e5/0x713
 [<c05d289a>] tcp_send_ack+0xeb/0xef
 [<c05d1617>] tcp_rcv_established+0x52a/0x7ff
 [<c05d7234>] tcp_v4_do_rcv+0x1bf/0x494
 [<c05d9955>] tcp_v4_rcv+0x863/0x8d6
 [<c05bff3a>] ip_local_deliver+0x18f/0x23d
 [<c05bfd72>] ip_rcv+0x41d/0x456
 [<c05a3991>] netif_receive_skb+0x2cc/0x35e
 [<c05a524a>] process_backlog+0x76/0xc9
 [<c05a5419>] net_rx_action+0xa7/0x1a5
 [<c042e276>] ___do_softirq+0xfe/0x214
 [<c042e6a6>] do_softirq_from_hardirq+0x48/0x61
 [<c0459204>] do_irqd+0x21a/0x282
 [<c043ad18>] kthread+0xb0/0xd8
 [<c0405bbf>] kernel_thread_helper+0x7/0x10
 ===
printk: 6 messages suppressed.
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context firefox-bin(17517)
at kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [<c0405f88>] dump_trace+0x64/0x105
 [<c0406041>] show_trace_log_lvl+0x18/0x2c
 [<c040664e>] show_trace+0xf/0x11
 [<c04066cf>] dump_stack+0x12/0x14
 [<c060511d>] __rt_spin_lock+0x21/0x3d
 [<f8a20e0c>] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [<c05a3d96>] dev_hard_start_xmit+0x1c6/0x225
 [<c05b29bd>] __qdisc_run+0xb7/0x1cf
 [<c05a5661>] dev_queue_xmit+0x14a/0x239
 [<c05c4a40>] ip_output+0x207/0x243
 [<c05c41ea>] ip_queue_xmit+0x3b2/0x402
 [<c05d26d7>] tcp_transmit_skb+0x6e5/0x713
 [<c05d41ad>] tcp_push_one+0xb3/0xd8
 [<c05c9f92>] tcp_sendmsg+0x7c8/0x9f9
 [<c05e2ce1>] inet_sendmsg+0x3b/0x45
 [<c059a86a>] sock_sendmsg+0xd0/0xeb
 [<c059b1bf>] sys_sendto+0x11b/0x13b
 [<c059b216>] sys_send+0x37/0x3b
 [<c059bb9e>] sys_socketcall+0x14a/0x261
 [<c0404f7c>] syscall_call+0x7/0xb
 [<b7fd8410>] 0xb7fd8410
 ===
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [<c0405f88>] dump_trace+0x64/0x105
 [<c0406041>] show_trace_log_lvl+0x18/0x2c
 [<c040664e>] show_trace+0xf/0x11
 [<c04066cf>] dump_stack+0x12/0x14
 [<c060511d>] __rt_spin_lock+0x21/0x3d
 [<f8a20e0c>] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [<c05a3d96>] dev_hard_start_xmit+0x1c6/0x225
 [<c05b29bd>] __qdisc_run+0xb7/0x1cf
 [<c05a5661>] dev_queue_xmit+0x14a/0x239
 [<c05c4a40>] ip_output+0x207/0x243
 [<c05c41ea>] ip_queue_xmit+0x3b2/0x402
 [<c05d26d7>] tcp_transmit_skb+0x6e5/0x713
 [<c05d289a>] tcp_send_ack+0xeb/0xef
 [<c05d1617>] tcp_rcv_established+0x52a/0x7ff
 [<c05d7234>] tcp_v4_do_rcv+0x1bf/0x494
 [<c05d9955>] tcp_v4_rcv+0x863/0x8d6
 [<c05bff3a>] ip_local_deliver+0x18f/0x23d
 [<c05bfd72>] ip_rcv+0x41d/0x456
 [<c05a3991>] netif_receive_skb+0x2cc/0x35e
 [<c05a524a>] process_backlog+0x76/0xc9
 [<c05a5419>] net_rx_action+0xa7/0x1a5
 [<c042e276>] ___do_softirq+0xfe/0x214
 [<c042e6a6>] do_softirq_from_hardirq+0x48/0x61
 [<c0459204>] do_irqd+0x21a/0x282
 [<c043ad18>] kthread+0xb0/0xd8
 [<c0405bbf>] kernel_thread_helper+0x7/0x10
 ===
printk: 14 messages suppressed.
network driver disabled raw interrupts: atl1_xmit_frame+0x0/0x6c6 [atl1]
BUG: sleeping function called from invalid context firefox-bin(17517)
at kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [<c0405f88>] dump_trace+0x64/0x105
 [<c0406041>] show_trace_log_lvl+0x18/0x2c
 [<c040664e>] show_trace+0xf/0x11
 [<c04066cf>] dump_stack+0x12/0x14
 [<c060511d>] __rt_spin_lock+0x21/0x3d
 [<f8a20e0c>] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [<c05a3d96>] dev_hard_start_xmit+0x1c6/0x225
 [<c05b29bd>] __qdisc_run+0xb7/0x1cf
 [<c05a5661>] dev_queue_xmit+0x14a/0x239
 [<c05c4a40>] ip_output+0x207/0x243
 [<c05c41ea>] ip_queue_xmit+0x3b2/0x402
 [<c05d26d7>] tcp_transmit_skb+0x6e5/0x713
 [<c05d41ad>] tcp_push_one+0xb3/0xd8
 [<c05c9f92>] tcp_sendmsg+0x7c8/0x9f9
 [<c05e2ce1>] inet_sendmsg+0x3b/0x45
 [<c059a86a>] sock_sendmsg+0xd0/0xeb
 [<c059b1bf>] sys_sendto+0x11b/0x13b
 [<c059b216>] sys_send+0x37/0x3b
 [<c059bb9e>] sys_socketcall+0x14a/0x261
 [<c0404f7c>] syscall_call+0x7/0xb
 [<b7fd8410>] 0xb7fd8410
 ===
BUG: sleeping function called from invalid context IRQ-219(2243) at
kernel/rtmutex.c:613
in_atomic():0 [], irqs_disabled():1
 [] dump_trace+0x64/0x105
 [] show_trace_log_lvl+0x18/0x2c
 [] show_trace+0xf/0x11
 [] dump_stack+0x12/0x14
 [] __rt_spin_lock+0x21/0x3d
 [] atl1_xmit_frame+0x66f/0x6c6 [atl1]
 [] dev_hard_start_xmit+0x1c6/0x225
 [] __qdisc_run+0xb7/0x1cf
 [] dev_queue_xmit+0x14a/0x239
 [] ip_output+0x207/0x243
 [] ip_queue_xmit+0x3b2/0x402
 [] tcp_transmit_skb+0x6e5/0x713
 [] __tcp_push_pending_frames+0x6ec/0x7af
 [] tcp_rcv_established+0x107/0x7ff
 [] tcp_v4_do_rcv+0x1bf/0x494
 [] tcp_v4_rcv+0x863/0x8d6
 []

Re: v2.6.22.1-rt5

2007-07-24 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-24 at 22:34 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > > apparently you caught that 3 seconds window where the .23-rc1-rt1 
> > > release script moved old patches into the older/ directory :-)
> > 
> > Yup, good timing... :-) Hard to do again...
> > (BTW, will you keep 2.6.22.x patches going for a while?)
> 
> yeah, that's the plan: to keep .22-rt updated until .23 is released. 
> (Thomas agrees with that approach too)

Thank you thank you to all involved! That's very good news...
-- Fernando



Re: v2.6.22.1-rt5

2007-07-24 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-24 at 22:05 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-07-24 at 12:34 -0700, Fernando Lopez-Lezcano wrote:
> > > On Tue, 2007-07-24 at 09:39 +0200, Ingo Molnar wrote:
> > > > * Rui Nuno Capela <[EMAIL PROTECTED]> wrote:
> > > > 
> > > > > Maybe I was too quick, but `make all` on is failing here:
> > > > 
> > > > does -rt6 work better?
> > > 
> > > Hmmm, -rt6 seems to be gone... was about to download it and it
> > > disappeared. 
> > 
> > Never mind, I see it migrated back to the main page. 
> 
> apparently you caught that 3 seconds window where the .23-rc1-rt1 
> release script moved old patches into the older/ directory :-)

Yup, good timing... :-) Hard to do again...
(BTW, will you keep 2.6.22.x patches going for a while?)
-- Fernando



Re: v2.6.22.1-rt5

2007-07-24 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-24 at 12:34 -0700, Fernando Lopez-Lezcano wrote:
> On Tue, 2007-07-24 at 09:39 +0200, Ingo Molnar wrote:
> > * Rui Nuno Capela <[EMAIL PROTECTED]> wrote:
> > 
> > > Maybe I was too quick, but `make all` on is failing here:
> > 
> > does -rt6 work better?
> 
> Hmmm, -rt6 seems to be gone... was about to download it and it
> disappeared. 

Never mind, I see it migrated back to the main page. 
-- Fernando



Re: v2.6.22.1-rt5

2007-07-24 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-24 at 09:39 +0200, Ingo Molnar wrote:
> * Rui Nuno Capela <[EMAIL PROTECTED]> wrote:
> 
> > Maybe I was too quick, but `make all` on is failing here:
> 
> does -rt6 work better?

Hmmm, -rt6 seems to be gone... was about to download it and it
disappeared. 

-- Fernando


Re: [patch] slub crashes with recent -git

2007-07-20 Thread Fernando Lopez-Lezcano

On Thu, 2007-07-19 at 21:42 +0200, Ingo Molnar wrote:
> Linus, Christoph,
> 
> recent slub commits in -git cause this bootup crash:
> 
>  Freeing unused kernel memory: 324k freed
>  Write protecting the kernel read-only data: 1294k

Just curious, are the crashes even possible in 2.6.22.1? (I see the same
patchable code snippet in the source). Just wondering if I should also
apply this to 2.6.21.1-rt4...

-- Fernando


>  [ cut here ]
>  kernel BUG at mm/slub.c:2401!
>  invalid opcode:  [#1]
>  PREEMPT SMP 
>  Modules linked in:
>  CPU:0
>  EIP:0060:[<c017dac3>]Not tainted VLI
>  EFLAGS: 00010046   (2.6.22 #1)
>  EIP is at ksize+0x13/0x42
>  eax:    ebx:    ecx: 0020   edx: 
>  esi: f76a4000   edi: 0004   ebp: f7b11e74   esp: f7b11e74
>  ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
>  Process udevd (pid: 824, ti=f7b11000 task=f7ca5000 task.ti=f7b11000)
>  Stack: f7b11e94 c016c28b f768cb00 0020 f7ca5000 0004 f76a4000 
> fff4 
> f7b11eb4 c03cf158 0002 f76a4000 0020 f768cb00 f76a4000 
> f7b11ed8 
> f7b11ed0 c03cfbc6 f768cb00 f768cb00 c046bf80 f768cb00 000c 
> f7b11f6c 
>  Call Trace:
>   [<c0105e3e>] show_trace_log_lvl+0x19/0x2e
>   [<c0105ef0>] show_stack_log_lvl+0x9d/0xa5
>   [<c010628f>] show_registers+0x1f5/0x334
>   [<c01064e6>] die+0x118/0x1fc
>   [<c0426e7f>] do_trap+0x8e/0xa8
>   [<c0106ac3>] do_invalid_op+0x88/0x92
>   [<c0426a92>] error_code+0x72/0x78
>   [<c016c28b>] krealloc+0x27/0x6d
>   [<c03cf158>] netlink_realloc_groups+0x61/0xd9
>   [<c03cfbc6>] netlink_bind+0x4f/0x121
>   [<c03afe8d>] sys_bind+0x67/0x86
>   [<c03b11e3>] sys_socketcall+0x8f/0x244
>   [<c0104ef2>] sysenter_past_esp+0x6b/0xb5
>   ===
>  Code: 40 02 00 75 03 8b 52 0c 8b 02 5d 84 c0 b8 00 00 00 00 0f 49 d0 89 d0 
> c3 55 31 d2 83 f8 10 89 e5 74 34 e8 bc ff ff ff 85 c0 75 04 <0f> 0b eb fe 8b 
> 40 10 85 c0 75 04 0f 0b eb fe 8b 10 f6 c6 0c 74 
> 
> i had to apply the patch below to make the kernel boot again.
> 
> Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>
> 
> Index: linux/mm/slub.c
> ===
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
> @@ -2394,7 +2394,7 @@ size_t ksize(const void *object)
>   struct page *page;
>   struct kmem_cache *s;
>  
> - if (object == ZERO_SIZE_PTR)
> + if (object == ZERO_SIZE_PTR || !object)
>   return 0;
>  
>   page = get_object_page(object);
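
The one-line change in the patch above makes ksize() treat a NULL pointer the same way it already treats ZERO_SIZE_PTR, so the krealloc() path in the trace presumably no longer reaches the BUG at mm/slub.c:2401. A minimal user-space sketch of the same guard follows. The names toy_ksize, toy_header and ZERO_SIZE_PTR_SKETCH are hypothetical stand-ins invented for illustration, not kernel code, and the sentinel value is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's ZERO_SIZE_PTR sentinel, which marks
 * the result of a zero-length allocation. The value is an assumption made
 * for this sketch only. */
#define ZERO_SIZE_PTR_SKETCH ((void *)16)

/* Hypothetical header that this toy allocator keeps in front of each object. */
struct toy_header {
    size_t size;
};

/* Report the usable size of an object. Both NULL and the zero-size sentinel
 * mean "no storage", so return 0 instead of dereferencing them. */
size_t toy_ksize(const void *object)
{
    if (object == ZERO_SIZE_PTR_SKETCH || !object)
        return 0;

    /* The header sits immediately before the object. */
    return ((const struct toy_header *)object - 1)->size;
}
```

Without the `!object` half of the test, a NULL argument would fall through to the header lookup, which is the user-space analogue of the crash reported above.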

Re: v2.6.21.5-rt19 (sched_getaffinity?)

2007-07-18 Thread Fernando Lopez-Lezcano

On Wed, 2007-07-18 at 09:18 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > > does lockdep pinpoint anything?
> > 
> > Lots of stuff, and at the end the lock report for the problem. 
> > Hopefully some of this will help... I have attached the whole bootup 
> > sequence as logged in /var/log/messages.
> 
> yeah, it pinpointed the bug. It seems to be an interaction between 
> RCU-preempt (Paul Cc:-ed) and sched_mc_power_savings_store(): 
> detach_destroy_domains() uses synchronize_sched() which uses 
> getaffinity, which takes sched_hotcpu_mutex, and 
> arch_reinit_sched_domains does it too - see the lockdep report below. 
> I've added a quick workaround below as well, which should keep your box 
> from hanging.
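
As described in the quoted paragraph above, the hang looks like a path that already holds sched_hotcpu_mutex ending up in sched_getaffinity(), which wants the same mutex. The user-space sketch below only illustrates that pattern; it is not the kernel code. All function names are hypothetical, and a POSIX error-checking mutex is swapped in so that the second acquisition reports EDEADLK instead of blocking forever.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for sched_hotcpu_mutex. */
static pthread_mutex_t hotcpu_mutex_sketch;

/* Stand-in for sched_getaffinity(): takes the mutex for the duration of the
 * call. Returns 0 on success or the pthread error code. */
static int getaffinity_sketch(void)
{
    int err = pthread_mutex_lock(&hotcpu_mutex_sketch);

    if (err)
        return err;
    pthread_mutex_unlock(&hotcpu_mutex_sketch);
    return 0;
}

/* Stand-in for the sched-domain rebuild path: it holds the mutex and then
 * calls a helper that tries to take the same mutex again. */
int reinit_domains_sketch(void)
{
    pthread_mutexattr_t attr;
    int err;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&hotcpu_mutex_sketch, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&hotcpu_mutex_sketch);
    err = getaffinity_sketch();   /* EDEADLK: this thread already owns it */
    pthread_mutex_unlock(&hotcpu_mutex_sketch);
    pthread_mutex_destroy(&hotcpu_mutex_sketch);
    return err;
}
```

With PTHREAD_MUTEX_NORMAL the inner lock would block forever instead of returning an error, which is the user-space analogue of the hang.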

I can confirm that flash9 does not hang with the patch. 
Thanks!!!
I presume the same would apply to 2.6.21.x and, say, rt21. I'll test.

But (of course, there's always a but somewhere) I just experienced a
complete hang - 2.6.22.1-rt4 with the little patch. This time there was
something in the logs, maybe it will help? This was when finishing the
install of an additional kernel module rpm package (ipw3945 drivers). 

-- Fernando


Jul 18 10:48:15 localhost kernel: BUG: sleeping function called from invalid context modprobe(5001) at kernel/rtmutex.c:636
Jul 18 10:48:15 localhost kernel: in_atomic():1 [0001], irqs_disabled():0
Jul 18 10:48:15 localhost kernel:  [<c0405f34>] show_trace_log_lvl+0x1a/0x2f
Jul 18 10:48:15 localhost kernel:  [<c0406a09>] show_trace+0x12/0x14
Jul 18 10:48:15 localhost kernel:  [<c0406a71>] dump_stack+0x16/0x18
Jul 18 10:48:15 localhost kernel:  [<c0423bfc>] __might_sleep+0xeb/0xf2
Jul 18 10:48:15 localhost kernel:  [<c0617242>] __rt_spin_lock+0x24/0x40
Jul 18 10:48:15 localhost kernel:  [<c0617266>] rt_spin_lock+0x8/0xa
Jul 18 10:48:15 localhost kernel:  [<c04621c9>] get_zone_pcp+0x23/0x33
Jul 18 10:48:15 localhost kernel:  [<c0462702>] free_hot_cold_page+0xcf/0x148
Jul 18 10:48:15 localhost kernel:  [<c04627b2>] free_hot_page+0xa/0xc
Jul 18 10:48:15 localhost kernel:  [<c0462a82>] __free_pages+0x25/0x30
Jul 18 10:48:15 localhost kernel:  [<c0462ab6>] free_pages+0x29/0x2b
Jul 18 10:48:15 localhost kernel:  [<c047abf3>] quicklist_trim+0xd0/0xf5
Jul 18 10:48:15 localhost kernel:  [<c041f5d9>] check_pgt_cache+0x1e/0x20
Jul 18 10:48:15 localhost kernel:  [<c046aedf>] free_pgtables+0x52/0x147
Jul 18 10:48:15 localhost kernel:  [<c046cdf7>] unmap_region+0xe6/0x135
Jul 18 10:48:15 localhost kernel:  [<c046d764>] do_munmap+0x153/0x1b4
Jul 18 10:48:15 localhost kernel:  [<c046f3de>] do_mremap+0x413/0x4c3
Jul 18 10:48:15 localhost kernel:  [<c046f4c4>] sys_mremap+0x36/0x56
Jul 18 10:48:15 localhost kernel:  [<c0404fca>] syscall_call+0x7/0xb
Jul 18 10:48:15 localhost kernel:  ===
Jul 18 10:48:16 localhost kernel: BUG: sleeping function called from invalid context head(5652) at kernel/rtmutex.c:636
Jul 18 10:48:16 localhost kernel: in_atomic():1 [0001], irqs_disabled():0
Jul 18 10:48:16 localhost kernel:  [<c0405f34>] show_trace_log_lvl+0x1a/0x2f
Jul 18 10:48:16 localhost kernel:  [<c0406a09>] show_trace+0x12/0x14
Jul 18 10:48:16 localhost kernel:  [<c0406a71>] dump_stack+0x16/0x18
Jul 18 10:48:16 localhost kernel:  [<c0423bfc>] __might_sleep+0xeb/0xf2
Jul 18 10:48:16 localhost kernel:  [<c0617242>] __rt_spin_lock+0x24/0x40
Jul 18 10:48:16 localhost kernel:  [<c0617266>] rt_spin_lock+0x8/0xa
Jul 18 10:48:16 localhost kernel:  [<c04621c9>] get_zone_pcp+0x23/0x33
Jul 18 10:48:16 localhost kernel:  [<c0462702>] free_hot_cold_page+0xcf/0x148
Jul 18 10:48:16 localhost kernel:  [<c04627b2>] free_hot_page+0xa/0xc
Jul 18 10:48:16 localhost kernel:  [<c0462a82>] __free_pages+0x25/0x30
Jul 18 10:48:16 localhost kernel:  [<c0462ab6>] free_pages+0x29/0x2b
Jul 18 10:48:16 localhost kernel:  [<c047abf3>] quicklist_trim+0xd0/0xf5
Jul 18 10:48:16 localhost kernel:  [<c041f5d9>] check_pgt_cache+0x1e/0x20
Jul 18 10:48:16 localhost kernel:  [<c046aedf>] free_pgtables+0x52/0x147
Jul 18 10:48:16 localhost kernel:  [<c046cdf7>] unmap_region+0xe6/0x135
Jul 18 10:48:16 localhost kernel:  [<c046d764>] do_munmap+0x153/0x1b4
Jul 18 10:48:16 localhost kernel:  [<c046d7f5>] sys_munmap+0x30/0x3f
Jul 18 10:48:16 localhost kernel:  [<c0404fca>] syscall_call+0x7/0xb
Jul 18 10:48:16 localhost kernel:  ===
Jul 18 10:50:22 localhost syslogd 1.4.2: restart.



Re: v2.6.21.5-rt19 (sched_getaffinity?)

2007-07-17 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-17 at 22:12 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-07-17 at 21:32 +0200, Ingo Molnar wrote:
> > > * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > 
> > > > I do get flash 9 (I know, not the best example) and tomboy to hang as 
> > > > reported by one of my Planet CCRMA users - flash 9 tested working on 
> > > > stock fedora 7 kernel - and both seem to hang in the same system call:
> > > > 
> > > > sched_getaffinity(3528, 32, <unfinished ...>
> > > > 
> > > > Full output of strace attached for both cases.
> > > 
> > > hm, that's weird. Is it completely unkillable at that time? Could you do 
> > > a few things: enable CONFIG_PROVE_LOCKING (lockdep), and also try to get 
> > > a full task state dump via:
> > > 
> > >   echo t > /proc/sysrq-trigger
> > 
> > Trace attached... the process stays in D state no matter what. 
> 
> hm, seems to be related to:
> 
> Jul 17 12:51:18 localhost kernel: sched-powersa D [f0aaf930] 0005  6584  
> 3420   3407
> 
> which blocks the cpu-hotplug mutex:
> 
> Jul 17 12:51:18 localhost kernel: Call Trace:
> Jul 17 12:51:18 localhost kernel:  [<c0603f46>] schedule+0xe0/0xfa
> Jul 17 12:51:18 localhost kernel:  [<c0604d0d>] rt_mutex_slowlock+0x164/0x20b
> Jul 17 12:51:18 localhost kernel:  [<c0604a5c>] rt_mutex_lock+0x3c/0x3f
> Jul 17 12:51:18 localhost kernel:  [<c0423bb4>] sched_getaffinity+0x14/0x94
> Jul 17 12:51:18 localhost kernel:  [<c045a647>] __synchronize_sched+0xd/0x5a
> Jul 17 12:51:18 localhost kernel:  [<c0423732>] arch_reinit_sched_domains+0x18/0x33
> Jul 17 12:51:18 localhost kernel:  [<c0423789>] sched_power_savings_store+0x3c/0x49
> Jul 17 12:51:18 localhost kernel:  [<c0552cd4>] sysdev_class_store+0x1e/0x22
> Jul 17 12:51:18 localhost kernel:  [<c04b195b>] sysfs_write_file+0xa3/0xc6
> Jul 17 12:51:18 localhost kernel:  [<c047a64a>] vfs_write+0xa8/0x154
> Jul 17 12:51:18 localhost kernel:  [<c047ac65>] sys_write+0x41/0x67
> Jul 17 12:51:18 localhost kernel:  [<c0404f7c>] syscall_call+0x7/0xb
> 
> and firefox blocks on the same mutex too:
> 
> Jul 17 12:51:18 localhost kernel: firefox-bin   D [efc44670] 0012  6368  4388  1
> Jul 17 12:51:18 localhost kernel: Call Trace:
> Jul 17 12:51:18 localhost kernel:  [<c0603f46>] schedule+0xe0/0xfa
> Jul 17 12:51:18 localhost kernel:  [<c0604d0d>] rt_mutex_slowlock+0x164/0x20b
> Jul 17 12:51:18 localhost kernel:  [<c0604a5c>] rt_mutex_lock+0x3c/0x3f
> Jul 17 12:51:18 localhost kernel:  [<c0423bb4>] sched_getaffinity+0x14/0x94
> Jul 17 12:51:18 localhost kernel:  [<c0423c53>] sys_sched_getaffinity+0x1f/0x41
> Jul 17 12:51:18 localhost kernel:  [<c0404f7c>] syscall_call+0x7/0xb
> Jul 17 12:51:18 localhost kernel:  [<b7f0f410>] 0xb7f0f410
> 
> does lockdep pinpoint anything?

Lots of stuff, and at the end the lock report for the problem. Hopefully
some of this will help... I have attached the whole bootup sequence as
logged in /var/log/messages. 

-- Fernando



trace3.txt.gz
Description: GNU Zip compressed data

Re: v2.6.21.5-rt19 (sched_getaffinity?)

2007-07-17 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-17 at 22:12 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > On Tue, 2007-07-17 at 21:32 +0200, Ingo Molnar wrote:
> > > * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > 
> > > > I do get flash 9 (I know, not the best example) and tomboy to hang as 
> > > > reported by one of my Planet CCRMA users - flash 9 tested working on 
> > > > stock fedora 7 kernel - and both seem to hang in the same system call:
> > > > 
> > > > sched_getaffinity(3528, 32, <unfinished ...>
> > > > 
> > > > Full output of strace attached for both cases.
> > > 
> > > hm, that's weird. Is it completely unkillable at that time? Could you do 
> > > a few things: enable CONFIG_PROVE_LOCKING (lockdep), and also try to get 
> > > a full task state dump via:
> > > 
> > >   echo t > /proc/sysrq-trigger
> > 
> > Trace attached... the process stays in D state no matter what. 

Just in case, it repeats under 2.6.22.1-rt4 (< rt4 did not boot into my
t61 laptop, this one at least does that). I'm including the (probably
redundant) dump. 

I have to build a new kernel with prove locking...

-- Fernando


> hm, seems to be related to:
> 
> Jul 17 12:51:18 localhost kernel: sched-powersa D [f0aaf930] 0005  6584  
> 3420   3407
> 
> which blocks the cpu-hotplug mutex:
> 
> Jul 17 12:51:18 localhost kernel: Call Trace:
> Jul 17 12:51:18 localhost kernel:  [<c0603f46>] schedule+0xe0/0xfa
> Jul 17 12:51:18 localhost kernel:  [<c0604d0d>] rt_mutex_slowlock+0x164/0x20b
> Jul 17 12:51:18 localhost kernel:  [<c0604a5c>] rt_mutex_lock+0x3c/0x3f
> Jul 17 12:51:18 localhost kernel:  [<c0423bb4>] sched_getaffinity+0x14/0x94
> Jul 17 12:51:18 localhost kernel:  [<c045a647>] __synchronize_sched+0xd/0x5a
> Jul 17 12:51:18 localhost kernel:  [<c0423732>] arch_reinit_sched_domains+0x18/0x33
> Jul 17 12:51:18 localhost kernel:  [<c0423789>] sched_power_savings_store+0x3c/0x49
> Jul 17 12:51:18 localhost kernel:  [<c0552cd4>] sysdev_class_store+0x1e/0x22
> Jul 17 12:51:18 localhost kernel:  [<c04b195b>] sysfs_write_file+0xa3/0xc6
> Jul 17 12:51:18 localhost kernel:  [<c047a64a>] vfs_write+0xa8/0x154
> Jul 17 12:51:18 localhost kernel:  [<c047ac65>] sys_write+0x41/0x67
> Jul 17 12:51:18 localhost kernel:  [<c0404f7c>] syscall_call+0x7/0xb
> 
> and firefox blocks on the same mutex too:
> 
> Jul 17 12:51:18 localhost kernel: firefox-bin   D [efc44670] 0012  6368  4388  1
> Jul 17 12:51:18 localhost kernel: Call Trace:
> Jul 17 12:51:18 localhost kernel:  [<c0603f46>] schedule+0xe0/0xfa
> Jul 17 12:51:18 localhost kernel:  [<c0604d0d>] rt_mutex_slowlock+0x164/0x20b
> Jul 17 12:51:18 localhost kernel:  [<c0604a5c>] rt_mutex_lock+0x3c/0x3f
> Jul 17 12:51:18 localhost kernel:  [<c0423bb4>] sched_getaffinity+0x14/0x94
> Jul 17 12:51:18 localhost kernel:  [<c0423c53>] sys_sched_getaffinity+0x1f/0x41
> Jul 17 12:51:18 localhost kernel:  [<c0404f7c>] syscall_call+0x7/0xb
> Jul 17 12:51:18 localhost kernel:  [<b7f0f410>] 0xb7f0f410
> 
> does lockdep pinpoint anything?
> 
>   Ingo


trace2.txt.gz
Description: GNU Zip compressed data

Re: v2.6.21.5-rt19 (sched_getaffinity?)

2007-07-17 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-17 at 21:32 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > I do get flash 9 (I know, not the best example) and tomboy to hang as 
> > reported by one of my Planet CCRMA users - flash 9 tested working on 
> > stock fedora 7 kernel - and both seem to hang in the same system call:
> > 
> > sched_getaffinity(3528, 32, <unfinished ...>
> > 
> > Full output of strace attached for both cases.
> 
> hm, that's weird. Is it completely unkillable at that time? Could you do 
> a few things: enable CONFIG_PROVE_LOCKING (lockdep), and also try to get 
> a full task state dump via:
> 
>   echo t > /proc/sysrq-trigger

Trace attached... the process stays in D state no matter what. 
-- Fernando



trace1.txt.gz
Description: GNU Zip compressed data
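
The "D state" mentioned in the message above is uninterruptible sleep; signals, SIGKILL included, are not acted on while a task is in it. One way to check a task's state without ps is to read the third field of /proc/<pid>/stat. The helper below is a sketch: the function name proc_stat_state is made up for this example, and it only parses a line that the caller has already read from that file.

```c
#include <string.h>

/* Return the one-letter task state from a /proc/<pid>/stat line, or '?' if
 * the line does not look like one. The command name is wrapped in
 * parentheses and may itself contain spaces or ')', so look for the last
 * ')' instead of splitting on spaces. */
char proc_stat_state(const char *stat_line)
{
    const char *close = strrchr(stat_line, ')');

    if (!close || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}
```

For a line such as "4388 (firefox-bin) D 1 ..." the helper returns 'D', the same letter that appears in the task dump produced by sysrq-t.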

Re: v2.6.21.5-rt19 (sched_getaffinity?)

2007-07-17 Thread Fernando Lopez-Lezcano

On Tue, 2007-07-17 at 21:32 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> 
> > I do get flash 9 (I know, not the best example) and tomboy to hang as 
> > reported by one of my Planet CCRMA users - flash 9 tested working on 
> > stock fedora 7 kernel - and both seem to hang in the same system call:
> > 
> > sched_getaffinity(3528, 32, <unfinished ...>
> > 
> > Full output of strace attached for both cases.
> 
> hm, that's weird. Is it completely unkillable at that time? Could you do 
> a few things: enable CONFIG_PROVE_LOCKING (lockdep), and also try to get 
> a full task state dump via:
> 
>   echo t > /proc/sysrq-trigger
> 
> thanks,

kill -9 does nothing. If there's another way to kill something let me
know :-) I'll try to get the dump asap. 

Hope you had a good time over the long weekend, you certainly deserve
some rest (and congrats over the scheduler inclusion in mainline!)

-- Fernando
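
For anyone trying to reproduce the hang outside of flash or tomboy, the call that strace shows blocking can be issued directly. The sketch below goes through syscall(2), which is the form strace reports, instead of the glibc wrapper. The function name allowed_cpu_count and the 128-byte mask size are assumptions chosen for the example, and the code is Linux-only.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel which CPUs the calling thread may run on and count them.
 * Returns the number of allowed CPUs, or -1 if the system call fails. */
int allowed_cpu_count(void)
{
    /* 128 bytes covers 1024 CPUs; the size is an assumption for this sketch. */
    unsigned long mask[128 / sizeof(unsigned long)] = { 0 };
    /* pid 0 means "the calling thread"; on success the raw system call
     * returns the number of mask bytes the kernel filled in. */
    long bytes = syscall(SYS_sched_getaffinity, 0, sizeof(mask), mask);
    int count = 0;

    if (bytes < 0)
        return -1;

    for (long i = 0; i < bytes; i++) {
        unsigned char b = ((unsigned char *)mask)[i];

        /* Clear the lowest set bit until none remain. */
        for (; b; b &= (unsigned char)(b - 1))
            count++;
    }
    return count;
}
```

On a healthy kernel the call returns at once; on an affected one it would be expected to block in the same place the strace output stops.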



Re: v2.6.22.1-rt3

2007-07-13 Thread Fernando Lopez-Lezcano

On Fri, 2007-07-13 at 13:22 +0200, Thomas Gleixner wrote:
> we are pleased to announce the v2.6.22.1-rt3 kernel
> 
> Attention! 
> 
> Ingo is off for a long weekend and therefore the download location for
> this release is:
> 
>  http://www.tglx.de/projects/preempt-rt/2.6.22.1
>   
> more info about the -rt patchset can be found in the RT wiki:
>   
>http://rt.wiki.kernel.org
>  
> This release is a bugfix release:
> 
> - update of the x86_64 -hrt queue (resolve boot problems)
> - gtod vsyscall fix from Gregory Haskins

Same problem as reported yesterday in 2.6.22.1-rt2 in a T61 laptop, boot
hangs, last BUG printed is similar to this (numbers changed since
yesterday, of course, functions listed appear to be the same). No serial
port available to dump everything...

This was copied from the screen yesterday:

BUG: spinlock lockup on CPU#1, swapper/0, c318da88
[<c0405f34>] show_trace_log_lvl+0x1a/0x2f
[<c0406a09>] show_trace+0x12/0x14
[<c0406a71>] dump_stack+0x16/0x18
[<c0617a91>] _raw_spin_lock+0xc1/0xe2
[<c061743f>] __spin_lock_irq+0x14/0x16
[<c061541d>] __sched_text_start+0xd5/0xaef
[<c061600e>] schedule+0xe0/0xfa
[<c0616c15>] rt_spin_lock_slowlock+0xcf/0x14f
[<c061724b>] __rt_spin_lock+0x3d/0x40
[<c0617256>] rt_spin_lock+0x8/0xa
[<c052f95c>] acpi_idle_enter_c3+0x12d/0x232
[<c059af51>] cpuidle_idle_call+0x56/0x79
[<c04033a5>] cpu_idle+0x9d/0xda
[<c0419e21>] start_secondary+0x34e/0x356
[<>] 0x0

Same .config as before.
-- Fernando



Re: v2.6.21.5-rt19

2007-07-08 Thread Fernando Lopez-Lezcano

On Sun, 2007-07-08 at 15:36 -0700, Fernando Lopez-Lezcano wrote:
> On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
> > * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > > > Changes since 2.6.21.5-rt18:
> > > > >
> > > > > - Fixed a nasty and hard to track down slowness / boot problem on SMP
> > > > > machines with CONFIG_NOHZ enabled. The problem was caused by the timer
> > > > > wheel base lock held during the get_next_timer_interrupt() call in the
> > > > > idle path, which eventually led to a bogus PI boosting of the idle task
> > > > > and in consequence a stale wrong scheduler selection for the affected
> > > > > idle task.
> > > > >
> > > > > Kudos to Carsten Emde, who patiently and meticulously isolated the
> > > > > problem and provided the traces, which allowed to identify the root 
> > > > > cause.
> > > > >
> > > > > Problem solution: Prevent idle task boosting
> > 
> > > > Maybe someone remembers me whining about troubles with 2.6.21-rt2..18 
> > > > on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).
> > > > 
> > > > Although I'm still keeping my fingers crossed, the good 
> > > > news is that 2.6.21.5-rt19 (and -rt20) behaves far better now 
> > > > on the very same box.
> > > 
> > > Yes, it works much better indeed...
> > > 
> > > Ingo: is there a place where I can read about the changes in different 
> > > rtxx releases? What is new/better/fixed in rt20? (I see scheduler 
> > > stuff in a diff from rt19 to rt20 but I don't really know what it 
> > > means).
> > 
> > and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when 
> > CFS was merged.
> > 
> > i _think_ Rui might have seen two separate problems. Perhaps by the time 
> > we fixed the first problem (which Rui saw since -rt2) we introduced the 
> > other one via -rt11 - which then got fixed in -rt19.
> 
> Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
> really trying to provide a "stable" rt kernel for audio usage and
> including another subsystem into rt is - IMHO - not going to help.
> What's the chance of splitting things?
> 
> > btw., we'd love to get more feedback regarding CFS. CFS is a completely 
> > new scheduler for Linux. 
> 
> Then I'd rather have it separate from rt. 

Please?

I would like to provide the least amount of new functionality that is
really necessary in my audio kernels. Audio related requirements include
the rt patch but not a new scheduler. 

> > It has a design centered around keeping 
> > application latencies down, so it is ultimately real-time friendly, and 
> > it should also make things work better for desktop-ish and audio-ish 
> > stuff as well. (even under SCHED_OTHER)
> 
> Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
> list):
> 
> On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
> > Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
> > on my desktop but on my laptop it makes Firefox and Tomboy crash.
> > On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
> > problem.

It looks to my untrained eye like it is CFS related, I'm attaching the
last part of the strace of firefox while it tries to load a flash site.
The firefox process is left in an unkillable (not even by -9) state.
What else could I provide to debug the problem? (this is in a T61 laptop
with the Intel 7700 processor). 

-- Fernando
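
A process that ignores even SIGKILL is usually in uninterruptible sleep,
which ps reports as state D in the STAT column. As a minimal sketch
(the helper name find_uninterruptible and the sample rows are
hypothetical, and the column layout assumes procps-style "ps aux"
output), such stuck processes could be picked out like this:

```python
def find_uninterruptible(ps_aux_output):
    """Return (pid, command) for rows whose STAT field starts with 'D'.

    Assumes the "ps aux" column order:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    stuck = []
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # COMMAND may contain spaces
        if len(fields) == 11 and fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


# Hypothetical sample rows, made up for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
alice     1234  0.0  0.1  10000  2000 ?        Ss   10:00   0:00 /usr/sbin/sshd
alice     5678  1.0  2.0 190000 46000 ?        D    10:05   0:03 /usr/bin/example-app --flag
"""

if __name__ == "__main__":
    for pid, command in find_uninterruptible(SAMPLE):
        print(pid, command)
```

On Linux, /proc/<pid>/wchan may then give a hint about where in the
kernel the process is blocked.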



Re: v2.6.21.5-rt19

2007-07-08 Thread Fernando Lopez-Lezcano

On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
> * Fernando Lopez-Lezcano <[EMAIL PROTECTED]> wrote:
> > > > Changes since 2.6.21.5-rt18:
> > > >
> > > > - Fixed a nasty and hard to track down slowness / boot problem on SMP
> > > > machines with CONFIG_NOHZ enabled. The problem was caused by the timer
> > > > wheel base lock held during the get_next_timer_interrupt() call in the
> > > > idle path, which eventually led to a bogus PI boosting of the idle task
> > > > and in consequence a stale wrong scheduler selection for the affected
> > > > idle task.
> > > >
> > > > Kudos to Carsten Emde, who patiently and meticulously isolated the
> > > > problem and provided the traces, which allowed to identify the root 
> > > > cause.
> > > >
> > > > Problem solution: Prevent idle task boosting
> 
> > > Maybe someone remembers me whining about troubles with 2.6.21-rt2..18 
> > > on my Core2 T7200 laptop (fujitsu-siemens amilo i1520).
> > > 
> > > Although I'm still keeping my fingers crossed, the good 
> > > news is that 2.6.21.5-rt19 (and -rt20) behaves far better now 
> > > on the very same box.
> > 
> > Yes, it works much better indeed...
> > 
> > Ingo: is there a place where I can read about the changes in different 
> > rtxx releases? What is new/better/fixed in rt20? (I see scheduler 
> > stuff in a diff from rt19 to rt20 but I don't really know what it 
> > means).
> 
> and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when 
> CFS was merged.
> 
> i _think_ Rui might have seen two separate problems. Perhaps by the time 
> we fixed the first problem (which Rui saw since -rt2) we introduced the 
> other one via -rt11 - which then got fixed in -rt19.

Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?

> btw., we'd love to get more feedback regarding CFS. CFS is a completely 
> new scheduler for Linux. 

Then I'd rather have it separate from rt. 

> It has a design centered around keeping 
> application latencies down, so it is ultimately real-time friendly, and 
> it should also make things work better for desktop-ish and audio-ish 
> stuff as well. (even under SCHED_OTHER)

Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
> Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
> on my desktop but on my laptop it makes Firefox and Tomboy crash.
> On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
> problem.
> 
> Cheers,
> 
> Hector
> 
> 
> On 7/7/07, Hector Centeno <[EMAIL PROTECTED]> wrote:
> Hi Fernando,
> 
> I do have Flash installed but for me Firefox crashes when trying to
> access gmail (which AFAIK doesn't use Flash, does it?). Right now
> Firefox is frozen and I'm typing this email using Konqueror (in
> Gnome).
> This is ps' output:
> 
> hector    3595  1.1  2.2 194352 46336 ?        D    16:25   0:03 /usr/lib/firefox-2.0.0.4/firefox-bin
> 
> I think the problem is not present in my Desktop but I have to double
> check. In the same laptop using the stock fedora kernel both Tomboy
> and Firefox work fine. My laptop has a centrino duo processor, 2 gigs
> of ram and the Intel GMA950 graphics chip.
> 
> Hector

I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9). Does not seem to happen with flash 7. Have
not tried yet with gmail and flash uninstalled. I'll try to strace it to
see when/why it hangs. 

-- Fernando


> So it would be nice if you could keep an extra eye on any scheduling 
> artifacts or regressions, and make sure your favorite workload is still 
> handled by the Linux scheduler in the utmost best way. I'd like to hear 
> about any sort of "scheduling behavior / interactivity" regression you 
> might see, relative to the vanilla kernel. Or if you can see no such 
> problems then a line of "it works as well as the previous scheduler" is 
> important info to us too. Thanks!


