Re: Machine check exception with a kernel dependency

2008-02-15 Frank van Maarseveen

On Fri, Feb 15, 2008 at 01:22:41PM +, Alan Cox wrote:
> On Wed, 13 Feb 2008 17:25:28 +0100
> Frank van Maarseveen <[EMAIL PROTECTED]> wrote:
> 
> > On at least two Dell optiplex 755 systems with a Core 2 Duo I get
> > 
> > Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0004 
> > Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0005 
> > Feb 13 15:14:01 inari Bank 0: b2400800 
> > Feb 13 15:14:01 inari Bank 5: b200221024080400 
> > 
> > 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> > it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> > didn't help either.
> 
> If you run the MCE numbers through a decoder what do you get back ?

I've some trouble decoding these in a convincing way. mcelog --core2
--ascii reports "MCG status:RIPV MCIP" for 0005 and "MCG
status:MCIP" for 0004.
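The MCG status part can be checked by hand. This is a minimal sketch that assumes the IA32_MCG_STATUS bit layout from Intel's Software Developer's Manual (bit 0 RIPV, bit 1 EIPV, bit 2 MCIP). The function name is made up for illustration and is not part of mcelog:

```python
# Low bits of IA32_MCG_STATUS, per the Intel SDM machine-check chapter.
MCG_STATUS_BITS = {
    0: "RIPV",  # restart IP valid: execution may resume at the saved IP
    1: "EIPV",  # error IP valid: saved IP points at the faulting instruction
    2: "MCIP",  # machine check in progress
}


def decode_mcg_status(value: int) -> list:
    """Return the names of the MCG_STATUS flags set in value, lowest bit first."""
    return [name for bit, name in sorted(MCG_STATUS_BITS.items()) if value >> bit & 1]


if __name__ == "__main__":
    for raw in (0x4, 0x5):
        print(f"{raw:04x}: {' '.join(decode_mcg_status(raw))}")
```

Run as is, it prints "0004: MCIP" and "0005: RIPV MCIP", the same flags mcelog reports.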

I've collected several Bank # output lines:

count  line
---
26 Bank 0: b2400800
10 Bank 5: b200121014040400
 8 Bank 5: b200121020080400
 4 Bank 5: b200221010040400
 4 Bank 5: b200221024080400
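A tally like this can be produced with a few lines of Python. This is a sketch that assumes the netconsole lines look like the excerpts in this thread (syslog prefix, then "Bank N: status"); the helper name is hypothetical:

```python
import re
from collections import Counter

# "Bank N: <hex>" fragments as they arrived via netconsole, possibly after a syslog prefix.
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]+)")


def tally_bank_lines(lines):
    """Return (fragment, count) pairs, most frequent first."""
    counts = Counter()
    for line in lines:
        m = BANK_RE.search(line)
        if m:
            counts[f"Bank {m.group(1)}: {m.group(2)}"] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Fragments from the Feb 13 excerpt; in practice, iterate over the whole log file.
    sample = [
        "Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0005 ",
        "Feb 13 15:14:01 inari Bank 0: b2400800 ",
        "Feb 13 15:14:01 inari Bank 5: b200221024080400 ",
    ]
    for text, n in tally_bank_lines(sample):
        print(f"{n:2d} {text}")
```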

but mcelog expects lines of the format

CPU %u: Machine Check Exception: %16Lx Bank %d: %016Lx

(netconsole split them across lines), so I reassembled these by hand:

CPU 1: Machine Check Exception: 0004 Bank 0: b2400800
CPU 0: Machine Check Exception: 0005 Bank 5: b200121014040400
CPU 0: Machine Check Exception: 0005 Bank 5: b200121020080400
CPU 0: Machine Check Exception: 0005 Bank 5: b200221010040400
CPU 0: Machine Check Exception: 0005 Bank 5: b200221024080400
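The hand pairing above can also be scripted. This is a minimal sketch that assumes, as was done by hand here, that the n-th "CPU ..." fragment of a burst belongs with the n-th "Bank ..." fragment. Nothing in the netconsole output guarantees that, so the result is a plausible reconstruction, not recovered data. The function name is hypothetical, and the hex fields are kept as logged rather than padded to the %16Lx/%016Lx widths:

```python
import re

CPU_RE = re.compile(r"CPU (\d+): Machine Check Exception: ([0-9a-fA-F]+)")
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-fA-F]+)")


def reassemble(lines):
    """Pair split CPU and Bank fragments in order of appearance, one mcelog-style line per pair."""
    cpus = [m.groups() for m in map(CPU_RE.search, lines) if m]
    banks = [m.groups() for m in map(BANK_RE.search, lines) if m]
    # zip() stops at the shorter list, so unmatched fragments are dropped.
    return [
        f"CPU {cpu}: Machine Check Exception: {mcg} Bank {bank}: {status}"
        for (cpu, mcg), (bank, status) in zip(cpus, banks)
    ]


if __name__ == "__main__":
    burst = [
        "Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0004 ",
        "Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0005 ",
        "Feb 13 15:14:01 inari Bank 0: b2400800 ",
        "Feb 13 15:14:01 inari Bank 5: b200221024080400 ",
    ]
    for line in reassemble(burst):
        print(line)
```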

result:

CPU 1: Machine Check Exception: 0004 Bank 0: b2400800
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: BUS Level-0 Originated-request Generic Memory-access Request-timeout Error
BQ_DCU_READ_TYPE BQ_ERR_HARD_TYPE BQ_ERR_HARD_TYPE
timeout BINIT (ROB timeout)
STATUS b2400800 MCGSTATUS 4

CPU 0: Machine Check Exception: 0005 Bank 5: b200121014040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121014040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0005 Bank 5: b200121020080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200121020080400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0005 Bank 5: b200221010040400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221010040400 MCGSTATUS 5

CPU 0: Machine Check Exception: 0005 Bank 5: b200221024080400
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 5 MCG status:RIPV MCIP 
MCi status:
Uncorrected error
Error enabled
Processor context corrupt
MCA: Internal Timer error
STATUS b200221024080400 MCGSTATUS 5
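The mcelog lines above follow from the architectural bits of IA32_MCi_STATUS. This is a minimal sketch that assumes the layout documented in the Intel SDM (bits 63..57 VAL, OVER, UC, EN, MISCV, ADDRV, PCC; bits 15:0 the MCA error code; bits 31:16 a model-specific code). Only the one compound code seen here, 0x0400 (internal timer error), is named; the model-specific field is extracted but not interpreted:

```python
# Architectural bits of IA32_MCi_STATUS, per the Intel SDM machine-check chapter.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # overflow: a previous error was lost
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds extra information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context corrupt
}

# Only the compound code seen in this thread is named; the SDM lists many more.
MCA_CODES = {0x0400: "internal timer error"}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flag names and the two error-code fields."""
    flags = [name for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
             if status >> bit & 1]
    mca = status & 0xFFFF  # bits 15:0, architectural MCA error code
    return {
        "flags": flags,
        "mca_code": mca,
        "mca_desc": MCA_CODES.get(mca, "not decoded here"),
        "model_specific": (status >> 16) & 0xFFFF,  # bits 31:16, meaning is CPU-specific
    }


if __name__ == "__main__":
    for raw in (0xB200121014040400, 0xB200121020080400,
                0xB200221010040400, 0xB200221024080400):
        d = decode_mci_status(raw)
        print(f"{raw:016x}: {' '.join(d['flags'])}, "
              f"MCA 0x{d['mca_code']:04x} ({d['mca_desc']})")
```

All four bank 5 values decode to VAL UC EN PCC with MCA code 0x0400, which corresponds to mcelog's "Uncorrected error / Error enabled / Processor context corrupt / Internal Timer error". The bank 0 value is left out because only eight hex digits of it were logged.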

The problem also exists on an entirely different Xeon system with 4 cores:

cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU   X3210  @ 2.13GHz
stepping: 11

-- 
Frank
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

2.6.24 sysprof induced MCE on Core 2 Duo (was: Machine check exception with a kernel dependency)

2008-02-14 Frank van Maarseveen

On Wed, Feb 13, 2008 at 05:25:28PM +0100, Frank van Maarseveen wrote:
> On at least two Dell optiplex 755 systems with a Core 2 Duo I get

s/two/three/

> 
> Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0004 
> Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0005 
> Feb 13 15:14:01 inari Bank 0: b2400800 
> Feb 13 15:14:01 inari Bank 5: b200221024080400 
> 
> 2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
> it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
> didn't help either.

This turned out not to be entirely correct. The problem has now been narrowed down
to the following preconditions:

kernel: 2.6.24-rc8, 2.6.24 or 2.6.24.2 (probably safe to say 2.6.24*)
CPU: Core 2 Duo CPU E4500
sysprof 1.9 (out-of-tree module, author/maintainer CC'ed)

It appears impossible to finish even one kernel build (using "make")
on a local disk without hitting an MCE with the above combination.
A different system with a Pentium D CPU shows no problem.

FWIW, diff /proc/cpuinfo(ok) /proc/cpuinfo(MCE) for 2.6.24.2 shows:

3,7c3,7
< cpu family: 15
< model : 6
< model name: Intel(R) Pentium(R) D CPU 2.80GHz
< stepping  : 2
< cpu MHz   : 2793.235
---
> cpu family: 6
> model : 15
> model name: Intel(R) Core(TM)2 Duo CPU E4500  @ 2.20GHz
> stepping  : 13
> cpu MHz   : 2194.499
19c19
< cpuid level   : 6
---
> cpuid level   : 10
21,22c21,22
< flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
< bogomips  : 5590.45
---
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
> bogomips  : 4391.73
27,31c27,31
< cpu family: 15
< model : 6
< model name: Intel(R) Pentium(R) D CPU 2.80GHz
< stepping  : 2
< cpu MHz   : 2793.235
---
> cpu family: 6
> model : 15
> model name: Intel(R) Core(TM)2 Duo CPU E4500  @ 2.20GHz
> stepping  : 13
> cpu MHz   : 2194.499
43c43
< cpuid level   : 6
---
> cpuid level   : 10
45,46c45,46
< flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm
< bogomips  : 5586.21
---
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm
> bogomips  : 4388.96
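The flags difference is easier to see as a set difference than by reading the wrapped lines side by side. A sketch; the two strings are copied from the cpuinfo diff above, and the output says nothing about which flag, if any, matters for the MCE:

```python
PENTIUM_D = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc "
    "pebs bts sync_rdtsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm"
)
CORE2_E4500 = (
    "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc "
    "arch_perfmon pebs bts pni monitor ds_cpl est tm2 ssse3 cx16 xtpr lahf_lm"
)


def flag_diff(ok: str, bad: str):
    """Return (flags only in ok, flags only in bad), each sorted alphabetically."""
    a, b = set(ok.split()), set(bad.split())
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    only_ok, only_mce = flag_diff(PENTIUM_D, CORE2_E4500)
    print("only on Pentium D (ok): ", " ".join(only_ok))
    print("only on E4500 (MCE):    ", " ".join(only_mce))
```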

gcc version 3.4.6 (Debian 3.4.6-6)
config:

CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_SUPPORTS_OPROFILE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_X86_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_KTIME_SCALAR=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_FAIR_USER_SCHED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_SYSCTL=y
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_LBD=y
CONFIG_LSF=y

CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
CONFIG_DEFAULT_IOSCHED="anticipatory"
CONFIG_PREEMPT_NOTIFIERS=y

CONFIG_TICK_ONESHOT=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_PC=y
CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER=y
CONFIG_PARAVIRT_GUEST=y
CONFIG_M586TSC=y
CONFIG_X86_GENERIC=y

Machine check exception with a kernel dependency

2008-02-13 Frank van Maarseveen

On at least two Dell optiplex 755 systems with a Core 2 Duo I get

Feb 13 15:14:01 inari CPU 1: Machine Check Exception: 0004 
Feb 13 15:14:01 inari CPU 0: Machine Check Exception: 0005 
Feb 13 15:14:01 inari Bank 0: b2400800 
Feb 13 15:14:01 inari Bank 5: b200221024080400 

2.6.22.10 shows the problem, 2.6.24.2 ditto but I'm unable to reproduce
it with 2.6.24-rc8. BIOS upgrade didn't help. Removing all PCI[e] cards
didn't help either.

Does it make sense to try to pinpoint the problem to a particular
changeset (using git bisect I guess but I've never used it) or is the
hardware just broken?
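For what it's worth, git bisect is a binary search over the history between a known-good and a known-bad kernel, so with a reliable reproducer the number of test builds grows only with the logarithm of the number of commits. The sketch below shows only that search on a linear list; commits and is_bad are hypothetical stand-ins for "build, boot, run the reproducer". Real git bisect also copes with merges and with commits marked via git bisect skip, which this ignores:

```python
def first_bad(commits, is_bad):
    """Return (index of first bad commit, number of tests run).

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad every later one is bad too (the bisect premise).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return bad, tests


if __name__ == "__main__":
    commits = [f"c{i:04d}" for i in range(5000)]  # hypothetical linear history
    culprit = 3141                                # pretend this commit broke things
    idx, tests = first_bad(commits, lambda c: int(c[1:]) >= culprit)
    print(commits[idx], "found after", tests, "tests")
```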

-- 
Frank

Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-07 Frank van Maarseveen

On Wed, Nov 07, 2007 at 02:56:45PM +0100, Frank van Maarseveen wrote:
> On Tue, Nov 06, 2007 at 05:13:50PM -0600, Robert Hancock wrote:
> > Frank van Maarseveen wrote:
> > >For quite some time I'm seeing occasional lockups spread over 50 different
> > >machines I'm maintaining. Symptom: a page allocation failure with order:1,
> > >GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
> > >pages, almost no swap used) followed by a lockup (everything dead). I've
> > >collected all (12) crash cases which occurred the last 10 weeks on 50
> > >machines total (i.e. 1 crash every 41 weeks on average). The kernel
> > >messages are summarized to show the interesting part (IMO) they have
> > >in common. Over the years this has become the crash cause #1 for stable
> > >kernels for me (fglrx doesn't count ;).
> > >
> > >One note: I suspect that reporting a GFP_ATOMIC allocation failure in an
> > >network driver via that same driver (netconsole) may not be the smartest
> > >thing to do and this could be responsible for the lockup itself. However,
> > >the initial page allocation failure remains and I'm not sure how to
> > >address that problem.
> > >
> > >I still think the issue is memory fragmentation but if so, it looks
> > >a bit extreme to me: One system with 2GB of ram crashed after a day,
> > >merely running a couple of TCP server programs. All systems have either
> > >1 or 2GB ram and at least 1G of (merely unused) swap.
> > 
> > These are all order-1 allocations for received network packets that need 
> > to be allocated out of low memory (assuming you're using a 32-bit 
> > kernel), so it's quite possible for them to fail on occasion. (Are you 
> > using jumbo frames?)
> 
> I don't use jumbo frames.
> 
> 
> > 
> > That should not be causing a lockup though.. the received packet should 
> > just get dropped.
> 
> Ok, packet loss is recoverable to some extent. When a system crashes
> I often see a couple of page allocation failures in the same second,
> all reported via netconsole.

[snip]

I've grepped for 'Normal free:', assuming that is the low memory you mentioned, to see
how it correlates. Of the 12 cases, 7 crashed and 5 recovered:

Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
crash

Oct 29 11:48:07 somero Normal free:5412kB min:3736kB low:4668kB high:5604kB 
active:288068kB inactive:105708kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 29 11:48:07 somero Normal free:6704kB min:3736kB low:4668kB high:5604kB 
active:287940kB inactive:105084kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 29 11:48:08 somero Normal free:8332kB min:3736kB low:4668kB high:5604kB 
active:287760kB inactive:104240kB present:889680kB pages_scanned:54 
all_unreclaimable? no 
ok (more cases with increasing free memory not received via netconsole)

Oct 26 11:27:01 naantali Normal free:3976kB min:3736kB low:4668kB high:5604kB 
active:318568kB inactive:152928kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB 
active:318256kB inactive:152856kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB 
active:318256kB inactive:152856kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
crash

Oct 12 14:56:44 koli Normal free:11628kB min:3736kB low:4668kB high:5604kB 
active:238112kB inactive:157232kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
ok

Oct  1 08:51:58 salla Normal free:5496kB min:3736kB low:4668kB high:5604kB 
active:409500kB inactive:46388kB present:889680kB pages_scanned:137 
all_unreclaimable? no 
Oct  1 08:51:59 salla Normal free:7396kB min:3736kB low:4668kB high:5604kB 
active:408292kB inactive:46740kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
crash

Sep 17 10:34:49 lokka Normal free:39756kB min:3736kB low:4668kB high:5604kB 
active:236916kB inactive:175624kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
ok

Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB 
active:424420kB inactive:45380kB present:889680kB pages_scanned:144 
all_unreclaimable? no 
Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB 
active:424420kB ina

Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-07 Frank van Maarseveen

On Tue, Nov 06, 2007 at 05:13:50PM -0600, Robert Hancock wrote:
> Frank van Maarseveen wrote:
> >For quite some time I'm seeing occasional lockups spread over 50 different
> >machines I'm maintaining. Symptom: a page allocation failure with order:1,
> >GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
> >pages, almost no swap used) followed by a lockup (everything dead). I've
> >collected all (12) crash cases which occurred the last 10 weeks on 50
> >machines total (i.e. 1 crash every 41 weeks on average). The kernel
> >messages are summarized to show the interesting part (IMO) they have
> >in common. Over the years this has become the crash cause #1 for stable
> >kernels for me (fglrx doesn't count ;).
> >
> >One note: I suspect that reporting a GFP_ATOMIC allocation failure in a
> >network driver via that same driver (netconsole) may not be the smartest
> >thing to do and this could be responsible for the lockup itself. However,
> >the initial page allocation failure remains and I'm not sure how to
> >address that problem.
> >
> >I still think the issue is memory fragmentation but if so, it looks
> >a bit extreme to me: One system with 2GB of ram crashed after a day,
> >merely running a couple of TCP server programs. All systems have either
> >1 or 2GB ram and at least 1G of (merely unused) swap.
> 
> These are all order-1 allocations for received network packets that need 
> to be allocated out of low memory (assuming you're using a 32-bit 
> kernel), so it's quite possible for them to fail on occasion. (Are you 
> using jumbo frames?)

I don't use jumbo frames.


> 
> That should not be causing a lockup though.. the received packet should 
> just get dropped.

Ok, packet loss is recoverable to some extent. When a system crashes
I often see a couple of page allocation failures in the same second,
all reported via netconsole. Here's the full log of such a case:

Oct 26 11:27:01 naantali kswapd0: page allocation failure. order:1, mode:0x4020 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali show_trace_log_lvl+0x1a/0x30 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali show_trace+0x12/0x20 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali dump_stack+0x16/0x20 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali __alloc_pages+0x27e/0x300 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali allocate_slab+0x46/0x90 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali new_slab+0x31/0x140 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali __slab_alloc+0xbc/0x180 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali __kmalloc_track_caller+0x74/0x80 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali __alloc_skb+0x4d/0x110 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_collapse+0x17e/0x3b0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_prune_queue+0x7f/0x1c0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_data_queue+0x487/0x720 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_rcv_established+0x3a0/0x6e0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_v4_do_rcv+0xe9/0x100 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali tcp_v4_rcv+0x7f1/0x8d0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali ip_local_deliver+0xef/0x250 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali ip_rcv+0x264/0x560 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali netif_receive_skb+0x2ad/0x320 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali process_backlog+0x91/0x120 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali net_rx_action+0x8d/0x170 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali __do_softirq+0x78/0x100 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali do_softirq+0x3c/0x40 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali irq_exit+0x45/0x50 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali do_IRQ+0x4f/0xa0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali common_interrupt+0x23/0x30 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali _spin_unlock+0x16/0x20 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali prune_dcache+0x142/0x1a0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali shrink_dcache_memory+0x1e/0x50 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali shrink_slab+0x139/0x1d0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali balance_pgdat+0x220/0x380 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali kswapd+0xd8/0x140 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali kthread+0x5c/0xa0 
Oct 26 11:27:01 naantali  [] 
Oct 26 11:27:01 naantali kernel_thread_helper+0x7/0x10 
Oct 26 11:27:01 naantali  === 
Oct 26 11:27:01 naantali Mem-info: 
Oct 26 11:27:01 naantali DMA per-cpu: 
Oct 26 11:27:01 naantali CPU0: Hot: hi:0, btch:   1 us

Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-07 Frank van Maarseveen

On Wed, Nov 07, 2007 at 09:01:17AM +1100, Nick Piggin wrote:
> On Tuesday 06 November 2007 04:42, Frank van Maarseveen wrote:
> > For quite some time I'm seeing occasional lockups spread over 50 different
> > machines I'm maintaining. Symptom: a page allocation failure with order:1,
> > GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
> > pages, almost no swap used) followed by a lockup (everything dead). I've
> > collected all (12) crash cases which occurred the last 10 weeks on 50
> > machines total (i.e. 1 crash every 41 weeks on average). The kernel
> > messages are summarized to show the interesting part (IMO) they have
> > in common. Over the years this has become the crash cause #1 for stable
> > kernels for me (fglrx doesn't count ;).
> >
> > One note: I suspect that reporting a GFP_ATOMIC allocation failure in a
> > network driver via that same driver (netconsole) may not be the smartest
> > thing to do and this could be responsible for the lockup itself. However,
> > the initial page allocation failure remains and I'm not sure how to
> > address that problem.
> 
> It isn't unexpected. If an atomic allocation doesn't have enough memory,
> it kicks off kswapd to start freeing memory for it. However, it cannot
> wait for memory to become free (it's GFP_ATOMIC), so it has to return
> failure. GFP_ATOMIC allocation paths are designed so that the kernel can
> recover from this situation, and a subsequent allocation will have free
> memory.
> 
> Probably in production kernels we should default to only reporting this
> when page reclaim is not making any progress.
> 
> 
> > I still think the issue is memory fragmentation but if so, it looks
> > a bit extreme to me: One system with 2GB of ram crashed after a day,
> > merely running a couple of TCP server programs. All systems have either
> > 1 or 2GB ram and at least 1G of (merely unused) swap.
> 
> You can reduce the chances of it happening by increasing
> /proc/sys/vm/min_free_kbytes.

It's 3807 by default on all machines here, which means roughly 950 pages if I
understand correctly. However, the problem seems to occur with many more free
pages. "grep '  free:' messages*" on the netconsole logging
machine shows:

messages:Nov  5 12:58:27 lokka  free:89531 slab:122946 mapped:4421 
pagetables:295 bounce:0
messages.0:Oct 29 11:48:07 somero  free:113487 slab:109249 mapped:177942 
pagetables:645 bounce:0
messages.1:Oct 26 11:27:01 naantali  free:41073 slab:97335 mapped:2540 
pagetables:149 bounce:0
messages.3:Oct 12 14:56:44 koli  free:25694 slab:111842 mapped:16299 
pagetables:794 bounce:0
messages.4:Oct  1 08:51:58 salla  free:2964 slab:97675 mapped:279622 
pagetables:650 bounce:0
messages.6:Sep 17 10:34:49 lokka  free:170034 slab:95601 mapped:5548 
pagetables:304 bounce:0
messages.6:Sep 17 10:48:48 karvio  free:33597 slab:94909 mapped:247838 
pagetables:695 bounce:0
messages.6:Sep 20 10:32:50 nivala  free:80943 slab:93154 mapped:200040 
pagetables:698 bounce:0
messages.8:Sep  3 09:46:11 lahti  free:8195 slab:125438 mapped:2911 
pagetables:192 bounce:0
messages.9:Aug 30 10:40:46 ropi  free:61633 slab:90119 mapped:272634 
pagetables:487 bounce:0
messages.9:Aug 30 10:46:58 ivalo  free:41600 slab:88279 mapped:272705 
pagetables:487 bounce:0
messages.9:Aug 31 16:30:02 lokka  free:40661 slab:115006 mapped:21208 
pagetables:493 bounce:0

So it is happening even with 170034 free pages (Sep 17 10:34:49),
i.e. about 664M free. In this particular case the machine didn't crash, but in
the majority of cases it does. Here's the full log of the 170034 free
pages case:

Sep 17 10:34:49 lokka kernel: ftxpd: page allocation failure. order:1, 
mode:0x4020
Sep 17 10:34:49 lokka kernel:  [c01054aa] show_trace_log_lvl+0x1a/0x30
Sep 17 10:34:49 lokka kernel:  [c01054d2] show_trace+0x12/0x20
Sep 17 10:34:49 lokka kernel:  [c01055f6] dump_stack+0x16/0x20
Sep 17 10:34:49 lokka kernel:  [c0156ade] __alloc_pages+0x27e/0x300
Sep 17 10:34:49 lokka kernel:  [c016e926] allocate_slab+0x46/0x90
Sep 17 10:34:49 lokka kernel:  [c016e9e1] new_slab+0x31/0x140
Sep 17 10:34:49 lokka kernel:  [c016efec] __slab_alloc+0xbc/0x180
Sep 17 10:34:49 lokka kernel:  [c0170364] __kmalloc_track_caller+0x74/0x80
Sep 17 10:34:49 lokka kernel:  [c04b1d8d] __alloc_skb+0x4d/0x110
Sep 17 10:34:49 lokka kernel:  [c0500cce] tcp_collapse+0x17e/0x3b0
Sep 17 10:34:49 lokka kernel:  [c050103f] tcp_prune_queue+0x7f/0x1c0
Sep 17 10:34:49 lokka kernel:  [c05008b7] tcp_data_queue+0x487/0x720
Sep 17 10:34:49 lokka kernel:  [c0501a20] tcp_rcv_established+0x3a0/0x6e0
Sep 17 10:34:49 lokka kernel:  [c0509859] tcp_v4_do_rcv+0xe9/0x100
Sep 17 10:34:49 lokka kernel:  [c050a061] tcp_v4_rcv+0x7f1/0x8d0
Sep 17 10:34:49 lokka kernel:  [c04ed2bf] ip_local_deliver+0xef/0x250
Sep 17 10:34:49 lokka kernel:  [c04ed874] ip_rcv+0x264/0x560
Sep 17 10:34:49 lokka kernel:  [c04b905d] netif_receive_skb+0x2ad/0x320
Sep 17 10:34:49 lokka kernel:  [c04b9161] process_backlog+0x91/0x120
Sep 17 10:34:49 lokka kernel:  [c04b927d] net_rx_action+0x8d/0x170
Sep 17 10:34:49 lokka kernel


Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-07 Thread Frank van Maarseveen

On Wed, Nov 07, 2007 at 02:56:45PM +0100, Frank van Maarseveen wrote:
 On Tue, Nov 06, 2007 at 05:13:50PM -0600, Robert Hancock wrote:
  Frank van Maarseveen wrote:
  For quite some time I'm seeing occasional lockups spread over 50 different
  machines I'm maintaining. Symptom: a page allocation failure with order:1,
  GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
  pages, almost no swap used) followed by a lockup (everything dead). I've
  collected all (12) crash cases which occurred the last 10 weeks on 50
  machines total (i.e. 1 crash every 41 weeks on average). The kernel
  messages are summarized to show the interesting part (IMO) they have
  in common. Over the years this has become the crash cause #1 for stable
  kernels for me (fglrx doesn't count ;).
  
  One note: I suspect that reporting a GFP_ATOMIC allocation failure in an
  network driver via that same driver (netconsole) may not be the smartest
  thing to do and this could be responsible for the lockup itself. However,
  the initial page allocation failure remains and I'm not sure how to
  address that problem.
  
  I still think the issue is memory fragmentation but if so, it looks
  a bit extreme to me: One system with 2GB of ram crashed after a day,
  merely running a couple of TCP server programs. All systems have either
  1 or 2GB ram and at least 1G of (merely unused) swap.
  
  These are all order-1 allocations for received network packets that need 
  to be allocated out of low memory (assuming you're using a 32-bit 
  kernel), so it's quite possible for them to fail on occasion. (Are you 
  using jumbo frames?)
 
 I don't use jumbo frames.
 
 
  
  That should not be causing a lockup though.. the received packet should 
  just get dropped.
 
 Ok, packet loss is recoverable to some extend. When a system crashes
 I often see a couple of page allocation failures in the same second,
 all reported via netconsole.

[snip]

I've grepped for 'Normal free:', assuming that is the low memory you mention,
to see how it correlates. Of the 12 cases, 7 crashed and 5 recovered:

Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
Nov  5 12:58:27 lokka Normal free:6444kB min:3736kB low:4668kB high:5604kB 
active:235196kB inactive:104336kB present:889680kB pages_scanned:44 
all_unreclaimable? no 
crash

Oct 29 11:48:07 somero Normal free:5412kB min:3736kB low:4668kB high:5604kB 
active:288068kB inactive:105708kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 29 11:48:07 somero Normal free:6704kB min:3736kB low:4668kB high:5604kB 
active:287940kB inactive:105084kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 29 11:48:08 somero Normal free:8332kB min:3736kB low:4668kB high:5604kB 
active:287760kB inactive:104240kB present:889680kB pages_scanned:54 
all_unreclaimable? no 
ok (more cases with increasing free memory not received via netconsole)

Oct 26 11:27:01 naantali Normal free:3976kB min:3736kB low:4668kB high:5604kB 
active:318568kB inactive:152928kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB 
active:318256kB inactive:152856kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
Oct 26 11:27:01 naantali Normal free:4408kB min:3736kB low:4668kB high:5604kB 
active:318256kB inactive:152856kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
crash

Oct 12 14:56:44 koli Normal free:11628kB min:3736kB low:4668kB high:5604kB 
active:238112kB inactive:157232kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
ok

Oct  1 08:51:58 salla Normal free:5496kB min:3736kB low:4668kB high:5604kB 
active:409500kB inactive:46388kB present:889680kB pages_scanned:137 
all_unreclaimable? no 
Oct  1 08:51:59 salla Normal free:7396kB min:3736kB low:4668kB high:5604kB 
active:408292kB inactive:46740kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
crash

Sep 17 10:34:49 lokka Normal free:39756kB min:3736kB low:4668kB high:5604kB 
active:236916kB inactive:175624kB present:889680kB pages_scanned:0 
all_unreclaimable? no 
ok

Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB 
active:424420kB inactive:45380kB present:889680kB pages_scanned:144 
all_unreclaimable? no 
Sep 17 10:48:48 karvio Normal free:11648kB min:3736kB low:4668kB high:5604kB 
active:424420kB inactive:45380kB present:889680kB pages_scanned:144 
all_unreclaimable? no 
crash

Sep 20 10:32:50 nivala Normal free:27276kB min:3736kB low:4668kB high:5604kB 
active:354084kB inactive:104152kB present:889680kB pages_scanned:260 
all_unreclaimable? no 
crash

Sep  3 09:46:11 lahti Normal free:26200kB min:3736kB low:4668kB high:5604kB 
active

Re: VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-07 Thread Frank van Maarseveen

On Tue, Nov 06, 2007 at 05:13:50PM -0600, Robert Hancock wrote:
 Frank van Maarseveen wrote:
 For quite some time I'm seeing occasional lockups spread over 50 different
 machines I'm maintaining. Symptom: a page allocation failure with order:1,
 GFP_ATOMIC, while there is plenty of memory, as it seems (lots of free
 pages, almost no swap used) followed by a lockup (everything dead). I've
 collected all (12) crash cases which occurred the last 10 weeks on 50
 machines total (i.e. 1 crash every 41 weeks on average). The kernel
 messages are summarized to show the interesting part (IMO) they have
 in common. Over the years this has become the crash cause #1 for stable
 kernels for me (fglrx doesn't count ;).
 
 One note: I suspect that reporting a GFP_ATOMIC allocation failure in an
 network driver via that same driver (netconsole) may not be the smartest
 thing to do and this could be responsible for the lockup itself. However,
 the initial page allocation failure remains and I'm not sure how to
 address that problem.
 
 I still think the issue is memory fragmentation but if so, it looks
 a bit extreme to me: One system with 2GB of ram crashed after a day,
 merely running a couple of TCP server programs. All systems have either
 1 or 2GB ram and at least 1G of (merely unused) swap.
 
 These are all order-1 allocations for received network packets that need 
 to be allocated out of low memory (assuming you're using a 32-bit 
 kernel), so it's quite possible for them to fail on occasion. (Are you 
 using jumbo frames?)

I don't use jumbo frames.


 
 That should not be causing a lockup though.. the received packet should 
 just get dropped.

Ok, packet loss is recoverable to some extent. When a system crashes
I often see a couple of page allocation failures in the same second,
all reported via netconsole. Here's the full log of such a case:

Oct 26 11:27:01 naantali kswapd0: page allocation failure. order:1, mode:0x4020 
Oct 26 11:27:01 naantali  [c01054aa] 
Oct 26 11:27:01 naantali show_trace_log_lvl+0x1a/0x30 
Oct 26 11:27:01 naantali  [c01054d2] 
Oct 26 11:27:01 naantali show_trace+0x12/0x20 
Oct 26 11:27:01 naantali  [c01055f6] 
Oct 26 11:27:01 naantali dump_stack+0x16/0x20 
Oct 26 11:27:01 naantali  [c0156ace] 
Oct 26 11:27:01 naantali __alloc_pages+0x27e/0x300 
Oct 26 11:27:01 naantali  [c016e916] 
Oct 26 11:27:01 naantali allocate_slab+0x46/0x90 
Oct 26 11:27:01 naantali  [c016e9d1] 
Oct 26 11:27:01 naantali new_slab+0x31/0x140 
Oct 26 11:27:01 naantali  [c016efdc] 
Oct 26 11:27:01 naantali __slab_alloc+0xbc/0x180 
Oct 26 11:27:01 naantali  [c0170354] 
Oct 26 11:27:01 naantali __kmalloc_track_caller+0x74/0x80 
Oct 26 11:27:01 naantali  [c04b193d] 
Oct 26 11:27:01 naantali __alloc_skb+0x4d/0x110 
Oct 26 11:27:01 naantali  [c050088e] 
Oct 26 11:27:01 naantali tcp_collapse+0x17e/0x3b0 
Oct 26 11:27:01 naantali  [c0500bff] 
Oct 26 11:27:01 naantali tcp_prune_queue+0x7f/0x1c0 
Oct 26 11:27:01 naantali  [c0500477] 
Oct 26 11:27:01 naantali tcp_data_queue+0x487/0x720 
Oct 26 11:27:01 naantali  [c05015e0] 
Oct 26 11:27:01 naantali tcp_rcv_established+0x3a0/0x6e0 
Oct 26 11:27:01 naantali  [c0509419] 
Oct 26 11:27:01 naantali tcp_v4_do_rcv+0xe9/0x100 
Oct 26 11:27:01 naantali  [c0509c21] 
Oct 26 11:27:01 naantali tcp_v4_rcv+0x7f1/0x8d0 
Oct 26 11:27:01 naantali  [c04ece8f] 
Oct 26 11:27:01 naantali ip_local_deliver+0xef/0x250 
Oct 26 11:27:01 naantali  [c04ed444] 
Oct 26 11:27:01 naantali ip_rcv+0x264/0x560 
Oct 26 11:27:01 naantali  [c04b8c2d] 
Oct 26 11:27:01 naantali netif_receive_skb+0x2ad/0x320 
Oct 26 11:27:01 naantali  [c04b8d31] 
Oct 26 11:27:01 naantali process_backlog+0x91/0x120 
Oct 26 11:27:01 naantali  [c04b8e4d] 
Oct 26 11:27:01 naantali net_rx_action+0x8d/0x170 
Oct 26 11:27:01 naantali  [c0128de8] 
Oct 26 11:27:01 naantali __do_softirq+0x78/0x100 
Oct 26 11:27:01 naantali  [c0128eac] 
Oct 26 11:27:01 naantali do_softirq+0x3c/0x40 
Oct 26 11:27:01 naantali  [c0128f15] 
Oct 26 11:27:01 naantali irq_exit+0x45/0x50 
Oct 26 11:27:01 naantali  [c0106d2f] 
Oct 26 11:27:01 naantali do_IRQ+0x4f/0xa0 
Oct 26 11:27:01 naantali  [c01050b3] 
Oct 26 11:27:01 naantali common_interrupt+0x23/0x30 
Oct 26 11:27:01 naantali  [c0575736] 
Oct 26 11:27:01 naantali _spin_unlock+0x16/0x20 
Oct 26 11:27:01 naantali  [c0185052] 
Oct 26 11:27:01 naantali prune_dcache+0x142/0x1a0 
Oct 26 11:27:01 naantali  [c01855ee] 
Oct 26 11:27:01 naantali shrink_dcache_memory+0x1e/0x50 
Oct 26 11:27:01 naantali  [c015a4d9] 
Oct 26 11:27:01 naantali shrink_slab+0x139/0x1d0 
Oct 26 11:27:01 naantali  [c015b980] 
Oct 26 11:27:01 naantali balance_pgdat+0x220/0x380 
Oct 26 11:27:01 naantali  [c015bbb8] 
Oct 26 11:27:01 naantali kswapd+0xd8/0x140 
Oct 26 11:27:01 naantali  [c0136eac] 
Oct 26 11:27:01 naantali kthread+0x5c/0xa0 
Oct 26 11:27:01 naantali  [c0105337] 
Oct 26 11:27:01 naantali kernel_thread_helper+0x7/0x10 
Oct 26 11:27:01 naantali  === 
Oct 26 11:27:01 naantali Mem-info: 
Oct 26 11:27:01 naantali DMA per-cpu

VM/networking crash cause #1: page allocation failure (order:1, GFP_ATOMIC)

2007-11-05 Thread Frank van Maarseveen

For quite some time I've been seeing occasional lockups spread over 50 different
machines I maintain. Symptom: a page allocation failure with order:1,
GFP_ATOMIC, while there seems to be plenty of memory (lots of free
pages, almost no swap used), followed by a lockup (everything dead). I've
collected all 12 crash cases that occurred in the last 10 weeks on 50
machines total (i.e. on average 1 crash per machine every 41 weeks). The kernel
messages are summarized to show the interesting part (IMO) they have
in common. Over the years this has become the #1 crash cause on stable
kernels for me (fglrx doesn't count ;).

One note: I suspect that reporting a GFP_ATOMIC allocation failure in a
network driver via that same driver (netconsole) may not be the smartest
thing to do, and this could be responsible for the lockup itself. However,
the initial page allocation failure remains and I'm not sure how to
address that problem.

I still think the issue is memory fragmentation, but if so, it looks
a bit extreme to me: one system with 2GB of RAM crashed after a day,
merely running a couple of TCP server programs. All systems have either
1 or 2GB RAM and at least 1G of (mostly unused) swap.


2.6.22.10: lokka kswapd0: page allocation failure. order:1, mode:0x4020
Nov  5 12:58:27 lokka __alloc_skb+0x4d/0x110 
Nov  5 12:58:27 lokka __netdev_alloc_skb+0x23/0x50 
Nov  5 12:58:27 lokka e1000_alloc_rx_buffers+0x127/0x320  
Nov  5 12:58:27 lokka e1000_clean_rx_irq+0x299/0x510 
Nov  5 12:58:27 lokka e1000_intr+0x80/0x140 
Nov  5 12:58:27 lokka e1000_intr+0x80/0x140
Nov  5 12:58:27 lokka handle_IRQ_event+0x28/0x60
Nov  5 12:58:27 lokka handle_fasteoi_irq+0x6e/0xd0
Nov  5 12:58:27 lokka do_IRQ+0x4a/0xa0
...
Nov  5 12:58:27 lokka Active:139432 inactive:152030 dirty:4406 writeback:4 
unstable:0 
Nov  5 12:58:27 lokka  free:89531 slab:122946 mapped:4421 pagetables:295 
bounce:0 
...
Nov  5 12:58:27 lokka Swap cache: add 16, delete 16, find 0/0, race 0+0 
Nov  5 12:58:27 lokka Free swap  = 1004004kB 
Nov  5 12:58:27 lokka Total swap = 1004052kB 
Nov  5 12:58:27 lokka Free swap:   1004004kB 
Nov  5 12:58:27 lokka 523912 pages of RAM 
Nov  5 12:58:27 lokka 294536 pages of HIGHMEM 


2.6.21.6: somero ftxpd: page allocation failure. order:1, mode:0x20 
Oct 29 11:48:07 somero __alloc_skb+0x4c/0x110
Oct 29 11:48:07 somero __netdev_alloc_skb+0x1d/0x50
Oct 29 11:48:07 somero tg3_alloc_rx_skb+0x74/0x150
Oct 29 11:48:07 somero tg3_rx+0x35a/0x440
Oct 29 11:48:07 somero tg3_poll+0x77/0x1e0
Oct 29 11:48:07 somero net_rx_action+0x99/0x190
Oct 29 11:48:07 somero __do_softirq+0x80/0xf0
Oct 29 11:48:07 somero do_softirq+0x3c/0x40
Oct 29 11:48:07 somero irq_exit+0x45/0x50
Oct 29 11:48:07 somero do_IRQ+0x50/0xa0
...
Oct 29 11:48:07 somero Active:227052 inactive:48283 dirty:3423 writeback:0 
unstable:0 
Oct 29 11:48:07 somero  free:113487 slab:109249 mapped:177942 pagetables:645 
bounce:0 
...
Oct 29 11:48:07 somero Swap cache: add 428601, delete 422625, find 
84672794/84685901, race 0+7 
Oct 29 11:48:07 somero Free swap  = 1901644kB 
Oct 29 11:48:07 somero Total swap = 2008116kB 
Oct 29 11:48:07 somero Free swap:   1901644kB 
Oct 29 11:48:07 somero 521862 pages of RAM 
Oct 29 11:48:07 somero 292486 pages of HIGHMEM 


2.6.22.10: naantali kswapd0: page allocation failure. order:1, mode:0x4020
(uptime: one day, merely running a couple of TCP server programs)
Oct 26 11:27:01 naantali __alloc_skb+0x4d/0x110
Oct 26 11:27:01 naantali tcp_collapse+0x17e/0x3b0
Oct 26 11:27:01 naantali tcp_prune_queue+0x7f/0x1c0
Oct 26 11:27:01 naantali tcp_data_queue+0x487/0x720
Oct 26 11:27:01 naantali tcp_rcv_established+0x3a0/0x6e0
Oct 26 11:27:01 naantali tcp_v4_do_rcv+0xe9/0x100
Oct 26 11:27:01 naantali tcp_v4_rcv+0x7f1/0x8d0
Oct 26 11:27:01 naantali ip_local_deliver+0xef/0x250
Oct 26 11:27:01 naantali ip_rcv+0x264/0x560
Oct 26 11:27:01 naantali netif_receive_skb+0x2ad/0x320
Oct 26 11:27:01 naantali process_backlog+0x91/0x120
Oct 26 11:27:01 naantali net_rx_action+0x8d/0x170
Oct 26 11:27:01 naantali __do_softirq+0x78/0x100
Oct 26 11:27:01 naantali do_softirq+0x3c/0x40
Oct 26 11:27:01 naantali irq_exit+0x45/0x50
Oct 26 11:27:01 naantali do_IRQ+0x4f/0xa0
...
Oct 26 11:27:01 naantali Active:115770 inactive:257188 dirty:10243 
writeback:1920 unstable:0
Oct 26 11:27:01 naantali  free:41073 slab:97335 mapped:2540 pagetables:149 
bounce:0
...
Oct 26 11:27:01 naantali Swap cache: add 90, delete 88, find 18/28, race 0+0
Oct 26 11:27:01 naantali Free swap  = 2008048kB
Oct 26 11:27:01 naantali Total swap = 2008084kB
Oct 26 11:27:01 naantali Free swap:   2008048kB
Oct 26 11:27:01 naantali 524144 pages of RAM
Oct 26 11:27:01 naantali 294768 pages of HIGHMEM


2.6.21.6: koli ftxpd: page allocation failure. order:1, mode:0x20
Oct 12 14:56:44 koli __alloc_skb+0x4c/0x110 
Oct 12 14:56:44 koli __netdev_alloc_skb+0x1d/0x50
Oct 12 14:56:44 koli tg3_alloc_rx_skb+0x74/0x150
Oct 12 14:56:44 koli tg3_rx+0x35a/0x440
Oct 12 14:56:44 koli tg3_poll+0x77/0x1e0
Oct 12 14:56:44 koli


Re: [2.6 patch] arch/i386/kernel/smpboot.c:setup_trampoline() must be __cpuinit

2007-10-15 Thread Frank van Maarseveen

On Tue, Jul 10, 2007 at 02:05:52AM +0200, Adrian Bunk wrote:
> This patch fixes the following section mismatch reported by
> Frank van Maarseveen:
> 
> <--  snip  -->
> 
> ...
>   MODPOST vmlinux
> WARNING: arch/i386/kernel/built-in.o(.text+0xf201): Section mismatch: 
> reference to .init.data:trampoline_end (between 'setup_trampoline' and 
> 'cpu_coregroup_map')
> WARNING: arch/i386/kernel/built-in.o(.text+0xf207): Section mismatch: 
> reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> 'cpu_coregroup_map')
> WARNING: arch/i386/kernel/built-in.o(.text+0xf21a): Section mismatch: 
> reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> 'cpu_coregroup_map')
> ...
> 
> <--  snip  -->
> 
> Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]>
> 
> ---
> 
> @stable:
> Harmless but annoying warnings present when building an i386 SMP kernel 
> with CONFIG_HOTPLUG_CPU=n and gcc < 4.0 .
> 
> --- linux-2.6.22-rc6-mm1/arch/i386/kernel/smpboot.c.old   2007-07-10 
> 01:26:07.0 +0200
> +++ linux-2.6.22-rc6-mm1/arch/i386/kernel/smpboot.c   2007-07-10 
> 01:26:18.0 +0200
> @@ -117,7 +117,7 @@
>   * has made sure it's suitably aligned.
>   */
>  
> -static unsigned long __devinit setup_trampoline(void)
> +static unsigned long __cpuinit setup_trampoline(void)
>  {
>   memcpy(trampoline_base, trampoline_data, trampoline_end - 
> trampoline_data);
>   return virt_to_phys(trampoline_base);
> 

This one hasn't been merged yet.

-- 
Frank
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.22.6: still seeing section mismatch warnings

2007-09-13 Thread Frank van Maarseveen

On Thu, Sep 13, 2007 at 09:02:00AM +0200, Sam Ravnborg wrote:
> On Wed, Sep 12, 2007 at 06:00:17PM +0200, Frank van Maarseveen wrote:
> > I'm still seeing the warnings below (2.6.22 started off with lots of
> > section mismatch warning) but I have no idea if it is safe to ignore
> > these:
> > 
> > WARNING: arch/i386/kernel/built-in.o(.text+0xea62): Section mismatch: 
> > reference to .init.data:trampoline_end (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.text+0xea67): Section mismatch: 
> > reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.text+0xea79): Section mismatch: 
> > reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.exit.text+0x26): Section mismatch: 
> > reference to .init.text: (between 'cache_remove_dev' and 'ffh_cstate_exit')
> > WARNING: arch/i386/kernel/built-in.o(.data+0xee0): Section mismatch: 
> > reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 
> > 'mce_work')
> > WARNING: kernel/built-in.o(.text+0x1b415): Section mismatch: reference to 
> > .init.text: (between 'kthreadd' and 'init_waitqueue_head')
> > 
> > 
> > gcc version 3.4.6 (Debian 3.4.6-5)
> 
> Does not appear with an allmodconfig.
> Care to send me your .config then I may take a look.

See attached .config

> 
> The warnings are the kernel being much more anal with respect to use
> of __init sections and friends, and for normal use you can ignore them.
> They usually find bugs in error handling which seldom trigger, and
> the bugs have been there for ages in most cases - but it is only now
> that we detect them.
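Sam's explanation can be illustrated with a simplified model of the rule behind these warnings. The C sketch below is hypothetical and is not modpost's actual code. It models only the two discarded sections that appear in the warnings (.init.text and .init.data) and ignores the whitelists and the many other section kinds the real tool handles. The names is_discarded_after_boot() and is_section_mismatch() are made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: sections that are freed once boot-time initialisation ends.
 * Only the two section kinds that appear in the warnings are modelled. */
static bool is_discarded_after_boot(const char *sec)
{
    return strcmp(sec, ".init.text") == 0 || strcmp(sec, ".init.data") == 0;
}

/* Hypothetical, simplified rule: returns true when a reference from
 * section `from` to section `to` would be reported as a section mismatch. */
bool is_section_mismatch(const char *from, const char *to)
{
    /* Init code referring to init data is fine: both are freed together. */
    if (is_discarded_after_boot(from))
        return false;

    /* Resident code or data (.text, .data, .exit.text, ...) pointing into
     * init memory could dereference freed memory later, for example when
     * a CPU is brought online after boot. */
    return is_discarded_after_boot(to);
}
```

Under this model, the setup_trampoline warnings are .text to .init.data references and the cache_remove_dev warning is an .exit.text to .init.text reference, so both are flagged. A reference between two init sections is not flagged.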

thanks,

-- 
Frank
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22.6-y165
# Thu Sep 13 10:07:03 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CPUSETS is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
CONFIG_M586TSC=y
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
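# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set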

2.6.22.6: still seeing section mismatch warnings

2007-09-12 Thread Frank van Maarseveen

I'm still seeing the warnings below (2.6.22 started off with lots of
section mismatch warnings), but I have no idea whether it is safe to ignore
these:

WARNING: arch/i386/kernel/built-in.o(.text+0xea62): Section mismatch: reference to .init.data:trampoline_end (between 'setup_trampoline' and 'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.text+0xea67): Section mismatch: reference to .init.data:trampoline_data (between 'setup_trampoline' and 'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.text+0xea79): Section mismatch: reference to .init.data:trampoline_data (between 'setup_trampoline' and 'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.exit.text+0x26): Section mismatch: reference to .init.text: (between 'cache_remove_dev' and 'ffh_cstate_exit')
WARNING: arch/i386/kernel/built-in.o(.data+0xee0): Section mismatch: reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mce_work')
WARNING: kernel/built-in.o(.text+0x1b415): Section mismatch: reference to .init.text: (between 'kthreadd' and 'init_waitqueue_head')


gcc version 3.4.6 (Debian 3.4.6-5)

-- 
Frank

Re: recent nfs change causes autofs regression

2007-08-31 Thread Frank van Maarseveen

On Fri, Aug 31, 2007 at 09:50:12AM -0400, Trond Myklebust wrote:
> On Fri, 2007-08-31 at 15:12 +0200, Frank van Maarseveen wrote:
> 
> > IMHO I'd only consider returning EBUSY when trying to mount _exactly_
> > the same directory with different flags, not for arbitrary subtrees. The
> > client should preferably not be bothered with server side disk
> > partitioning (at least not beyond the obvious such as df output).
> 
> That is utterly inconsistent and confusing too.
> 
> If you have a filesystem "/foo" exported on the server "remote", then
> why should
> 
> mount -oro remote:/foo
> mount -orw remote:/foo/a
> 
> be allowed, but
> 
> mount -oro remote:/foo
> mount -orw remote:/foo
> 
> be forbidden?

I'm not arguing to forbid the second case, but confronting the sysadmin
with nosharecache there is much less likely to break existing setups than
doing so in the first case. Let's consider the most likely intention. The
first case is probably used as:

mount -oro remote:/foo    <path>/foo
mount -orw remote:/foo/a  <path>/foo/a

and I don't see a real issue with that, sharecache or not. Ditto with:

mount -oro remote:/foo/a  <path>/a
mount -orw remote:/foo/b  <path>/b

These are all typical use cases, without multiple views on the same
tree. But

mount -oro remote:/foo  /foo1
mount -orw remote:/foo  /foo2

is strange and much less likely.

-- 
Frank

Re: recent nfs change causes autofs regression

2007-08-31 Thread Frank van Maarseveen

On Fri, Aug 31, 2007 at 08:11:38AM -0400, Trond Myklebust wrote:
> On Fri, 2007-08-31 at 01:07 -0700, Linus Torvalds wrote:
> > 
> 
> > If you want new behaviour, you add a new flag saying you want new 
> > behaviour. You don't just start behaving differently from what you've 
> > always done before (and what *other* UNIXes do, for that matter).
> > 
> > Besides, even *if* it was a matter of somebody doing a mount with "rw", 
> > when the previous mount was "ro", returning EBUSY is still the wrong thing 
> > to do! If the user asks for a new mount that is read-write, he should just 
> > get it - ie we should not re-use the old client handles, and we should do 
> > what Solaris apparently does, namely to just make it a totally different 
> > mount.
> > 
> > In other words, it should (as I already mentioned once) have used 
> > "nosharecache" by default, which makes it all work.
> > 
> > Then, people who want to re-use the caches (which in turn may mean that 
> > everything needs to have the same flags), THOSE PEOPLE, who want the NEW 
> > SEMANTICS (errors and all) should then use a "sharecache" flag.
> 
> That would be a major change in existing semantics. The default has been
> "sharecache" ever since Al Viro introduced the "sget()" function some 6
> or 7 years ago. The problem was that we never advertised the fact that
> the kernel was overriding your mount options, and so sysadmins were
> (rightly IMO) complaining that they should _know_ when the client does
> this.
> 
> The list of known problems with a "nosharecache" default is nasty too:
> 
> - file and directory attribute and data caching breaks.
> Applications will see stale data in cases where they otherwise
> would not expect it.
> 
> - the existing dcache and icache issues when a file is renamed
> or deleted on the server are now extended to also include the
> case where the rename or deletion occurs on an alias in another
> directory on the client itself. In particular, sillyrename will
> break.
> 
> - file locking breaks (the server knows that the client holds
> locks on one file, whereas the client thinks it holds locks on
> several).
> 
> - the NFSv4 delegation model breaks: the client will be using
> OPEN when it could use cached opens. More importantly, when
> performing an operation that requires it to return the
> delegation on the aliased file, it won't know until the server
> sends it a callback.
> 
> ...and of course, the amount of unnecessary traffic to the server
> increases. I'm not aware of any sane way of dealing with those issues,
> and I doubt Solaris has a solution for them either.

None of this can happen when server foo exports /bar and a client
mounts /bar/x and /bar/y separately, unless there is a shared subtree or
there are hard links between files within them, right?

An obvious (but disruptive) server-side workaround is to export the
subtrees with different fsid= values, but that would give the same list of
problems as above, right?

IMHO I'd only consider returning EBUSY when trying to mount _exactly_
the same directory with different flags, not arbitrary subtrees. The
client should preferably not be bothered with server-side disk
partitioning (at least not beyond the obvious, such as df output).
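The difference between the two policies discussed in this thread can be sketched as a toy model in C. This is not the NFS client code. The struct nfs_mount_req type, its fields and both functions are hypothetical, and the real client decides cache sharing through its superblock lookup (sget()) rather than by comparing strings like these.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified description of an NFS mount request. */
struct nfs_mount_req {
    const char *server;  /* server host name, e.g. "remote" */
    const char *fs_id;   /* server-side filesystem the export lives on */
    const char *path;    /* exported directory being mounted, e.g. "/foo/a" */
    bool readonly;       /* true for -oro, false for -orw */
};

/* Behaviour as reported in this thread for 2.6.23-rc4 with the sharecache
 * default: a second mount of any directory on the same server-side
 * filesystem fails with EBUSY if its ro/rw flag differs. */
bool ebusy_same_filesystem(const struct nfs_mount_req *old,
                           const struct nfs_mount_req *new_req)
{
    return strcmp(old->server, new_req->server) == 0 &&
           strcmp(old->fs_id, new_req->fs_id) == 0 &&
           old->readonly != new_req->readonly;
}

/* The narrower rule proposed above: only refuse when exactly the same
 * exported directory is mounted again with a different ro/rw flag. */
bool ebusy_same_directory(const struct nfs_mount_req *old,
                          const struct nfs_mount_req *new_req)
{
    return strcmp(old->server, new_req->server) == 0 &&
           strcmp(old->path, new_req->path) == 0 &&
           old->readonly != new_req->readonly;
}
```

Suppose /foo is mounted ro and /foo/a from the same server filesystem is then mounted rw. In that case ebusy_same_filesystem() returns true (the reported EBUSY), while ebusy_same_directory() returns false. Mounting /foo a second time rw makes both return true.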

-- 
Frank

Re: recent nfs change causes autofs regression

2007-08-31 Thread Frank van Maarseveen

On Thu, Aug 30, 2007 at 02:07:43PM -0700, Hua Zhong wrote:
> I am re-sending this after help from Ian and git-bisect. To me it's a
> show-stopper: I cannot find an acceptable workaround that I can implement.
> 
> The problem: upgrading to 2.6.23-rc4 from 2.6.22 causes several autofs
> mounts to fail silently - they just do not appear when they should.
> 
> I believe it's caused by the NFS change that forces multiple mounts from
> different directories under the same server side filesystem to have the same
> mount options by default, otherwise it returns EBUSY.
> 
> For example, if server has a filesystem /a, and it exports /a/x and /a/y
> (maybe with rw or ro), and a client must mount /a/x and /a/y with the same
> mount options now.
> 
> Since in my setup they are managed by autofs, and the autofs map is managed
> by nis, there is no way I could easily work around it.
> 
> If we have to live with this regression, I want to hear some suggestions
> about how to fix them realistically. Thanks.
> 
> By the way, I am not sure if I did the bisect right, but FWIW, git-bisect
> says:
> 
> c98451bdb2f3e6d6cc1e03adad641e9497512b49 is first bad commit
> commit c98451bdb2f3e6d6cc1e03adad641e9497512b49
> Author: Frank van Maarseveen <[EMAIL PROTECTED]>
> Date:   Mon Jul 9 22:25:29 2007 +0200
> 
> NLM: fix source address of callback to client
> 
> Use the destination address of the original NLM request as the
> source address in callbacks to the client.
> 
> Signed-off-by: Frank van Maarseveen <[EMAIL PROTECTED]>
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> 
> :04 04 675c84bd8b2c50744018becaa0db4aeca19b8f9f
> 105fbd3cb3fa5e3019836b4b5268125d0181a72d M  fs
> :04 04 0138796e0806b4ebd1cc3850ed4e8c7ab24d2d41
> 2fec08debe51c20423a88b1a0d4281c683ba5daf M  include

This commit has nothing to do with the mount problem, assuming the commit
and its description match.

-- 
Frank

Re: recent nfs change causes autofs regression

2007-08-31 Thread Frank van Maarseveen

On Fri, Aug 31, 2007 at 09:40:28AM +0200, Jakob Oestergaard wrote:
> On Thu, Aug 30, 2007 at 10:16:37PM -0700, Linus Torvalds wrote:
> > 
> ...
> > > Why aren't we doing that for any other filesystem than NFS?
> > 
> > How hard is it to acknowledge the following little word:
> > 
> > "regression"
> > 
> > It's simple. You broke things. You may want to fix them, but you need to 
> > fix them in a way that does not break user space.
> 
> Trond has a point Linus.
> 
> What he "broke" is, for example, a ro mount being mounted as rw.
> 
> That *could* be a very serious security (etc.etc.) problem which he just 
> fixed.
> Anything depending on read-only not being enforced will cease to work, of
> course, and that is what a few people complain about(!).
> 
> If ext3 in some rare case (which would still mean it hit a few thousand users)
> failed to remember that a file had been marked read-only and allowed writes to
> it, wouldn't we want to fix that too?  It would cause regressions, but we'd 
> fix
> it, right?
> 
> mount passes back the error code on a failed mount. autofs passes that error
> along too (when people configure syslog correctly). In short, when these
> serious mistakes are made and caught, the admin sees an error in his logs.

Hua already explained that seeing the error is not the same as fixing
it: he cannot fix it, because NFS involves other systems we _must_
cooperate with.

-- 
Frank

Re: 2.6.22: section mismatch warnings

2007-07-10 Thread Frank van Maarseveen

On Tue, Jul 10, 2007 at 02:14:14AM +0200, Adrian Bunk wrote:
> On Mon, Jul 09, 2007 at 08:42:01PM +0200, Frank van Maarseveen wrote:
> >...
> > WARNING: kernel/built-in.o(.text+0x1add5): Section mismatch: reference to 
> > .init.text: (between 'kthreadd' and 'init_waitqueue_head')
> 
> Below is the fix in -mm for this (and other issues).

thanks,

Note that these still remain:

WARNING: arch/i386/kernel/built-in.o(.exit.text+0x26): Section mismatch: reference to .init.text: (between 'cache_remove_dev' and 'ffh_cstate_exit')
WARNING: arch/i386/kernel/built-in.o(.data+0xee0): Section mismatch: reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mce_work')

-- 
Frank

Re: 2.6.22: section mismatch warnings

2007-07-09 Thread Frank van Maarseveen

On Mon, Jul 09, 2007 at 09:45:40PM +0200, Adrian Bunk wrote:
> On Mon, Jul 09, 2007 at 08:42:01PM +0200, Frank van Maarseveen wrote:
> > WARNING: arch/i386/kernel/built-in.o(.text+0xf5c1): Section mismatch: 
> > reference to .init.data:trampoline_end (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.text+0xf5c7): Section mismatch: 
> > reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.text+0xf5da): Section mismatch: 
> > reference to .init.data:trampoline_data (between 'setup_trampoline' and 
> > 'cpu_coregroup_map')
> > WARNING: arch/i386/kernel/built-in.o(.exit.text+0x1c): Section mismatch: 
> > reference to .init.text: (between 'cache_remove_dev' and 'ffh_cstate_exit')
> > WARNING: arch/i386/kernel/built-in.o(.data+0xe80): Section mismatch: 
> > reference to .init.text: (between 'thermal_throttle_cpu_notifier' and 
> > 'mce_work')
> > WARNING: kernel/built-in.o(.text+0x1add5): Section mismatch: reference to 
> > .init.text: (between 'kthreadd' and 'init_waitqueue_head')
> 
> I think I see the problem, but please send your .config so that I can 
> verify it.

attached,

-- 
Frank
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.22-x163
# Mon Jul  9 21:05:42 2007
#
CONFIG_X86_32=y
CONFIG_GENERIC_TIME=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_QUICKLIST=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_DMI=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
# CONFIG_IPC_NS is not set
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_UTS_NS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_CPUSETS is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Block layer
#
CONFIG_BLOCK=y
CONFIG_LBD=y
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_LSF=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_SMP=y
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_PARAVIRT is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MCORE2 is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
CONFIG_MK8=y
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_XADD=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
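# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y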

2.6.22: section mismatch warnings

2007-07-09 Thread Frank van Maarseveen

WARNING: arch/i386/kernel/built-in.o(.text+0xf5c1): Section mismatch: reference 
to .init.data:trampoline_end (between 'setup_trampoline' and 
'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.text+0xf5c7): Section mismatch: reference 
to .init.data:trampoline_data (between 'setup_trampoline' and 
'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.text+0xf5da): Section mismatch: reference 
to .init.data:trampoline_data (between 'setup_trampoline' and 
'cpu_coregroup_map')
WARNING: arch/i386/kernel/built-in.o(.exit.text+0x1c): Section mismatch: 
reference to .init.text: (between 'cache_remove_dev' and 'ffh_cstate_exit')
WARNING: arch/i386/kernel/built-in.o(.data+0xe80): Section mismatch: reference 
to .init.text: (between 'thermal_throttle_cpu_notifier' and 'mce_work')
WARNING: kernel/built-in.o(.text+0x1add5): Section mismatch: reference to 
.init.text: (between 'kthreadd' and 'init_waitqueue_head')

-- 
Frank

apic related BUG: soft lockup detected on CPU#0!

2007-06-04 Thread Frank van Maarseveen

FYI,

2.6.21.1, tainted with ATI fglrx driver (so maybe take it with a grain
of salt):

When I attempted to kill -9 an unresponsive, looping X server (desktop
processes were gone at that time), the system locked up and reported
the following:

BUG: soft lockup detected on CPU#0!
 [<c01054a9>] show_trace_log_lvl+0x19/0x30
 [<c01054d2>] show_trace+0x12/0x20
 [<c01055d4>] dump_stack+0x14/0x20
 [<c014a7a2>] softlockup_tick+0xa2/0xc0
 [<c012bf02>] run_local_timers+0x12/0x20
 [<c012bd1d>] update_process_times+0x5d/0x90
 [<c013d91f>] tick_sched_timer+0x4f/0xb0
 [<c01393f7>] hrtimer_interrupt+0x177/0x1e0
 [<c010fb07>] local_apic_timer_interrupt+0x57/0x60
 [<c010fb33>] smp_apic_timer_interrupt+0x23/0x40
 [<c0105158>] apic_timer_interrupt+0x28/0x30
 ===

-- 
Frank

Re: JBD: ext2online wants too many credits (744 > 256)

2007-05-07 Thread Frank van Maarseveen

On Sun, May 06, 2007 at 09:40:14PM -0700, Andrew Morton wrote:
> On Mon, 7 May 2007 00:26:26 +0200 Frank van Maarseveen <[EMAIL PROTECTED]> 
> wrote:
> 
> > Steps to reproduce:
> > Create a 3G partition, say /dev/vol1/project
> > mke2fs -j -b 4096 /dev/vol1/project 22812
> > mount it
> > ext2online /dev/vol1/project said:
> > 
> > | ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
> > | ext2online: ext2_ioctl: No space left on device
> > | 
> > | ext2online: unable to resize /dev/mapper/vol1-project
> > 
> > kernel said:
> > 
> > | JBD: ext2online wants too many credits (721 > 256)

There's a threshold for the problem depending on the initial
size. This one fails:

mke2fs -j -b 4096 /dev/<3GB-blockdev> 32768
(mount + ext2online or resize2fs)
kernel: JBD: resize2fs wants too many credits (1034 > 1024)

Add one block to the initial mke2fs (32768+1 == 32769) and the
problem is gone.

Without the -b 4096 there's another resize problem:

mke2fs -j /dev/loop1 2048
mount /dev/loop1 /1
resize2fs /dev/loop1

says:
resize2fs 1.40-WIP (14-Nov-2006)
Filesystem at /dev/loop1 is mounted on /1; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 12
Performing an on-line resize of /dev/loop1 to 3072000 (1k) blocks.
resize2fs: Invalid argument While trying to add group #256

and the kernel says:

May  7 15:36:08 lokka EXT3-fs warning (device loop1): verify_reserved_gdb: 
May  7 15:36:08 lokka reserved GDT 10 missing grp 1 (8202)
May  7 15:36:08 lokka  

After that, the filesystem had been resized to 2GB. I recall a 2G (?)
limit for ext3 resizing with a 1k blocksize, but trying the above with
4096 1k blocks seems to work. fsck says it's OK every time.
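A rough arithmetic sketch of the block-group numbers involved, assuming the classic ext2/ext3 layout: 8 * block size blocks per group (the 32768 that mke2fs printed for 4k blocks), 32-byte group descriptors, and a first data block of 1 only for 1k blocks. The helper names are made up for illustration. It shows that both failing 4k sizes (22812 and 32768 blocks) fit in a single block group while 32769 blocks need two, reproduces the desc_blocks values resize2fs printed, and shows that 2GB is exactly 256 full 1k-block groups. That suggests the resize stops just before group #256, the first group whose descriptor would need a ninth descriptor block. It does not explain the credit counts or the reserved GDT warning.

```python
# Hypothetical helpers; they model only the classic ext2/ext3 layout.
def ext3_group_count(blocks_count, block_size):
    """Block groups needed for blocks_count blocks of block_size bytes.

    Assumes blocks_per_group = 8 * block_size (one bitmap block per group)
    and a first data block of 1 for 1k blocks, 0 otherwise.
    """
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    usable = blocks_count - first_data_block
    return (usable + blocks_per_group - 1) // blocks_per_group  # ceiling division


def ext3_desc_blocks(blocks_count, block_size, desc_size=32):
    """Group-descriptor blocks needed, assuming 32-byte ext3 descriptors."""
    per_block = block_size // desc_size
    groups = ext3_group_count(blocks_count, block_size)
    return (groups + per_block - 1) // per_block


# 4k blocks: the two failing initial sizes fit in one group, 32769 needs two.
for size in (22812, 32768, 32769):
    print(size, ext3_group_count(size, 4096))
# 22812 1
# 32768 1
# 32769 2

# 1k blocks: matches resize2fs's "old desc_blocks = 1, new_desc_blocks = 12".
print(ext3_desc_blocks(2048, 1024), ext3_desc_blocks(3072000, 1024))  # 1 12

# Groups #0..#255 fill exactly 8 descriptor blocks (32 per 1k block), so
# group #256 is the first needing a ninth; 256 groups of 8192 1k blocks = 2 GiB.
print(256 // (1024 // 32), 256 * 8192 * 1024 // 2**30)  # 8 2
```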

-- 
Frank

JBD: ext2online wants too many credits (744 > 256)

2007-05-06 Thread Frank van Maarseveen

2.6.20.6, FC4:

I created a 91248k ext3 fs with 4k blocksize:

| mke2fs -j -b 4096 /dev/vol1/project 
| mke2fs 1.38 (30-Jun-2005)
| Filesystem label=
| OS type: Linux
| Block size=4096 (log=2)
| Fragment size=4096 (log=2)
| 23552 inodes, 23552 blocks
| 1177 blocks (5.00%) reserved for the super user
| First data block=0
| Maximum filesystem blocks=25165824
| 1 block group
| 32768 blocks per group, 32768 fragments per group
| 23552 inodes per group

| Writing inode tables: done
| Creating journal (1024 blocks): done
| Writing superblocks and filesystem accounting information: done

Next, I tried to resize it to about 3G using ext2online while mounted:

| # ext2online /dev/vol1/project 
| ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
| ext2online: ext2_ioctl: No space left on device
|
| ext2online: unable to resize /dev/mapper/vol1-project

At that time the kernel said:

| JBD: ext2online wants too many credits (744 > 256)

What limitation should I be aware of? Does it have something to do
with the journal size?

The size actually did increase a bit, to 128112k.
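The limits seen in this thread fit a simple relation, assuming JBD of this era caps a single handle at a quarter of the journal length (the `j_max_transaction_buffers = j_maxlen / 4` assignment in `fs/jbd/journal.c`, worth confirming for 2.6.20). `jbd_max_credits` is a made-up name for that assumed rule.

```python
def jbd_max_credits(journal_blocks):
    """Assumed per-handle credit cap: a quarter of the journal length.

    Mirrors what is believed to be j_max_transaction_buffers = j_maxlen / 4
    in fs/jbd/journal.c of this era; not verified against 2.6.20 here.
    """
    return journal_blocks // 4


# mke2fs above reported "Creating journal (1024 blocks)".
print(jbd_max_credits(1024))  # 256, the limit in "(744 > 256)"

# The follow-up's "(1034 > 1024)" limit would correspond to a 4096-block journal.
print(jbd_max_credits(4096))  # 1024
```

If the assumption holds, this only accounts for where the 256 comes from, not for why the resize asks for 744 credits.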


Steps to reproduce:
Create a 3G partition, say /dev/vol1/project
mke2fs -j -b 4096 /dev/vol1/project 22812
mount it
ext2online /dev/vol1/project said:

| ext2online v1.1.18 - 2001/03/18 for EXT2FS 0.5b
| ext2online: ext2_ioctl: No space left on device
| 
| ext2online: unable to resize /dev/mapper/vol1-project

kernel said:

| JBD: ext2online wants too many credits (721 > 256)

-- 
Frank

Re: unprivileged mount problems: device permissions ignored, mount sharing

2007-05-06 Thread Frank van Maarseveen

On Wed, May 02, 2007 at 09:56:52AM +0200, Miklos Szeredi wrote:
> > I tried the unprivileged mount v5 patches with 2.6.21.1. I made some
> > experiments with normal filesystems (ext3, xfs, iso9660). I removed the
> > FS_SAFE checks for that.
> 
> Thanks for looking at this.
> 
> > Mounting and umounting as unprivileged user (user1) works, e.g.
> > (/mnt/user1 is a mount owned by user1)
> > 
> > [EMAIL PROTECTED] ~]$ mmount -t xfs /dev/mapper/vg00-test /mnt/user1
> > 
> > But the device permissions are ignored. The unprivileged user can mount
> > the block device even there are no permissions to access it:
> > 
> > brw--- 1 root root 253, 5 Apr 29 18:32 /dev/mapper/vg00-test
> 
> Yes, I'm aware of this.  Before we enable FS_SAFE for block
> filesystems, this must be addressed.
> 
> But I'm not sure _if_ we'll ever want this.  It is very likely that
> there are some other security holes in most filesystems that are
> difficult to address.  One example is checking for hard-linked
> directories, which is normally only done during an fsck.

Some filesystems are not hardened against mounting a corrupted image and
can Oops. This is also a problem for automounting USB sticks except that
in that case you're (probably) sitting next to the hardware so there
are other ways to do bad things.

From a security point of view I'd like to restrict unprivileged mounts
to a configurable list of filesystem types.

-- 
Frank

2.6.20.6 kernel BUG at mm/slab.c:2877

2007-05-04 Thread Frank van Maarseveen

2.6.20.6, FYI,

This suddenly cropped up after I started using the i915 and drm
modules, but it may be unrelated to that:

kernel BUG at mm/slab.c:2877!
invalid opcode: 0000 [#1]
SMP 
Modules linked in: i915 drm
CPU:    0
EIP:    0060:[<c016640b>]    Not tainted VLI
EFLAGS: 00010083   (2.6.20.6-x153 #1)
EIP is at cache_free_debugcheck+0x1db/0x1f0
eax: 0047f480   ebx: 6820   ecx: 00130c00   edx: 0047f480
esi: f7a3bbc0   edi: cffe2bc0   ebp: c1ecde3c   esp: c1ecde14
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 216, ti=c1ecc000 task=c1ec9510 task.ti=c1ecc000)
Stack: c1ec9510  0002 0246 170fc2a5 cffe2000 c01b56e3 f7a3cdf4 
   f7a3bbc0 cffe2bc4 c1ecde54 c0166ef5 0246 cffe2c74 cffe2c74 c1ecde94 
   c1ecde5c c01b56e3 c1ecde68 c017c1d3 cffe2c7c c1ecde80 c017c570 0060 
Call Trace:
 [<c01044a9>] show_trace_log_lvl+0x19/0x30
 [<c010456b>] show_stack_log_lvl+0x8b/0xb0
 [<c0104795>] show_registers+0x1b5/0x2d0
 [<c0104a0f>] die+0x10f/0x240
 [<c0104bb2>] do_trap+0x72/0xb0
 [<c0104ec3>] do_invalid_op+0xa3/0xb0
 [] error_code+0x7c/0x90
 [] kmem_cache_free+0x55/0xa0
 [] ext3_destroy_inode+0x13/0x20
 [] destroy_inode+0x23/0x40
 [] dispose_list+0x70/0xf0
 [] prune_icache+0xdc/0x200
 [] shrink_icache_memory+0x15/0x40
 [] shrink_slab+0x129/0x1d0
 [] balance_pgdat+0x229/0x370
 [] kswapd+0xbb/0x110
 [] kthread+0x8a/0xc0
 [] kernel_thread_helper+0x7/0x10
 ===
Code: 8e 98 00 00 00 e9 a5 fe ff ff 89 f0 e8 cf e2 ff ff b9 05 00 00 00 01 f8 
89 f2 ff 96 bc 00 00 00 8b 8e 98 00 00 00 e9 cb fe ff
EIP: [<c016640b>] cache_free_debugcheck+0x1db/0x1f0 SS:ESP 0068:c1ecde14


-- 
Frank

Re: libata /dev/scd0 problem: mount after burn fails without eject

2007-05-04 Thread Frank van Maarseveen

On Fri, May 04, 2007 at 10:41:41AM +0200, Tejun Heo wrote:
> Frank van Maarseveen wrote:
> > On Fri, May 04, 2007 at 10:16:32AM +0200, Tejun Heo wrote:
> >> Michal Piotrowski wrote:
> >>> On 01/05/07, Mark Lord <[EMAIL PROTECTED]> wrote:
> >>>> Forwarding to linux-scsi and linux-ide mailing lists.
> >>>>
> >>>> Frank van Maarseveen wrote:
> >>>>> Tested on 2.6.20.6 and 2.6.21.1
> >>>>>
> >>>>> I decided to switch from the old IDE drivers to libata and now there
> >>>>> seems to be a little but annoying problem: cannot mount an ISO image
> >>>>> after burning it.
> >>>>>
> >>>>> May  1 14:32:55 kernel: attempt to access beyond end of device
> >>>>> May  1 14:32:55 kernel: sr0: rw=0, want=68, limit=4
> >>>>> May  1 14:32:55 kernel: isofs_fill_super: bread failed, dev=sr0,
> >>>> iso_blknum=16, block=16
> >>>>> an "eject" command seems to fix the state of the PATA DVD writer
> >>>>> or driver. The problem occurs for burning a CD and for DVD too with
> >>>>> identical error messages.
> >> Right after burning, if you run 'fuser -v /dev/sr0', what does it say?
> > 
> > Tried the fuser as root to be sure but it didn't show anything.
> 
> I guess sr is forgetting to set media changed flag somewhere.

Plausible. I get the same kernel messages when I try to mount the CD
before burning.

-- 
Frank

Re: libata /dev/scd0 problem: mount after burn fails without eject

2007-05-04 Thread Frank van Maarseveen

On Fri, May 04, 2007 at 10:16:32AM +0200, Tejun Heo wrote:
> Michal Piotrowski wrote:
> > On 01/05/07, Mark Lord <[EMAIL PROTECTED]> wrote:
> >> Forwarding to linux-scsi and linux-ide mailing lists.
> >>
> >> Frank van Maarseveen wrote:
> >> > Tested on 2.6.20.6 and 2.6.21.1
> >> >
> >> > I decided to switch from the old IDE drivers to libata and now there
> >> > seems to be a little but annoying problem: cannot mount an ISO image
> >> > after burning it.
> >> >
> >> > May  1 14:32:55 kernel: attempt to access beyond end of device
> >> > May  1 14:32:55 kernel: sr0: rw=0, want=68, limit=4
> >> > May  1 14:32:55 kernel: isofs_fill_super: bread failed, dev=sr0,
> >> iso_blknum=16, block=16
> >> >
> >> > an "eject" command seems to fix the state of the PATA DVD writer
> >> > or driver. The problem occurs for burning a CD and for DVD too with
> >> > identical error messages.
> 
> Right after burning, if you run 'fuser -v /dev/sr0', what does it say?

Tried the fuser as root to be sure but it didn't show anything.

-- 
Frank

Re: libata /dev/scd0 problem: mount after burn fails without eject

2007-05-04 Thread Frank van Maarseveen

On Fri, May 04, 2007 at 10:41:41AM +0200, Tejun Heo wrote:
> Frank van Maarseveen wrote:
> > On Fri, May 04, 2007 at 10:16:32AM +0200, Tejun Heo wrote:
> > > Michal Piotrowski wrote:
> > > > On 01/05/07, Mark Lord <[EMAIL PROTECTED]> wrote:
> > > > > Forwarding to linux-scsi and linux-ide mailing lists.
> > > > >
> > > > > Frank van Maarseveen wrote:
> > > > > > Tested on 2.6.20.6 and 2.6.21.1
> > > > > >
> > > > > > I decided to switch from the old IDE drivers to libata and now there
> > > > > > seems to be a little but annoying problem: cannot mount an ISO image
> > > > > > after burning it.
> > > > > >
> > > > > > May  1 14:32:55 kernel: attempt to access beyond end of device
> > > > > > May  1 14:32:55 kernel: sr0: rw=0, want=68, limit=4
> > > > > > May  1 14:32:55 kernel: isofs_fill_super: bread failed, dev=sr0, iso_blknum=16, block=16
> > > > > > an "eject" command seems to fix the state of the PATA DVD writer
> > > > > > or driver. The problem occurs for burning a CD and for DVD too with
> > > > > > identical error messages.
> > > Right after burning, if you run 'fuser -v /dev/sr0', what does it say?
> >
> > Tried the fuser as root to be sure but it didn't show anything.
>
> I guess sr is forgetting to set media changed flag somewhere.

Plausible. I get the same kernel messages when I try to mount the CD
before burning.
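
The numbers in the log fit that explanation. This assumes the block layer's usual conventions: `want` and `limit` are counted in 512-byte sectors, and `want` is the end of the request. It also assumes ISO 9660's 2048-byte blocks, with the volume descriptors starting at block 16. Under those assumptions, mounting reads sectors 64-67 while the kernel still believes the device is only 4 sectors (2 KiB) long, which looks like a capacity value left over from before the burn. A small Python sketch of the arithmetic (the helper names are hypothetical):

```python
SECTOR_SIZE = 512      # unit of the "want"/"limit" fields in the kernel message
ISO_BLOCK_SIZE = 2048  # ISO 9660 logical block size


def bread_sectors(block, block_size=ISO_BLOCK_SIZE):
    """Sector range [first, end) touched by reading one filesystem block."""
    per_block = block_size // SECTOR_SIZE
    first = block * per_block
    return first, first + per_block


def beyond_end(want, limit):
    """True when the request ends past the device size the kernel knows."""
    return want > limit


first, want = bread_sectors(16)    # iso_blknum=16 from the log
limit = 4                          # limit=4 from the log
print(first, want, limit * SECTOR_SIZE, beyond_end(want, limit))
# prints: 64 68 2048 True
```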

-- 
Frank

2.6.20.6 kernel BUG at mm/slab.c:2877

2007-05-04 Thread Frank van Maarseveen

2.6.20.6, FYI,

This suddenly cropped up after I started using the i915 and drm modules,
but it may be unrelated to that:

kernel BUG at mm/slab.c:2877!
invalid opcode:  [#1]
SMP 
Modules linked in: i915 drm
CPU:0
EIP:0060:[c016640b]Not tainted VLI
EFLAGS: 00010083   (2.6.20.6-x153 #1)
EIP is at cache_free_debugcheck+0x1db/0x1f0
eax: 0047f480   ebx: 6820   ecx: 00130c00   edx: 0047f480
esi: f7a3bbc0   edi: cffe2bc0   ebp: c1ecde3c   esp: c1ecde14
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 216, ti=c1ecc000 task=c1ec9510 task.ti=c1ecc000)
Stack: c1ec9510  0002 0246 170fc2a5 cffe2000 c01b56e3 f7a3cdf4 
   f7a3bbc0 cffe2bc4 c1ecde54 c0166ef5 0246 cffe2c74 cffe2c74 c1ecde94 
   c1ecde5c c01b56e3 c1ecde68 c017c1d3 cffe2c7c c1ecde80 c017c570 0060 
Call Trace:
 [c01044a9] show_trace_log_lvl+0x19/0x30
 [c010456b] show_stack_log_lvl+0x8b/0xb0
 [c0104795] show_registers+0x1b5/0x2d0
 [c0104a0f] die+0x10f/0x240
 [c0104bb2] do_trap+0x72/0xb0
 [c0104ec3] do_invalid_op+0xa3/0xb0
 [c050a25c] error_code+0x7c/0x90
 [c0166ef5] kmem_cache_free+0x55/0xa0
 [c01b56e3] ext3_destroy_inode+0x13/0x20
 [c017c1d3] destroy_inode+0x23/0x40
 [c017c570] dispose_list+0x70/0xf0
 [c017c83c] prune_icache+0xdc/0x200
 [c017c975] shrink_icache_memory+0x15/0x40
 [c0152229] shrink_slab+0x129/0x1d0
 [c01536a9] balance_pgdat+0x229/0x370
 [c01538ab] kswapd+0xbb/0x110
 [c013543a] kthread+0x8a/0xc0
 [c0104317] kernel_thread_helper+0x7/0x10
 ===
Code: 8e 98 00 00 00 e9 a5 fe ff ff 89 f0 e8 cf e2 ff ff b9 05 00 00 00 01 f8 89 f2 ff 96 bc 00 00 00 8b 8e 98 00 00 00 e9 cb fe ff
EIP: [c016640b] cache_free_debugcheck+0x1db/0x1f0 SS:ESP 0068:c1ecde14
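
When chasing an oops like this one, the `function+0xOFF/0xLEN` notation gives the offset of the faulting instruction inside the function and the function's length. The function therefore starts at EIP minus the offset. Below is a short Python sketch that extracts those fields from the EIP line. The helper `decode_eip` is hypothetical, and its regex accepts both `[c016640b]`, as the address appears in this archive, and the usual `[<c016640b>]` form. The resulting address range can then be located in a disassembly of the same vmlinux, for instance with objdump.

```python
import re

# "[addr] function+0xoffset/0xlength", with optional <> around the address.
EIP_RE = re.compile(r"\[<?([0-9a-f]+)>?\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def decode_eip(line):
    """Return (function, eip, function_start, function_end) from an oops line."""
    m = EIP_RE.search(line)
    if m is None:
        raise ValueError("no symbol+offset/length found in: " + line)
    eip = int(m.group(1), 16)
    func = m.group(2)
    offset = int(m.group(3), 16)
    length = int(m.group(4), 16)
    start = eip - offset
    return func, eip, start, start + length


line = "EIP: [c016640b] cache_free_debugcheck+0x1db/0x1f0 SS:ESP 0068:c1ecde14"
func, eip, start, end = decode_eip(line)
print(func, hex(start), hex(end))
# prints: cache_free_debugcheck 0xc0166230 0xc0166420
```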


-- 
Frank

libata /dev/scd0 problem: mount after burn fails without eject

2007-05-01 Thread Frank van Maarseveen

Tested on 2.6.20.6 and 2.6.21.1

I decided to switch from the old IDE drivers to libata and now there
seems to be a little but annoying problem: cannot mount an ISO image
after burning it.

May  1 14:32:55 kernel: attempt to access beyond end of device
May  1 14:32:55 kernel: sr0: rw=0, want=68, limit=4
May  1 14:32:55 kernel: isofs_fill_super: bread failed, dev=sr0, iso_blknum=16, block=16

An "eject" command seems to fix the state of the PATA DVD writer
or driver. The problem occurs when burning a CD and a DVD too, with
identical error messages.


relevant kernel boot messages:
| ata1: PATA max UDMA/100 cmd 0x000101f0 ctl 0x000103f6 bmdma 0x0001ffa0 irq 14
| ata2: PATA max UDMA/100 cmd 0x00010170 ctl 0x00010376 bmdma 0x0001ffa8 irq 15
| scsi0 : ata_piix
| ata1.00: ATA-7: ST3120814A, 3.AAJ, max UDMA/100
| ata1.00: 234441648 sectors, multi 8: LBA48 
| ata1.00: configured for UDMA/100
| scsi1 : ata_piix
| ata2.00: ATAPI, max UDMA/33
| ata2.01: ATAPI, max UDMA/33
| ata2.00: configured for UDMA/33
| ata2.01: configured for UDMA/33
| scsi 0:0:0:0: Direct-Access ATA  ST3120814A   3.AA PQ: 0 ANSI: 5
| SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
| sda: Write Protect is off
| SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
| SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
| sda: Write Protect is off
| SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
|  sda: sda1 sda2 sda4
| sd 0:0:0:0: Attached scsi disk sda
| sd 0:0:0:0: Attached scsi generic sg0 type 0
| scsi 1:0:0:0: CD-ROM HP   DVD Writer 640c  CS30 PQ: 0 ANSI: 5
| sr0: scsi3-mmc drive: 40x/40x writer cd/rw xa/form2 cdda tray
| Uniform CD-ROM driver Revision: 3.20
| sr 1:0:0:0: Attached scsi generic sg1 type 5
| scsi 1:0:1:0: CD-ROM SAMSUNG  CD-ROM SC-148C   B105 PQ: 0 ANSI: 5
| sr1: scsi3-mmc drive: 1x/48x cd/rw xa/form2 cdda tray
| sr 1:0:1:0: Attached scsi generic sg2 type 5

stripped config (well, as far as I'm sure, what I removed shouldn't matter):
| CONFIG_PM=y
| CONFIG_PM_LEGACY=y
| 
| CONFIG_ACPI=y
| CONFIG_ACPI_PROCFS=y
| CONFIG_ACPI_FAN=y
| CONFIG_ACPI_PROCESSOR=y
| CONFIG_ACPI_THERMAL=y
| CONFIG_ACPI_BLACKLIST_YEAR=0
| CONFIG_ACPI_EC=y
| CONFIG_ACPI_POWER=y
| CONFIG_ACPI_SYSTEM=y
| CONFIG_X86_PM_TIMER=y
| 
| CONFIG_PCI=y
| CONFIG_PCI_GOANY=y
| CONFIG_PCI_BIOS=y
| CONFIG_PCI_DIRECT=y
| CONFIG_PCI_MMCONFIG=y
| CONFIG_PCIEPORTBUS=y
| CONFIG_PCIEAER=y
| CONFIG_HT_IRQ=y
| CONFIG_ISA_DMA_API=y
| 
| CONFIG_PNP=y
| 
| CONFIG_PNPACPI=y
| 
| CONFIG_CDROM_PKTCDVD=y
| CONFIG_CDROM_PKTCDVD_BUFFERS=8
| 
| CONFIG_IDE=y
| 
| CONFIG_RAID_ATTRS=y
| CONFIG_SCSI=y
| CONFIG_SCSI_PROC_FS=y
| 
| CONFIG_BLK_DEV_SD=y
| CONFIG_BLK_DEV_SR=y
| CONFIG_CHR_DEV_SG=y
| 
| CONFIG_SCSI_MULTI_LUN=y
| CONFIG_SCSI_CONSTANTS=y
| CONFIG_SCSI_LOGGING=y
| 
| CONFIG_SCSI_SPI_ATTRS=y
| 
| CONFIG_ATA=y
| CONFIG_SATA_AHCI=y
| CONFIG_ATA_PIIX=y
| CONFIG_SATA_INTEL_COMBINED=y
| CONFIG_SATA_ACPI=y
| CONFIG_PATA_SERVERWORKS=y


lspci:
00:00.0 Host bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 01)
00:01.0 PCI bridge: Intel Corporation 82845G/GL[Brookdale-G]/GE/PE Host-to-AGP Bridge (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801DB/DBM (ICH4/ICH4-M) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 81)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 01)
00:1f.5 Multimedia audio controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) AC'97 Audio Controller (rev 01)
01:00.0 VGA compatible controller: nVidia Corporation NV18 [GeForce4 MX 440 AGP 8x] (rev a2)
02:0c.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

lspci -v -v -v for IDE
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
	Subsystem: Dell Unknown device 0142
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
	Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
	Latency: 0
	Interrupt: pin A routed to IRQ 17
	Region 0: I/O ports at 01f0 [size=8]
	Region 1: I/O ports at 03f4 [size=1]
	Region 2: I/O ports at 0170 [size=8]
	Region 3: I/O ports at 0374 [size=1]
	Region 4: I/O ports at ffa0 [size=16]

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-13 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 09:40:25PM +0100, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Monday 12 March 2007, Frank van Maarseveen wrote:
> > On Mon, Mar 12, 2007 at 01:21:18PM +0100, Bartlomiej Zolnierkiewicz wrote:
> > > 
> > > Hi,
> > > 
> > > Could you check if this is the same problem as this one:
> > > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=8169
> > 
> > Looks like it except that I don't see "lost interrupt" messages here. So,
> > it might be something different (I don't know).
> 
> From the first mail:
> 
> hda: max request size: 128KiB
> hda: 40021632 sectors (20491 MB) w/2048KiB Cache, CHS=39704/16/63
> hda: cache flushes not supported
>  hda: hda1 hda2 hda4
> 
> It seems that DMA is not used by default (CONFIG_IDEDMA_PCI_AUTO=n),
> so this is probably exactly the same issue.
> 
> Please try the patch attached to the bugzilla bug entry.

2.6.20.2 rejects this patch and I don't see a way to apply it by hand:
ide_set_dma() isn't there, nothing seems to match.

-- 
Frank

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 12:07:18PM +, Alistair John Strachan wrote:
> On Monday 12 March 2007 11:24, Frank van Maarseveen wrote:
> > On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> > > 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> > > -d 1 /dev/hda):
> > >
> > >   hda: dma_timer_expiry: dma status == 0x20
> > >   hda: DMA timeout retry
> > >   hda: timeout waiting for DMA
> > >   hda: status error: status=0x58 {
> > >   DriveReady
> > >   SeekComplete
> > >   DataRequest
> > >   }
> [snip]
> > This system has SATA but there's only one PATA disk
> 
> Not a solution, unfortunately, but try disabling CONFIG_IDE and using Alan's 
> new PATA drivers. For your Intel systems, this should mean you need only:
> 
> CONFIG_ATA_PIIX
> 
> For both SATA and PATA support. You'll need the appropriate SCSI modules built
> in (if you say =y), i.e. SCSI disk and SCSI CDROM should be built in.

Yes, that worked, after booting with root=/dev/sda2, applying s/hda/sda/
to /etc/fstab and /etc/lilo.conf, and rerunning lilo. I haven't mounted
a /dev/sr0 in a long time, though.

So, are /dev/hd* going to disappear in a few years? In other words, does it
make sense to _slowly_ start migrating to /dev/sd*?

The problem is that there's no plan B if anything goes wrong, except
renaming everything back again to boot an old kernel.
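
The renaming step described here (root=/dev/sda2 at boot, then s/hda/sda/ in /etc/fstab and /etc/lilo.conf) can be sketched in Python as below. This is a hypothetical helper, not an existing tool. It assumes that the single PATA disk known as /dev/hda shows up as /dev/sda under libata, as it did here. Other device names, such as a CD drive at /dev/hdc, are left alone, since libata gives CD drives /dev/srN names instead. The sketch also shows why the old numeric root= value (the first report's boot line has root=302) has to change: that hex value is (major << 8) | minor, and the IDE and SCSI disk majors differ (3 versus 8).

```python
import re

IDE0_MAJOR = 3        # /dev/hda, /dev/hdb
SCSI_DISK0_MAJOR = 8  # /dev/sda ... /dev/sdp


def rename_disk(text, old="hda", new="sda"):
    """Rewrite /dev/<old>[N] to /dev/<new>[N]; other device names are untouched."""
    pattern = re.compile(r"/dev/%s(\d*)(?![a-z])" % re.escape(old))
    return pattern.sub(lambda m: "/dev/" + new + m.group(1), text)


def root_number(major, minor):
    """Old-style hexadecimal root= value: (major << 8) | minor."""
    return "%x" % ((major << 8) | minor)


# Hypothetical fstab contents, for illustration only.
fstab = "/dev/hda2  /  ext3  defaults  0 1\n/dev/hdc  /cdrom  iso9660  ro,noauto  0 0\n"
print(rename_disk(fstab), end="")
print(root_number(IDE0_MAJOR, 2), root_number(SCSI_DISK0_MAJOR, 2))
# prints:
# /dev/sda2  /  ext3  defaults  0 1
# /dev/hdc  /cdrom  iso9660  ro,noauto  0 0
# 302 802
```

Keeping a copy of the old files, rather than editing them in place, leaves the "rename everything back" route open for booting an old kernel.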

-- 
Frank

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 01:21:18PM +0100, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> Could you check if this is the same problem as this one:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=8169

Looks like it except that I don't see "lost interrupt" messages here. So,
it might be something different (I don't know).

-- 
Frank

2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> 
> 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> -d 1 /dev/hda):
> 
>   hda: dma_timer_expiry: dma status == 0x20
>   hda: DMA timeout retry
>   hda: timeout waiting for DMA
>   hda: status error: status=0x58 {
>   DriveReady
>   SeekComplete
>   DataRequest
>   }

I have a totally different PATA-based system (P4 HT) with similar symptoms,
except that it seems to recover by switching DMA off during boot after
5 errors:

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command

So in this case it doesn't hang but is not really usable either.
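
The `status=0x58` value is the drive's ATA status register. The `{ DriveReady SeekComplete DataRequest }` text that follows it is just the list of its set bits. The short Python decoder below uses the bit names the old IDE driver prints, as far as I know; treat that name mapping as an assumption. With BSY clear and DRQ set, the drive is signalling that it still expects a data transfer, which is consistent with a DMA transfer that never completed.

```python
# ATA status register bits, with the names the IDE driver prints for them.
ATA_STATUS_BITS = [
    (0x40, "DriveReady"),      # DRDY
    (0x20, "DeviceFault"),     # DF
    (0x10, "SeekComplete"),    # DSC
    (0x08, "DataRequest"),     # DRQ
    (0x04, "CorrectedError"),  # CORR
    (0x02, "Index"),           # IDX
    (0x01, "Error"),           # ERR
]
BUSY = 0x80                    # BSY: while set, the other bits are not valid


def decode_ata_status(stat):
    """Decode an ATA status byte into the flag names used in the log."""
    if stat & BUSY:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if stat & bit]


print(decode_ata_status(0x58))
# prints: ['DriveReady', 'SeekComplete', 'DataRequest']
```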

lspci:
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

This system has SATA, but there's only one disk, and it is PATA.

-- 
Frank

2.6.20*: PATA DMA timeout, hangs

2007-03-12 Thread Frank van Maarseveen


2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on
(hdparm -d 1 /dev/hda):

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 {
DriveReady
SeekComplete
DataRequest
}
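
As far as I understand the IDE DMA code, the `dma status == 0x20` in the first line is the controller's bus-master IDE status register, not the drive's ATA status. The minimal decoder below assumes the standard SFF-8038i bit layout, and its helper name is made up for illustration. Under that layout, 0x20 means only "drive 0 DMA capable" is set. The active, error and interrupt bits are all clear, so when the timer expired the transfer was no longer running, no error had been flagged, and no interrupt had been seen.

```python
# Bus-master IDE status register bits (SFF-8038i layout).
BMIDE_STATUS_BITS = [
    (0x01, "active"),              # bus-master transfer in progress
    (0x02, "error"),               # DMA transfer error
    (0x04, "interrupt"),           # IDE interrupt raised by the drive
    (0x20, "drive0_dma_capable"),  # set by BIOS/driver, not by the transfer
    (0x40, "drive1_dma_capable"),
    (0x80, "simplex_only"),
]


def decode_bmide_status(stat):
    """Return the names of the bits set in a bus-master IDE status byte."""
    return [name for bit, name in BMIDE_STATUS_BITS if stat & bit]


flags = decode_bmide_status(0x20)
print(flags)
# prints: ['drive0_dma_capable']
print("interrupt" in flags)
# prints: False
```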

Linux version 2.6.20.2-x152 ([EMAIL PROTECTED]) (gcc version 3.4.6 (Debian 
3.4.6-4)) #1 SMP Sun Mar 11 21:21:07 CET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009fc00 end: 
0009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009fc00 size: 0400 end: 
000a type: 2
copy_e820_map() start: 000e size: 0002 end: 
0010 type: 2
copy_e820_map() start: 0010 size: 1fdd end: 
1fed type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1fed size: 0002 end: 
1fef type: 4
copy_e820_map() start: 1fef size: 0001 end: 
1ff0 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: feea size: 0116 end: 
0001 type: 2
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 1fed (usable)
 BIOS-e820: 1fed - 1fef (ACPI NVS)
 BIOS-e820: 1fef - 1ff0 (usable)
 BIOS-e820: feea - 0001 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
found SMP MP-table at 000f8da0
Zone PFN ranges:
  DMA 0 -> 4096
  Normal   4096 ->   130816
  HighMem130816 ->   130816
early_node_map[1] active PFN ranges
0:0 ->   130816
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0xf808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 2000 (gap: 1ff0:defa)
Detected 996.780 MHz processor.
Built 1 zonelists.  Total pages: 129156
Kernel command line: auto BOOT_IMAGE=2.6.20.2-x152 ro root=302 nomodules 
panic=60 [EMAIL PROTECTED]/,[EMAIL PROTECTED]/00:12:3f:85:17:52
netconsole: local port 6665
netconsole: local IP 172.17.1.60
netconsole: interface eth0
netconsole: remote port 514
netconsole: remote IP 172.17.1.64
netconsole: remote ethernet address 00:12:3f:85:17:52
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour VGA+ 80x25
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:8
... MAX_LOCK_DEPTH:  30
... MAX_LOCKDEP_KEYS:2048
... CLASSHASH_SIZE:   1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS:  16384
... CHAINHASH_SIZE:  8192
 memory used by lock dependency info: 1064 kB
 per task-struct memory footprint: 1200 bytes
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 506148k/523264k available (4226k kernel code, 16352k reserved, 1988k 
data, 288k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0xffe16000 - 0xf000   (1956 kB)
pkmap   : 0xff80 - 0xffc0   (4096 kB)
vmalloc : 0xe080 - 0xff7fe000   ( 495 MB)
lowmem  : 0xc000 - 0xdff0   ( 511 MB)
  .init : 0xc0719000 - 0xc0761000   ( 288 kB)
  .data : 0xc0520b39 - 0xc0711c60   (1988 kB)
  .text : 0xc010 - 0xc0520b39   (4226 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1994.91 BogoMIPS (lpj=997456)
Mount-cache hash table entries: 512
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 17k freed
ACPI: Core revision 20060707
CPU0: Intel Pentium III (Coppermine) stepping 06
Total of 1 processors activated (1994.91 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Brought up 1 CPUs
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at

2.6.20*: PATA DMA timeout, hangs

2007-03-12 Thread Frank van Maarseveen


2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
-d 1 /dev/hda):

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 {
DriveReady
SeekComplete
DataRequest
}

Linux version 2.6.20.2-x152 ([EMAIL PROTECTED]) (gcc version 3.4.6 (Debian 
3.4.6-4)) #1 SMP Sun Mar 11 21:21:07 CET 2007
BIOS-provided physical RAM map:
sanitize start
sanitize end
copy_e820_map() start:  size: 0009fc00 end: 
0009fc00 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 0009fc00 size: 0400 end: 
000a type: 2
copy_e820_map() start: 000e size: 0002 end: 
0010 type: 2
copy_e820_map() start: 0010 size: 1fdd end: 
1fed type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: 1fed size: 0002 end: 
1fef type: 4
copy_e820_map() start: 1fef size: 0001 end: 
1ff0 type: 1
copy_e820_map() type is E820_RAM
copy_e820_map() start: feea size: 0116 end: 
0001 type: 2
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - 1fed (usable)
 BIOS-e820: 1fed - 1fef (ACPI NVS)
 BIOS-e820: 1fef - 1ff0 (usable)
 BIOS-e820: feea - 0001 (reserved)
0MB HIGHMEM available.
511MB LOWMEM available.
found SMP MP-table at 000f8da0
Zone PFN ranges:
  DMA 0 - 4096
  Normal   4096 -   130816
  HighMem130816 -   130816
early_node_map[1] active PFN ranges
0:0 -   130816
DMI 2.3 present.
ACPI: PM-Timer IO Port: 0xf808
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 6:8 APIC version 17
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 2000 (gap: 1ff0:defa)
Detected 996.780 MHz processor.
Built 1 zonelists.  Total pages: 129156
Kernel command line: auto BOOT_IMAGE=2.6.20.2-x152 ro root=302 nomodules 
panic=60 [EMAIL PROTECTED]/,[EMAIL PROTECTED]/00:12:3f:85:17:52
netconsole: local port 6665
netconsole: local IP 172.17.1.60
netconsole: interface eth0
netconsole: remote port 514
netconsole: remote IP 172.17.1.64
netconsole: remote ethernet address 00:12:3f:85:17:52
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
PID hash table entries: 2048 (order: 11, 8192 bytes)
Console: colour VGA+ 80x25
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES:8
... MAX_LOCK_DEPTH:  30
... MAX_LOCKDEP_KEYS:2048
... CLASSHASH_SIZE:   1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS:  16384
... CHAINHASH_SIZE:  8192
 memory used by lock dependency info: 1064 kB
 per task-struct memory footprint: 1200 bytes
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 506148k/523264k available (4226k kernel code, 16352k reserved, 1988k 
data, 288k init, 0k highmem)
virtual kernel memory layout:
fixmap  : 0xffe16000 - 0xf000   (1956 kB)
pkmap   : 0xff80 - 0xffc0   (4096 kB)
vmalloc : 0xe080 - 0xff7fe000   ( 495 MB)
lowmem  : 0xc000 - 0xdff0   ( 511 MB)
  .init : 0xc0719000 - 0xc0761000   ( 288 kB)
  .data : 0xc0520b39 - 0xc0711c60   (1988 kB)
  .text : 0xc010 - 0xc0520b39   (4226 kB)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 1994.91 BogoMIPS (lpj=997456)
Mount-cache hash table entries: 512
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to e000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 17k freed
ACPI: Core revision 20060707
CPU0: Intel Pentium III (Coppermine) stepping 06
Total of 1 processors activated (1994.91 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
Brought up 1 CPUs
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: PCI BIOS revision 2.10 entry at

2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
 
 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
 -d 1 /dev/hda):
 
   hda: dma_timer_expiry: dma status == 0x20
   hda: DMA timeout retry
   hda: timeout waiting for DMA
   hda: status error: status=0x58 {
   DriveReady
   SeekComplete
   DataRequest
   }

I have a totally different PATA based system (P4 HT) with similar symptoms
except that it seem to recover by switching DMA off during boot after
5 errors:

hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command
hda: dma_timer_expiry: dma status == 0x20
hda: DMA timeout retry
hda: timeout waiting for DMA
hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: drive not ready for command

So in this case it doesn't hang but is not really usable either.

lspci:
00:00.0 Host bridge: Intel Corporation 82865G/PE/P DRAM Controller/Host-Hub Interface (rev 02)
00:01.0 PCI bridge: Intel Corporation 82865G/PE/P PCI to AGP Controller (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
00:1f.5 Multimedia audio controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) AC'97 Audio Controller (rev 02)
01:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
02:00.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

This system has a SATA controller, but the only disk is a single PATA one.

-- 
Frank
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 01:21:18PM +0100, Bartlomiej Zolnierkiewicz wrote:
 
> Hi,
>
> Could you check if this is the same problem as this one:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=8169

It looks like it, except that I don't see "lost interrupt" messages
here. So it might be something different (I don't know).

-- 
Frank

Re: 2.6.20*: PATA DMA timeout, hangs (2)

2007-03-12 Thread Frank van Maarseveen

On Mon, Mar 12, 2007 at 12:07:18PM +, Alistair John Strachan wrote:
> On Monday 12 March 2007 11:24, Frank van Maarseveen wrote:
> > On Mon, Mar 12, 2007 at 09:54:47AM +0100, Frank van Maarseveen wrote:
> > > 2.6.19 is ok, 2.6.20.[12] hangs from the moment DMA is turned on (hdparm
> > > -d 1 /dev/hda):
> > >
> > >   hda: dma_timer_expiry: dma status == 0x20
> > >   hda: DMA timeout retry
> > >   hda: timeout waiting for DMA
> > >   hda: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> [snip]
> > This system has SATA but there's only one PATA disk
>
> Not a solution, unfortunately, but try disabling CONFIG_IDE and using Alan's
> new PATA drivers. For your Intel systems, this should mean you need only:
>
> CONFIG_ATA_PIIX
>
> For both SATA and PATA support. You'll need the appropriate SCSI modules built
> in (if you say =y), i.e. SCSI disk and SCSI CDROM should be built in.

Yes, that worked... after booting with root=/dev/sda2 and doing
s/hda/sda/ on /etc/fstab and /etc/lilo.conf, then rerunning lilo.
I hadn't mounted a /dev/sr0 in a loong time.

So, are /dev/hd* going to disappear in a few years? In other words,
does it make sense to _slowly_ start migrating to /dev/sd*?

The problem is that there's no plan B in case of trouble, except
renaming everything back again to boot an old kernel.

-- 
Frank

2.6.19.2 nfs BUG: warning at mm/truncate.c:398/invalidate_inode_pages2_range()

2007-02-15 Thread Frank van Maarseveen

FYI,

I just captured this one. I'm not sure NFS is at fault, because I saw
at least one other AIO-related mm/truncate.c:398 report with a totally
different stack trace.

The machine still seems to be running happily, as usual under
considerable load. The kernel is tainted by fglrx.

kernel: BUG: warning at mm/truncate.c:398/invalidate_inode_pages2_range()
kernel:  [<c0104897>] dump_trace+0x227/0x240
kernel:  [<c010498f>] show_trace_log_lvl+0x2f/0x50
kernel:  [<c01049d7>] show_trace+0x27/0x30
kernel:  [<c0104b06>] dump_stack+0x26/0x30
kernel:  [<c015a495>] invalidate_inode_pages2_range+0x305/0x310
kernel:  [<c015a4c1>] invalidate_inode_pages2+0x21/0x30
kernel:  [<c01f7aa8>] nfs_revalidate_mapping+0x98/0x180
kernel:  [<c01f5726>] nfs_file_read+0x46/0xb0
kernel:  [<c017441e>] do_sync_read+0xde/0x130
kernel:  [<c0174511>] vfs_read+0xa1/0x1b0
kernel:  [<c0174947>] sys_read+0x47/0x70
kernel:  [<c0103387>] syscall_call+0x7/0xb
kernel:  [<b7ec0cbe>] 0xb7ec0cbe
kernel:  ===

-- 
Frank
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: usb2 external disk not recognized if connected during boot, but recognized if not connected during boot

2007-01-24 Thread Frank van Maarseveen

On Tue, Jan 23, 2007 at 11:13:03AM +0200, Yakov Lerner wrote:
> On a small Celeron-based appliance, Usb2 disk is not recognized *if*
> it is connected during kernel boot.
> But if not connected during boot, and I connect it later, it is
> recognized and works ok.
> I tried various 2.6.16, 17 and 18 kernels, both modular, and
> all-static, with the same result.
> What can this be.
> This is ehci controller. Can it be problematic Irq assignments on the
> motherboard ?
> Btw, during boot, access lights go on forever on the Usb drive. Couple
> of kernels are stuck
> at this point. Most kernels go through, but disk is still not
> recognized. If I disconnect and re-connect usb drive later, then it is
> never recognized (if it was connected earlier during boot). Again, if
> the disk was not connected during boot, all kernels recognize it and
> work with it.

Try building EHCI as a module and modprobe it after the kernel has
booted. I have a mainboard/USB stick combination which refuses to work
when plugged in during a cold boot with the EHCI driver built into the
kernel. The driver repeatedly tried to reset the USB hardware and then
gave up. Replugging did work, however.

I worked around the issue by booting without the EHCI driver in the
kernel. A more modern USB stick fixed it too. After some investigation,
the consensus was that the hardware combination was broken and the EHCI
driver was not at fault.

Maybe this is a similar hardware problem. Does the USB disk have its
own power supply? If so, replugging USB may not be enough and you might
also need to power-cycle the USB disk to make it work again.

-- 
Frank

Re: Finding hardlinks

2007-01-09 Thread Frank van Maarseveen

On Tue, Jan 09, 2007 at 11:26:25AM -0500, Steven Rostedt wrote:
> On Mon, 2007-01-08 at 13:00 +0100, Miklos Szeredi wrote:
> 
> > > 50% probability of false positive on 4G files seems like very ugly
> > > design problem to me.
> > 
> > 4 billion files, each with more than one link is pretty far fetched.
> > And anyway, filesystems can take steps to prevent collisions, as they
> > do currently for 32bit st_ino, without serious difficulties
> > apparently.
> 
> Maybe not 4 billion files, but you can get a large number of >1 linked
> files, when you copy full directories with "cp -rl".

Yes, but "cp -rl" is typically done by _developers_, and they tend to
have a better understanding of this (uh, at least within a Linux
context, I hope so).

Also, just adding hard-links doesn't increase the number of inodes.

-- 
Frank

Re: Finding hardlinks

2007-01-05 Thread Frank van Maarseveen

On Fri, Jan 05, 2007 at 09:43:22AM +0100, Miklos Szeredi wrote:
> > > > > High probability is all you have.  Cosmic radiation hitting your
> > > > > computer will more likly cause problems, than colliding 64bit inode
> > > > > numbers ;)
> > > > 
> > > > Some of us have machines designed to cope with cosmic rays, and would be
> > > > unimpressed with a decrease in reliability.
> > > 
> > > With the suggested samefile() interface you'd get a failure with just
> > > about 100% reliability for any application which needs to compare a
> > > more than a few files.  The fact is open files are _very_ expensive,
> > > no wonder they are limited in various ways.
> > > 
> > > What should 'tar' do when it runs out of open files, while searching
> > > for hardlinks?  Should it just give up?  Then the samefile() interface
> > > would be _less_ reliable than the st_ino one by a significant margin.
> > 
> > You need at most two simultenaously open files for examining any
> > number of hardlinks. So yes, you can make it reliable.
> 
> Well, sort of.  Samefile without keeping fds open doesn't have any
> protection against the tree changing underneath between first
> registering a file and later opening it.  The inode number is more
> useful in this respect.  In fact inode number + generation number will
> give you a unique identifier in time as well, which is a _lot_ more
> useful to determine if the file you are checking is actually the same
> as one that you've come across previously.

Samefile with keeping fds open doesn't buy you much anyway. What exactly
would be the value of a directory tree seen by operating only on fds
(even for directories) when some rogue process is renaming, moving and
updating stuff underneath? One ends up with a tree which misses a lot
of files and hardly bears any resemblance to the actual tree at any
point in time, and I'm not even talking about file data.

It is futile to try to get a consistent tree view on a live filesystem,
with or without fds. It just doesn't work without fundamental
support for some kind of "freezing" or time travel inside the
kernel. Snapshots at the block device level are problematic too.

> 
> So instead of samefile() I'd still suggest an extended attribute
> interface which exports the file's unique (in space and time)
> identifier as an opaque cookie.

But then you're just _shifting_ the problem instead of fixing it:
st_ino/st_mtime (st_ctime?) are designed for this purpose. If the
filesystem doesn't support them properly, live with the consequences,
which are mostly minor. Notable exceptions are of course backup tools,
but backups _must_ be verified anyway, so you'll find out soon enough.

(btw, that's what I noticed after restoring a system from a CD (iso9660
 with RR): all hardlinks were gone)

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Thu, Jan 04, 2007 at 12:43:20AM +0100, Mikulas Patocka wrote:
> On Wed, 3 Jan 2007, Frank van Maarseveen wrote:
> >Currently, large file support is already necessary to handle dvd and
> >video. It's also useful for images for virtualization. So the failing 
> >stat()
> >calls should already be a thing of the past with modern distributions.
> 
> As long as glibc compiles by default with 32-bit ino_t, the problem exists 
> and is severe --- programs handling large files, such as coreutils, tar, 
> mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or 
> script) may type something like:
> 
> cat >file.c <<EOF
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/stat.h>
> #include <unistd.h>
> int main(void)
> {
>   int h;
>   struct stat st;
>   if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
>   if (fstat(h, &st)) perror("stat"), exit(1);
>   close(h);
>   return 0;
> }
> EOF
> gcc file.c; ./a.out
> 
> --- and you certainly do not want this to fail (unless you are out of disk 
> space).
> 
> The difference is, that with 32-bit program and 64-bit off_t, you get 
> deterministic failure on large files, with 32-bit program and 64-bit 
> ino_t, you get random failures.

What's (technically) the problem with changing the gcc default?

Alternatively, we could make the error deterministic in various ways:
start st_ino numbering at 4G (except perhaps for a few special ones
such as the root and mount points), or make old and new programs look
different at the ELF level or via sys_personality(), and/or check
against an "ino64" mount flag or filesystem feature. Lots of possibilities.

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:
> >On any decent filesystem st_ino should uniquely identify an object and
> >reliably provide hardlink information. The UNIX world has relied upon 
> this
> >for decades. A filesystem with st_ino collisions without being hardlinked
> >(or the other way around) needs a fix.
> 
> But for at least the last of those decades, filesystems that could not do 
> that were not uncommon.  They had to present 32 bit inode numbers and 
> either allowed more than 4G files or just didn't have the means of 
> assigning inode numbers with the proper uniqueness to files.  And the sky 
> did not fall.  I don't have an explanation why,

I think it's mostly high-end use, and high-end users tend to understand
more. But we're going to see more really large filesystems in "normal"
use, so...

Currently, large file support is already necessary to handle DVDs and
video. It's also useful for virtualization images. So the failing stat()
calls should already be a thing of the past with modern distributions.

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 08:31:32PM +0100, Mikulas Patocka wrote:
> I didn't hardlink directories, I just patched stat, lstat and fstat to
> always return st_ino == 0 --- and I've seen those failures. These 
> failures
> are going to happen on non-POSIX filesystems in real world too, very
> rarely.
> >>>
> >>>I don't want to spoil your day but testing with st_ino==0 is a bad choice
> >>>because it is a special number. Anyway, one can only find breakage,
> >>>not prove that all the other programs handle this correctly so this is
> >>>kind of pointless.
> >>>
> >>>On any decent filesystem st_ino should uniquely identify an object and
> >>>reliably provide hardlink information. The UNIX world has relied upon 
> >>>this
> >>>for decades. A filesystem with st_ino collisions without being hardlinked
> >>>(or the other way around) needs a fix.
> >>
> >>... and that's the problem --- the UNIX world specified something that
> >>isn't implementable in real world.
> >
> >Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
> >inode number space in 64 bit (of course it is a matter of time for it to
> >jump to 128 bit and more)
> 
> If the filesystem was designed by someone not from Unix world (FAT, SMB, 
> ...), then not. And users still want to access these filesystems.

They can. Hey, it's not perfect but who expects FAT/SMB to be "perfect" anyway?

> 
> 64-bit inode numbers space is not yet implemented on Linux --- the problem 
> is that if you return ino >= 2^32, programs compiled without 
> -D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
> failure is specified in POSIX, but not very useful.

hmm, checking iunique(), ino_t, __kernel_ino_t... I see. Pity. So at
some point we may need a sort of "ino64" mount option to switch to a
64-bit number space on a per-mount basis. Or (conversely) refuse to
mount without that option if we know there are st_ino values wider
than 32 bits out there. And invent iunique64() and use that when "ino64"
is specified for FAT/SMB/..., if those filesystems haven't been
replaced by successors by then.

By then, probably all programs will either be compiled with
-D_FILE_OFFSET_BITS=64 (most already are, because of files bigger
than 2G) or be completely 64-bit.

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 08:17:34PM +0100, Mikulas Patocka wrote:
> 
> On Wed, 3 Jan 2007, Frank van Maarseveen wrote:
> 
> >On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
> >>
> >>I didn't hardlink directories, I just patched stat, lstat and fstat to
> >>always return st_ino == 0 --- and I've seen those failures. These failures
> >>are going to happen on non-POSIX filesystems in real world too, very
> >>rarely.
> >
> >I don't want to spoil your day but testing with st_ino==0 is a bad choice
> >because it is a special number. Anyway, one can only find breakage,
> >not prove that all the other programs handle this correctly so this is
> >kind of pointless.
> >
> >On any decent filesystem st_ino should uniquely identify an object and
> >reliably provide hardlink information. The UNIX world has relied upon this
> >for decades. A filesystem with st_ino collisions without being hardlinked
> >(or the other way around) needs a fix.
> 
> ... and that's the problem --- the UNIX world specified something that 
> isn't implementable in real world.

Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
inode number space in 64 bits (of course it is only a matter of time
before it jumps to 128 bits and more).

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
> 
> I didn't hardlink directories, I just patched stat, lstat and fstat to 
> always return st_ino == 0 --- and I've seen those failures. These failures 
> are going to happen on non-POSIX filesystems in real world too, very 
> rarely.

I don't want to spoil your day, but testing with st_ino==0 is a bad choice
because it is a special number. Anyway, one can only find breakage,
not prove that all the other programs handle this correctly, so this is
kind of pointless.

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon this
for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.

Synthetic filesystems such as /proc are special due to their dynamic
nature and I think st_ino uniqueness is far more important than being able
to provide hardlinks there. Most tree handling programs ("cp", "rm", ...)
break horribly when the tree underneath changes at the same time.

-- 
Frank

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
 
 I didn't hardlink directories, I just patched stat, lstat and fstat to 
 always return st_ino == 0 --- and I've seen those failures. These failures 
 are going to happen on non-POSIX filesystems in real world too, very 
 rarely.

I don't want to spoil your day but testing with st_ino==0 is a bad choice
because it is a special number. Anyway, one can only find breakage,
not prove that all the other programs handle this correctly so this is
kind of pointless.

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon this
for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.

Synthetic filesystems such as /proc are special due to their dynamic
nature and I think st_ino uniqueness is far more important than being able
to provide hardlinks there. Most tree handling programs (cp, rm, ...)
break horribly when the tree underneath changes at the same time.

-- 
Frank
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 08:17:34PM +0100, Mikulas Patocka wrote:
 
 On Wed, 3 Jan 2007, Frank van Maarseveen wrote:
 
 On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
 
 I didn't hardlink directories, I just patched stat, lstat and fstat to
 always return st_ino == 0 --- and I've seen those failures. These failures
 are going to happen on non-POSIX filesystems in real world too, very
 rarely.
 
 I don't want to spoil your day but testing with st_ino==0 is a bad choice
 because it is a special number. Anyway, one can only find breakage,
 not prove that all the other programs handle this correctly so this is
 kind of pointless.
 
 On any decent filesystem st_ino should uniquely identify an object and
 reliably provide hardlink information. The UNIX world has relied upon this
 for decades. A filesystem with st_ino collisions without being hardlinked
 (or the other way around) needs a fix.
 
 ... and that's the problem --- the UNIX world specified something that 
 isn't implementable in real world.

Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
inode number space in 64 bit (of course it is a matter of time for it to
jump to 128 bit and more)

-- 
Frank
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 08:31:32PM +0100, Mikulas Patocka wrote:
 I didn't hardlink directories, I just patched stat, lstat and fstat to
 always return st_ino == 0 --- and I've seen those failures. These 
 failures
 are going to happen on non-POSIX filesystems in real world too, very
 rarely.
 
 I don't want to spoil your day but testing with st_ino==0 is a bad choice
 because it is a special number. Anyway, one can only find breakage,
 not prove that all the other programs handle this correctly so this is
 kind of pointless.
 
 On any decent filesystem st_ino should uniquely identify an object and
 reliably provide hardlink information. The UNIX world has relied upon 
 this
 for decades. A filesystem with st_ino collisions without being hardlinked
 (or the other way around) needs a fix.
 
 ... and that's the problem --- the UNIX world specified something that
 isn't implementable in real world.
 
 Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
 inode number space in 64 bit (of course it is a matter of time for it to
 jump to 128 bit and more)
 
 If the filesystem was designed by someone not from Unix world (FAT, SMB, 
 ...), then not. And users still want to access these filesystems.

They can. Hey, it's not perfect but who expects FAT/SMB to be perfect anyway?

 
 64-bit inode numbers space is not yet implemented on Linux --- the problem 
 is that if you return ino = 2^32, programs compiled without 
 -D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
 failure is specified in POSIX, but not very useful.

hmm, checking iunique(), ino_t, __kernel_ino_t... I see. Pity. So at
some point in time we may need a sort of ino64 mount option to be
able to switch to a 64 bit number space on mount basis. Or (conversely)
refuse to mount without that option if we know there are 32 bit st_ino
out there. And invent iunique64() and use that when ino64 is specified
for FAT/SMB/... if those filesystems haven't been replaced by a
successor by then.

At that time probably all programs are either compiled with
-D_FILE_OFFSET_BITS=64 (most already are because of files bigger than 2G)
or completely 64 bit. 
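
The hardlink test this thread takes for granted (two paths name the same object exactly when both st_dev and st_ino match) can be sketched in user space as below. same_object() is a hypothetical helper built only on stat(2); on a filesystem with st_ino collisions it gives precisely the wrong answers discussed above, which is why the uniqueness guarantee matters.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit ino_t/off_t on 32-bit builds */
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if both paths refer to the same
 * filesystem object (same device and same inode number), 0 if they
 * do not, and -1 if either stat() call fails (errno is left set). */
static int same_object(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Tools such as tar or cp -a only need to remember (st_dev, st_ino) pairs for files with st_nlink > 1 and call a check like this against them.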

-- 
Frank
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Finding hardlinks

2007-01-03 Thread Frank van Maarseveen

On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:
 On any decent filesystem st_ino should uniquely identify an object and
 reliably provide hardlink information. The UNIX world has relied upon this
 for decades. A filesystem with st_ino collisions without being hardlinked
 (or the other way around) needs a fix.
 
 But for at least the last of those decades, filesystems that could not do 
 that were not uncommon.  They had to present 32 bit inode numbers and 
 either allowed more than 4G files or just didn't have the means of 
 assigning inode numbers with the proper uniqueness to files.  And the sky 
 did not fall.  I don't have an explanation why,

I think it's mostly high-end use, and high-end users tend to understand
more. But we're going to see more really large filesystems in normal
use, so...

Currently, large file support is already necessary to handle DVDs and
video. It's also useful for virtualization images. So the failing stat()
calls should already be a thing of the past with modern distributions.

-- 
Frank

radix-tree.c:__lookup_slot() dead code removal

2006-12-03 Thread Frank van Maarseveen

Most of the code suggests that it is valid to insert a NULL item,
possibly a zero item with pointer cast. However, in __lookup_slot()
whether or not the slot is found seems to depend on the actual value
of the item in one special case. But further on it doesn't make any
difference so to remove some dead code:

--- a/lib/radix-tree.c  2006-12-03 13:23:00.0 +0100
+++ b/lib/radix-tree.c  2006-12-03 17:57:03.0 +0100
@@ -319,9 +319,6 @@ static inline void **__lookup_slot(struc
 	if (index > radix_tree_maxindex(height))
 		return NULL;
 
-	if (height == 0 && root->rnode)
-		return (void **)&root->rnode;
-
 	shift = (height-1) * RADIX_TREE_MAP_SHIFT;
 	slot = &root->rnode;
 

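Why the branch is dead can be checked with a simplified user-space model of __lookup_slot() as it reads after the patch. This is not the kernel source: MAP_SHIFT (6, as assumed here), struct node, struct root, maxindex() and lookup_slot() are illustrative names. In this model, when height == 0 the only index passing the range check is 0, the descent loop never runs, and &root->rnode is returned whether or not root->rnode is NULL; the deleted branch returned the same pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct node { void *slots[MAP_SIZE]; };
struct root { unsigned int height; void *rnode; };

/* Largest index a tree of the given height can hold (0 for height 0). */
static unsigned long maxindex(unsigned int height)
{
    unsigned int bits = height * MAP_SHIFT;

    if (bits >= 8 * sizeof(unsigned long))
        return ~0UL;
    return (1UL << bits) - 1;
}

/* Model of __lookup_slot() after the patch: no height == 0 shortcut. */
static void **lookup_slot(struct root *root, unsigned long index)
{
    unsigned int height = root->height;
    unsigned int shift;
    void **slot;

    if (index > maxindex(height))
        return NULL;

    /* Wraps around when height == 0, but is then never used. */
    shift = (height - 1) * MAP_SHIFT;
    slot = &root->rnode;

    while (height > 0) {
        if (*slot == NULL)
            return NULL;
        slot = &((struct node *)*slot)->slots[(index >> shift) & MAP_MASK];
        shift -= MAP_SHIFT;
        height--;
    }
    return slot;
}
```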
-- 
Frank

2.6.13 SMP on AMD Athlon64 X2 + FC4: PS/2 keyboard b0rken; taskset/sched_setaffinity() saves the day!

2005-09-06 Thread Frank van Maarseveen

While playing with a new AMD Athlon64 X2 3800+ (i386) the keyboard goes
wild for 10 (20?) seconds, behaves normally for 10 (20?) seconds, and
then goes wild again: when "wild", every keypress results in a random
number of repeats, e.g.:

$ pppsss aaxxxuuu
bash: pppsss: command not found
$
$
$
$
$
$
$
$

Upgrading Xorg to xorg-x11-6.8.2-37.FC4.45 did not help.

Booting with "nosmp" seems to fix it. And this _seems_ to fix it too:

taskset -p 1 `ps axo comm,pid|awk '$1=="X"{print $2}'`
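
The taskset command above pins the X server to CPU 0 (mask 0x1) through sched_setaffinity(2). A minimal C equivalent is sketched below; mask_to_cpuset() and pin_pid() are hypothetical helper names, and the mask only covers the CPUs that fit in an unsigned long.

```c
#define _GNU_SOURCE            /* for sched_setaffinity() and CPU_* macros */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Convert a taskset-style bit mask (bit n = CPU n) into a cpu_set_t. */
static void mask_to_cpuset(unsigned long mask, cpu_set_t *set)
{
    CPU_ZERO(set);
    for (unsigned int cpu = 0; cpu < 8 * sizeof(mask); cpu++)
        if (mask & (1UL << cpu))
            CPU_SET(cpu, set);
}

/* Restrict pid (0 = calling thread) to the CPUs in mask, like
 * "taskset -p <mask> <pid>". Returns 0, or -1 with errno set. */
static int pin_pid(pid_t pid, unsigned long mask)
{
    cpu_set_t set;

    if (mask == 0) {
        errno = EINVAL;        /* an empty CPU set can never be scheduled */
        return -1;
    }
    mask_to_cpuset(mask, &set);
    return sched_setaffinity(pid, sizeof(set), &set);
}
```

pin_pid(x_pid, 0x1) makes the same request as taskset -p 1 <pid> and needs the same privileges (a matching user ID or CAP_SYS_NICE).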

I haven't seen this problem on the console.

The taskset work-around does not work when PnP OS is set to true in the
BIOS. I've compared kernel messages with- and without PnP set to true
but didn't see any interesting difference though.

The 2.6.11-1.1369_FC4smp kernel shows the same problem, but the taskset
fix doesn't work there. Also, that kernel has the "LOC" interrupt
going completely wild on both CPUs and occasionally crashes, so let's
forget about that.

Memory is good (1TB of disk copying/md5summing + 150 kernel builds did
not reveal any problem). The kernel has no modules and no proprietary
drivers, and is compiled with "gcc32" (3.2.3 20030502) instead of gcc 4.
An earlier reported hangcheck/nanosleep/clock problem is gone right now
for some unknown reason.

lspci:

00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:07.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host Controller (rev 80)
00:08.0 RAID bus controller: Promise Technology, Inc. PDC20378 (FastTrak 378/SATA 378) (rev 02)
00:09.0 Network controller: RaLink Ralink RT2500 802.11 Cardbus Reference Card (rev 01)
00:0a.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82x UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 AC97 Audio Controller (rev 60)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] (rev 01)
01:00.1 Display controller: ATI Technologies Inc: Unknown device 5940 (rev 01)

kernel boot parameters: ro root=/dev/md1 apm=power-off

config:

CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_SYSCTL=y
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
CONFIG_BASE_SMALL=0

CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

CONFIG_X86_PC=y
CONFIG_MK8=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_PREEMPT_VOLUNTARY=y
CONFIG_PREEMPT_BKL=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MSR=y

Re: 2.6.13: can kill X server but readlink of /proc/<pid>/exe et. al. says EACCES. feature?

2005-09-06 Thread Frank van Maarseveen

On Tue, Sep 06, 2005 at 06:57:37PM +0100, [EMAIL PROTECTED] wrote:
> On Tue, Sep 06, 2005 at 07:53:49PM +0200, Frank van Maarseveen wrote:
> > While I have access to /proc/<pid>, readlink fails with EACCES on
> > 
> > /proc/<pid>/exe
> > /proc/<pid>/cwd
> > /proc/<pid>/root
> > 
> > even when I own <pid> though it runs with a different effective/saved/fs
> > uid such as the X server. This is a bit uncomfortable and doesn't
> > seem right.
> > 
> > Or is this to make /proc mounting inside a chroot jail safe?
> 
> suid-root task does chdir() to place you shouldn't be able to access.
> You do cd /proc/<pid>/cwd and get there anyway.  Bad Things Happen...

Ok, but being able to do readlink() does not mean that one can chdir(),
usually.

-- 
Frank

2.6.13: can kill X server but readlink of /proc/<pid>/exe et. al. says EACCES. feature?

2005-09-06 Thread Frank van Maarseveen

While I have access to /proc/<pid>, readlink fails with EACCES on

/proc/<pid>/exe
/proc/<pid>/cwd
/proc/<pid>/root

even when I own <pid> though it runs with a different effective/saved/fs
uid such as the X server. This is a bit uncomfortable and doesn't
seem right.

Or is this to make /proc mounting inside a chroot jail safe?
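
The failing call is easy to reproduce from C. proc_link() below is a hypothetical helper that reads the /proc/<pid>/<name> symlink and hands back the errno value (EACCES in the case described above) instead of hiding it. Note that readlink(2) does not NUL-terminate its result, so the helper does.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the /proc/<pid>/<name> symlink (name is "exe", "cwd" or "root")
 * into buf; longer targets are silently truncated to len - 1 bytes.
 * Returns 0 on success, or the errno value on failure. */
static int proc_link(pid_t pid, const char *name, char *buf, size_t len)
{
    char path[64];
    ssize_t n;

    snprintf(path, sizeof(path), "/proc/%ld/%s", (long)pid, name);
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return errno;
    buf[n] = '\0';             /* readlink() does not terminate the string */
    return 0;
}
```

Calling proc_link(x_server_pid, "exe", buf, sizeof buf) as the owning but unprivileged user is the case that returns EACCES here.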

-- 
Frank

2.6.13 SMP on Athlon X2: nanosleep returning waay too soon, clock_gettime(CLOCK_REALTIME...) proceeding too fast

2005-09-04 Thread Frank van Maarseveen

After replacing the kernel on a fresh FC4 install with a stock 2.6.13
(using gcc 3.2) and my own config it appears that the clock is going too
fast: it gains at least an hour every 12 hours or so. FC4 kernel (rpm:
kernel-2.6.11-1.1369_FC4) seems OK.

I tried the following from another system with reliable clock:

for i in `yes|head -100`
do
/usr/bin/time -f %e rsh system_with_buggy_clock sleep 1
done | cat -n

annotated output:

 1  1.03
 2  1.03
 3  1.03
 4  1.03
 5  1.03
 6  1.03
 7  1.02
 8  1.03
 9  1.03
10  1.03
11  1.03
12  1.03
13  1.03
14  1.03
15  1.03
16  0.72<==
17  1.03
18  1.03
19  1.03
20  1.03
21  1.03
22  1.03
23  1.03
24  1.02
25  1.03
26  1.03
27  1.03
28  1.03
29  1.03
30  1.03
31  1.03
32  1.03
33  1.03
34  0.14<==
35  1.03
36  1.03
37  1.03
38  1.03
39  1.03
40  1.03
41  1.03
42  1.02
43  1.03
44  1.03
45  1.03
46  1.03
47  1.03
48  1.03
49  1.03
50  1.03
51  1.03
52  0.18<==
53  1.03
54  1.03
55  1.03
56  1.03
57  1.03
58  1.03
59  1.03
60  1.02
61  1.03
62  1.03
63  1.03
64  1.04
65  1.03
66  1.03
67  1.03
68  1.03
69  1.03
70  0.13<==
71  1.03
72  1.03
73  1.03
74  1.03
75  1.03
76  1.03
77  1.03
78  1.02
79  1.03
80  1.03
81  1.03
82  1.03
83  1.03
84  1.03
85  1.03
86  1.03
87  1.03
88  0.15<==
89  1.03
90  1.03
91  1.03
92  1.03
93  1.03
94  1.03
95  1.03
96  1.02
97  1.03
98  1.03
99  1.03
   100  1.03
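
A local version of this check can be sketched with nanosleep(2) and clock_gettime(2); timed_sleep() is a hypothetical helper. On a healthy system the measured interval should not be shorter than the request. The limitation is that it measures against the machine's own CLOCK_MONOTONIC, so if that clock is itself running fast the error may cancel out, which is why timing from a second machine, as above, is the more trustworthy test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Difference b - a in seconds. */
static double ts_diff(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Sleep for ms milliseconds with nanosleep(), resuming after signals,
 * and return the elapsed CLOCK_MONOTONIC time in seconds (-1 on error). */
static double timed_sleep(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L }, rem;
    struct timespec start, end;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1.0;
        req = rem;             /* interrupted: sleep for what is left */
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;
    return ts_diff(start, end);
}
```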

I also ran a script on the system with the unstable clock, measuring timer
interrupts per CPU as visible in /proc/interrupts. A snapshot of that file:

           CPU0       CPU1
  0:    6741707    5860969    IO-APIC-edge  timer
  1:         45         10    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
  8:          0          1    IO-APIC-edge  rtc
 14:     807745     907612    IO-APIC-edge  ide0
 15:     834978     871118    IO-APIC-edge  ide1
 17:   45336986   45939432   IO-APIC-level  SysKonnect SK-98xx
 18:          0          0   IO-APIC-level  libata
 21:          0          0   IO-APIC-level  ehci_hcd:usb1, uhci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb5
 22:          0          0   IO-APIC-level  VIA8237
NMI:          0          0
LOC:   12601494   12601519
ERR:          0
MIS:          0

script:

#!/bin/sh

for i in `yes|head -100`
do
s1=`cat /proc/interrupts`
sleep 1
s2=`cat /proc/interrupts`

t10=`echo "$s1" | awk '$1=="0:"{ print $2}'`
t11=`echo "$s1" | awk '$1=="0:"{ print $3}'`
t20=`echo "$s2" | awk '$1=="0:"{ print $2}'`
t21=`echo "$s2" | awk '$1=="0:"{ print $3}'`
d1=`expr $t20 - $t10`
d2=`expr $t21 - $t11`
echo $d1 + $d2 = `expr $d1 + $d2`
done | cat -n

annotated output:

  CPU0 CPU1   Total
---
 1  0 + 251 = 251
 2  0 + 251 = 251
 3  0 + 251 = 251
 4  0 + 251 = 251
 5  0 + 251 = 251
 6  52 + 196 = 248  <== (?)
 7  251 + 0 = 251
 8  251 + 0 = 251
 9  251 + 0 = 251
10  251 + 0 = 251
11  251 + 0 = 251
12  251 + 0 = 251
13  251 + 0 = 251
14  251 + 0 = 251
15  251 + 0 = 251
16  147 + 1 = 148   <==
17  0 + 252 = 252
18  0 + 251 = 251
19  0 + 251 = 251
20  0 + 251 = 251
21  0 + 251 = 251
22  0 + 252 = 252
23  0 + 251 = 251
24  72 + 177 = 249  <== (?)
25  252 + 0 = 252
26  252 + 0 = 252
27  252 + 0 = 252
28  252 + 0 = 252
29  252 + 0 = 252
30  252 + 0 = 252
31  252 + 0 = 252
32  253 + 0 = 253
33  253 + 0 = 253
34  118 + 2 = 120   <==
35  0 + 253 = 253
36  0 + 253 = 253
37  0 + 253 = 253
38  0 + 253 = 253
39  0 + 252 = 252
40  0 + 252 = 252
41  0 + 252 = 252
42  78 + 171 = 249  <== (?)
43  252 + 0 = 252
44  252 + 0 = 252
45  252 + 0 = 252
46  252 + 0 = 252
47  251 + 0 = 251
48  251 + 0 = 251
49  251 + 0 = 251
50  251 + 0 = 251
51  251 + 0 = 251
52  121 + 1 = 122   <==
53  0 + 251 = 251
54  0 + 251 = 251
55  0 + 251 = 251
56  0 + 251 = 251
57  0 + 251 = 251
58  0 + 251 = 251
59  0 + 251 = 251
60  69 + 179 = 248  <== (?)
61  251 + 0 = 251
62  251 + 0 = 251
63  251 + 0 = 251
64  251 + 0 = 251
65  251 + 0 = 251
66  251 + 0 = 251
67  251 + 0 = 251
68  251 + 0 = 251
69  251 + 0 = 251
70  130 + 1 = 131   <==
71  0 + 252 = 252

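The awk extraction in the script can also be done in C. irq_counts() below is a hypothetical helper that parses one /proc/interrupts row, such as the "0:" timer line, into per-CPU counts. Summing the counts from two snapshots taken one second apart gives the Total column above, which should stay close to HZ (apparently about 250 on this kernel).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parse a /proc/interrupts row: match the leading label (e.g. "0:" or
 * "LOC:"), then read up to max decimal per-CPU counts. Returns the
 * number of counts read, or -1 if the row carries a different label. */
static int irq_counts(const char *line, const char *label,
                      unsigned long long *counts, int max)
{
    const char *p = line + strspn(line, " ");
    size_t n = strlen(label);
    int cpus = 0;

    if (strncmp(p, label, n) != 0)
        return -1;
    p += n;
    while (cpus < max) {
        char *end;

        p += strspn(p, " \t");
        if (*p < '0' || *p > '9')
            break;             /* reached the chip/device name columns */
        counts[cpus++] = strtoull(p, &end, 10);
        p = end;
    }
    return cpus;
}
```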

Re: lost ticks and Hangcheck

2005-08-30 Thread Frank van Maarseveen

On Fri, Aug 19, 2005 at 12:41:07AM -0700, Nathan Becker wrote:
> Hi,
> 
> I'm running kernel 2.6.12.5 with x86_64 target on an AMD X2 4800+ and 
> Gigabyte GA-K8NXP-SLI motherboard (bios version F8).  I'm having a problem 
> with lost clock ticks.  The dmesg says
> 
> warning: many lost ticks.
> Your time source seems to be instable or some driver is hogging interupts
> 
> Also if I enable hangcheck, then I get a huge number of Hangcheck messages 
> in dmesg.

I get a lot of "kernel: Hangcheck: hangcheck value past margin!" messages
from 2.6.13-rc7 on AMD64 X2 3800+ and Asus A8V deluxe motherboard. No lost
ticks messages however.

> 
> The main other symptom is that the system clock runs fast and 
> inaccurately.  It seems to run more inaccurately when I'm using the CPU, 
> and be basically OK when idling.

That seems to be the case here too: clock runs too fast under heavy load
(burn-in tests involving kernel builds and large disk copies).

-- 
Frank

Re: [PATCH] fix VmSize and VmData after mremap

2005-08-05 Thread Frank van Maarseveen

On Thu, Aug 04, 2005 at 07:05:30PM +0100, Hugh Dickins wrote:
> mremap's move_vma is applying __vm_stat_account to the old vma which may
> have already been freed: move it to just before the do_munmap.
> 
> mremapping to and fro with CONFIG_DEBUG_SLAB=y showed /proc/<pid>/status
> VmSize and VmData wrapping just like in kernel bugzilla #4842, and fixed
> by this patch - worth including in 2.6.13, though not yet confirmed that
> it fixes that specific report from Frank van Maarseveen.

The patch works, thanks.

> 
> Signed-off-by: Hugh Dickins <[EMAIL PROTECTED]>
> 
> --- 2.6.13-rc5-git2/mm/mremap.c   2005-06-17 20:48:29.0 +0100
> +++ linux/mm/mremap.c 2005-08-03 16:22:33.0 +0100
> @@ -229,6 +229,7 @@ static unsigned long move_vma(struct vm_
>* since do_munmap() will decrement it by old_len == new_len
>*/
>   mm->total_vm += new_len >> PAGE_SHIFT;
> + __vm_stat_account(mm, vma->vm_flags, vma->vm_file, new_len>>PAGE_SHIFT);
>  
>   if (do_munmap(mm, old_addr, old_len) < 0) {
>   /* OOM: unable to split vma, just get accounts right */
> @@ -243,7 +244,6 @@ static unsigned long move_vma(struct vm_
>   vma->vm_next->vm_flags |= VM_ACCOUNT;
>   }
>  
> - __vm_stat_account(mm, vma->vm_flags, vma->vm_file, new_len>>PAGE_SHIFT);
>   if (vm_flags & VM_LOCKED) {
>   mm->locked_vm += new_len >> PAGE_SHIFT;
>   if (new_len > old_len)
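
Hugh's test (mremapping to and fro while watching VmSize and VmData) can be approximated from user space as sketched below. status_kb() and remap_to_and_fro() are hypothetical helpers. Growing with MREMAP_MAYMOVE lets the kernel move the mapping through move_vma(), but whether it actually moves depends on what is mapped next to it. On a kernel without the fix (and with CONFIG_DEBUG_SLAB=y) VmSize might be expected to drift or wrap; with the fix it should return to roughly its starting value.

```c
#define _GNU_SOURCE            /* for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Return the value in kB of a field such as "VmSize:" from
 * /proc/self/status, or -1 if it cannot be read. */
static long status_kb(const char *field)
{
    char line[256];
    long kb = -1;
    size_t n = strlen(field);
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, field, n) == 0) {
            sscanf(line + n, "%ld", &kb);
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Grow and shrink an anonymous mapping `rounds` times, allowing the
 * kernel to move it on growth; returns 0 on success, -1 on failure. */
static int remap_to_and_fro(int rounds, size_t small, size_t big)
{
    void *p = mmap(NULL, small, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    for (int i = 0; i < rounds; i++) {
        void *q = mremap(p, small, big, MREMAP_MAYMOVE);
        if (q == MAP_FAILED) {
            munmap(p, small);
            return -1;
        }
        p = mremap(q, big, small, MREMAP_MAYMOVE);
        if (p == MAP_FAILED) {
            munmap(q, big);
            return -1;
        }
    }
    return munmap(p, small);
}
```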

-- 
Frank

Re: 2.6.13-rc4: no hyperthreading and idr_remove() stack traces

2005-08-01 Thread Frank van Maarseveen

On Fri, Jul 29, 2005 at 05:03:19PM -0700, Andrew Morton wrote:
> 
> (The IDR problem is fixed in Linus's current tree)

yep, enabled PM and running rc4-git3, everything seems normal now.

-- 
Frank

Re: 2.6.13-rc4: no hyperthreading and idr_remove() stack traces

2005-08-01 Thread Frank van Maarseveen

On Sat, Jul 30, 2005 at 08:27:24PM +0100, Hugh Dickins wrote:
> On Fri, 29 Jul 2005, Frank van Maarseveen wrote:
> > 2.6.13-rc4 does not recognize the second CPU of a 3GHz HT P4:
> 
> I think your problem is this: HT has depended on CONFIG_ACPI for
> some while, and now in 2.6.13-rc CONFIG_ACPI depends on CONFIG_PM.
> You don't have CONFIG_PM set in your .config (nor had I), so you
> don't get ACPI, so you don't get HT.

enabling PM fixed HT problem, thanks.

-- 
Frank

Re: 2.6.13-rc4: no hyperthreading and idr_remove() stack traces (3)

2005-07-29 Thread Frank van Maarseveen

This is a /var/log/messages snippet from 2.6.12 where HT was working:

Jul 29 18:57:05 kotka syslogd 1.4.1#17: restart.
klogd 1.4.1#17, log source = /proc/kmsg started.
Inspecting /boot/System.map-2.6.12.2-y115
Loaded 38335 symbols from /boot/System.map-2.6.12.2-y115.
Symbols match kernel version 2.6.12.
No module symbols loaded - kernel modules not enabled. 
Linux version 2.6.12.2-y115 ([EMAIL PROTECTED]) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Fri Jul 29 18:41:48 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fe86c00 (usable)
 BIOS-e820: 000000003fe86c00 - 000000003fe88c00 (ACPI NVS)
 BIOS-e820: 000000003fe88c00 - 000000003fe8ac00 (ACPI data)
 BIOS-e820: 000000003fe8ac00 - 0000000040000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
DMI 2.3 present.
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:4 APIC version 20
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] disabled)
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x07] disabled)
ACPI: LAPIC_NMI (acpi_id[0xff] high level lint[0x1])
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Enabling APIC mode:  Flat.  Using 1 I/O APICs
ACPI: HPET id: 0x8086a201 base: 0xfed00000
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 40000000 (gap: 40000000:a0000000)
Built 1 zonelists
Kernel command line: auto BOOT_IMAGE=2.6.12.2-y115 ro root=802 nomodules
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1030164k/1047064k available (3482k kernel code, 16124k reserved, 1767k data, 264k init, 129560k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0
hpet0: 0ns tick, 3 64-bit timers
Using HPET for base-timer
Using HPET for gettimeofday
Detected 2992.825 MHz processor.
Using hpet for high-res timesource
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Booting processor 1/1 eip 2000
Initializing CPU#1
monitor/mwait feature present.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Total of 2 processors activated (11911.16 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb258, last bus=4
PCI: Using MMCONFIG
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs *3 4 5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 11 12 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 *9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 9 *10 11 12 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 11 devices
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore:

Re: 2.6.13-rc4: no hyperthreading and idr_remove() stack traces (2)

2005-07-29 Thread Frank van Maarseveen

In addition, /proc/bus/usb got mounted but /proc/bus seems to have changed into a file:

$ df
df: `/proc/bus/usb': Not a directory
   ...
$ grep usb /proc/mounts
usbfs /proc/bus/usb usbfs rw 0 0
$ ls -l /proc/bus
-r--r--r--  1 root root 0 Jul 29 17:54 /proc/bus
$ cat /proc/bus
Inter-|   Receive|  Transmit
 face |bytespackets errs drop fifo frame compressed multicast|bytes
packets errs drop fifo colls carrier compressed
  eth0: 2440261   10169000 0  097  1287588
4726000 0   0  0
lo:   34526 173000 0  0 034526 
173000 0   0  0
dummy0:   0   0000 0  0 00  
 0000 0   0  0
 tunl0:   0   0000 0  0 00  
 0000 0   0  0
  gre0:   0   0000 0  0 00  
 0000 0   0  0

-- 
Frank

2.6.13-rc4: no hyperthreading and idr_remove() stack traces

2005-07-29 Thread Frank van Maarseveen

2.6.13-rc4 does not recognize the second CPU of a 3GHz HT P4:

/proc/cpuinfo:
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 4
model name  : Intel(R) Pentium(R) 4 CPU 3.00GHz
stepping: 1
cpu MHz : 2993.277
cache size  : 1024 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 1
fdiv_bug: no
hlt_bug : no
f00f_bug: no
coma_bug: no
fpu : yes
fpu_exception   : yes
cpuid level : 3
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx pni monitor ds_cpl 
cid xtpr
bogomips: 5995.71
(no second CPU, but "ht" flag is present)

/var/log/messages:
Jul 29 17:54:56 kotka syslogd 1.4.1#17: restart.
klogd 1.4.1#17, log source = /proc/kmsg started.
Inspecting /boot/System.map-2.6.13-rc4-y117
Loaded 37375 symbols from /boot/System.map-2.6.13-rc4-y117.
Symbols match kernel version 2.6.13.
No module symbols loaded - kernel modules not enabled. 
Linux version 2.6.13-rc4-y117 ([EMAIL PROTECTED]) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Fri Jul 29 17:45:54 CEST 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fe86c00 (usable)
 BIOS-e820: 000000003fe86c00 - 000000003fe88c00 (ACPI NVS)
 BIOS-e820: 000000003fe88c00 - 000000003fe8ac00 (ACPI data)
 BIOS-e820: 000000003fe8ac00 - 0000000040000000 (reserved)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 00000000fed00400 (reserved)
 BIOS-e820: 00000000fed20000 - 00000000feda0000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fef00000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
126MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000fe710
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: DELL Product ID: Opti GX280   APIC at: 0xFEE00000
Processor #0 15:4 APIC version 20
I/O APIC #8 Version 32 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 40000000 (gap: 40000000:a0000000)
Built 1 zonelists
Kernel command line: auto BOOT_IMAGE=2.6.13-rc4-y117 ro root=802 nomodules
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 2993.277 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1030340k/1047064k available (3395k kernel code, 15952k reserved, 1710k data, 240k init, 129560k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 5995.71 BogoMIPS (lpj=11991422)
Mount-cache hash table entries: 512
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 01
Total of 1 processors activated (5995.71 BogoMIPS).
ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=0
Brought up 1 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb258, last bus=4
PCI: Using configuration type 1
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
PCI: Transparent bridge - 0000:00:1e.0
PCI: Using IRQ router PIIX/ICH [8086/2640] at 0000:00:1f.0
PCI->APIC IRQ transform: 0000:00:1d.0[A] -> IRQ 21
PCI->APIC IRQ transform: 0000:00:1d.1[B] -> IRQ 22
PCI->APIC IRQ transform: 0000:00:1d.2[C] -> IRQ 18
PCI->APIC IRQ transform: 0000:00:1d.3[D] -> IRQ 23
PCI->APIC IRQ transform: 0000:00:1d.7[A] -> IRQ 21
PCI->APIC IRQ transform: 0000:00:1e.2[A] -> IRQ 23
PCI->APIC IRQ transform: 0000:00:1f.1[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:00:1f.2[C] -> IRQ 20
PCI->APIC IRQ transform: 0000:00:1f.3[B] -> IRQ 17
PCI->APIC IRQ transform: 0000:01:00.0[A] -> IRQ 16
PCI->APIC IRQ transform: 0000:02:00.0[A] -> IRQ 16
PCI: Bridge: 0000:00:01.0
  IO window: d000-dfff
  MEM window: dfd0-dfef
  PREFETCH window: d000-d7ff
PCI: Bridge: 0000:00:1c.0
  IO window: disabled.
  MEM window: dfc0-dfcf
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
  IO window: disabled.
  MEM window: dfb0-dfbf
