Re: SATA ahci Bug in 2.6.19.x

2007-01-30 Thread Stefan Priebe - FH

Hi!

Any News?

Stefan

Stefan Priebe - FH wrote:

Hi!

acpi=off does not help; I've already tried that.


OK, here are some outputs:
1.) complete dmesg with 2.6.16.27 (works)

Linux version 2.6.16.27amd ([EMAIL PROTECTED]) (gcc version 3.3.5 
(Debian 1:3.3.5-13)) #6 SMP Sat Aug 26 14:29:07 CEST 2006

BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - 3bfb (usable)
 BIOS-e820: 3bfb - 3bfbe000 (ACPI data)
 BIOS-e820: 3bfbe000 - 3bfe (ACPI NVS)
 BIOS-e820: 3bfe - 3c00 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fecc - fecc1000 (reserved)
 BIOS-e820: ff7c - 0001 (reserved)
ACPI: RSDP (v002 ACPIAM) @ 
0x000fa850
ACPI: XSDT (v001 A M I  OEMXSDT  0x12000527 MSFT 0x0097) @ 
0x3bfb0100
ACPI: FADT (v003 A M I  OEMFACP  0x12000527 MSFT 0x0097) @ 
0x3bfb0290
ACPI: MADT (v001 A M I  OEMAPIC  0x12000527 MSFT 0x0097) @ 
0x3bfb0390
ACPI: MCFG (v001 A M I  OEMMCFG  0x12000527 MSFT 0x0097) @ 
0x3bfb0400
ACPI: OEMB (v001 A M I  AMI_OEM  0x12000527 MSFT 0x0097) @ 
0x3bfbe040
ACPI: DSDT (v001  A0339 A0339000 0x INTL 0x02002026) @ 
0x

Scanning NUMA topology in Northbridge 24
Number of nodes 1
Node 0 MemBase  Limit 3bfb
NUMA: Using 63 for the hash shift.
Using node hash shift of 63
Bootmem setup node 0 -3bfb
On node 0 totalpages: 240991
  DMA zone: 2709 pages, LIFO batch:0
  DMA32 zone: 238282 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, version 3, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x02] address[0xfecc] gsi_base[24])
IOAPIC[1]: apic_id 2, version 3, address 0xfecc, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 4000 (gap: 3c00:c2c0)
Checking aperture...
CPU 0: aperture @ f000 size 128 MB
Built 1 zonelists
Kernel command line: root=/dev/sda6 ro rootflags=quota
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.214 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 962212k/982720k available (2939k kernel code, 20120k reserved, 
1327k data, 220k init)
Calibrating delay using timer specific routine.. 4810.51 BogoMIPS 
(lpj=9621030)

Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
Using local APIC timer interrupts.
result 12501128
Detected 12.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
DMI 2.3 present.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
PCI: Using MMCONFIG at e000
ACPI: Subsystem revision 20060127
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is :01:00.0
PCI: Transparent bridge - :00:13.1
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NBPG._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NBP0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0PA._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11 12 *14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.

ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *10 11 12 14 15)
Linux Plug and Play Support v0.97 (c

Re: XFS or Kernel Problem / Bug

2007-01-30 Thread Stefan Priebe - FH

Hi!

Any News?

Stefan

Stefan Priebe - FH wrote:

Hi!

OK - I rechecked everything. We have 22 servers with the DFI PM-12
mainboard with a VIA chipset.


But only the 5 oldest of them (from before 2004/01/20; we bought them all
within a span of 10 months) have this problem.


So I think it is a mixture of a software and a hardware problem. Perhaps DFI
changed something on the mainboard (e.g. a new revision), or there was a
new BIOS version on it.


But something must also have changed in the kernel.

 > OK, can you post configs for one that works and one that doesn't?
Do you mean the kernel .configs?

 > And which C compiler(s) do you use? The same for all, I hope...
On all 32-bit machines:

gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--enable-__cxa_atexit --with-system-zlib --enable-nls 
--without-included-gettext --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux

Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)

Stefan


Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


What is different about these servers?


All 300 machines are mostly different. We have Dual Opteron, single P4
with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many
more... different mainboards etc.

The only thing I found out is that all the servers where the
problem exists use a DFI PM-12 mainboard with a VIA chipset.


Any others with VIA chipsets?


Are you building different kernels for them, or is it just different
drivers loaded?


No, every machine builds its own kernel.



OK, can you post configs for one that works and one that doesn't?

And which C compiler(s) do you use? The same for all, I hope...






-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: SATA ahci Bug in 2.6.19.x

2007-01-26 Thread Stefan Priebe - FH

Hi!

acpi=off does not help; I've already tried that.


OK, here are some outputs:
1.) complete dmesg with 2.6.16.27 (works)

Linux version 2.6.16.27amd ([EMAIL PROTECTED]) (gcc version 3.3.5 
(Debian 1:3.3.5-13)) #6 SMP Sat Aug 26 14:29:07 CEST 2006

BIOS-provided physical RAM map:
 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - 3bfb (usable)
 BIOS-e820: 3bfb - 3bfbe000 (ACPI data)
 BIOS-e820: 3bfbe000 - 3bfe (ACPI NVS)
 BIOS-e820: 3bfe - 3c00 (reserved)
 BIOS-e820: fec0 - fec01000 (reserved)
 BIOS-e820: fecc - fecc1000 (reserved)
 BIOS-e820: ff7c - 0001 (reserved)
ACPI: RSDP (v002 ACPIAM) @ 
0x000fa850
ACPI: XSDT (v001 A M I  OEMXSDT  0x12000527 MSFT 0x0097) @ 
0x3bfb0100
ACPI: FADT (v003 A M I  OEMFACP  0x12000527 MSFT 0x0097) @ 
0x3bfb0290
ACPI: MADT (v001 A M I  OEMAPIC  0x12000527 MSFT 0x0097) @ 
0x3bfb0390
ACPI: MCFG (v001 A M I  OEMMCFG  0x12000527 MSFT 0x0097) @ 
0x3bfb0400
ACPI: OEMB (v001 A M I  AMI_OEM  0x12000527 MSFT 0x0097) @ 
0x3bfbe040
ACPI: DSDT (v001  A0339 A0339000 0x INTL 0x02002026) @ 
0x

Scanning NUMA topology in Northbridge 24
Number of nodes 1
Node 0 MemBase  Limit 3bfb
NUMA: Using 63 for the hash shift.
Using node hash shift of 63
Bootmem setup node 0 -3bfb
On node 0 totalpages: 240991
  DMA zone: 2709 pages, LIFO batch:0
  DMA32 zone: 238282 pages, LIFO batch:31
  Normal zone: 0 pages, LIFO batch:0
  HighMem zone: 0 pages, LIFO batch:0
ACPI: PM-Timer IO Port: 0x808
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:15 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x81] disabled)
ACPI: IOAPIC (id[0x01] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 1, version 3, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x02] address[0xfecc] gsi_base[24])
IOAPIC[1]: apic_id 2, version 3, address 0xfecc, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 4000 (gap: 3c00:c2c0)
Checking aperture...
CPU 0: aperture @ f000 size 128 MB
Built 1 zonelists
Kernel command line: root=/dev/sda6 ro rootflags=quota
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz WALL PM GTOD PIT/TSC timer.
time.c: Detected 2400.214 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 65536 (order: 7, 524288 bytes)
Memory: 962212k/982720k available (2939k kernel code, 20120k reserved, 
1327k data, 220k init)
Calibrating delay using timer specific routine.. 4810.51 BogoMIPS 
(lpj=9621030)

Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU 0(1) -> Node 0 -> Core 0
Using local APIC timer interrupts.
result 12501128
Detected 12.501 MHz APIC timer.
Brought up 1 CPUs
testing NMI watchdog ... OK.
migration_cost=0
DMI 2.3 present.
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: Using configuration type 1
PCI: Using MMCONFIG at e000
ACPI: Subsystem revision 20060127
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (:00)
PCI: Probing PCI hardware (bus 00)
Boot video device is :01:00.0
PCI: Transparent bridge - :00:13.1
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NBPG._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.NBP0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P7._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0PA._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 *5 6 7 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 10 11 12 *14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 11 12 14 15) *0, 
disabled.

ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *10 11 12 14 15)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 12 devices

Re: XFS or Kernel Problem / Bug

2007-01-25 Thread Stefan Priebe - FH

Hi!

OK - I rechecked everything. We have 22 servers with the DFI PM-12
mainboard with a VIA chipset.


But only the 5 oldest of them (from before 2004/01/20; we bought them all
within a span of 10 months) have this problem.


So I think it is a mixture of a software and a hardware problem. Perhaps DFI
changed something on the mainboard (e.g. a new revision), or there was a
new BIOS version on it.


But something must also have changed in the kernel.

> OK, can you post configs for one that works and one that doesn't?
Do you mean the kernel .configs?

> And which C compiler(s) do you use? The same for all, I hope...
On all 32-bit machines:

gcc -v
Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.5/specs
Configured with: ../src/configure -v 
--enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared 
--enable-__cxa_atexit --with-system-zlib --enable-nls 
--without-included-gettext --enable-clocale=gnu --enable-debug 
--enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux

Thread model: posix
gcc version 3.3.5 (Debian 1:3.3.5-13)

Stefan


Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


What is different about these servers?


All 300 machines are mostly different. We have Dual Opteron, single P4
with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many
more... different mainboards etc.

The only thing I found out is that all the servers where the
problem exists use a DFI PM-12 mainboard with a VIA chipset.


Any others with VIA chipsets?


Are you building different kernels for them, or is it just different
drivers loaded?


No, every machine builds its own kernel.



OK, can you post configs for one that works and one that doesn't?

And which C compiler(s) do you use? The same for all, I hope...
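
One way to act on the request for configs from a working and a failing machine is to compare the two .config files option by option. What follows is a minimal sketch in Python, assuming the usual .config syntax (CONFIG_FOO=value lines and "# CONFIG_FOO is not set" comments). The function names and the sample fragments are made up for illustration and are not taken from this thread; treating an absent option as "n" is also an assumption of the sketch.

```python
def parse_kconfig(text):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO is disabled.
            name = line[2:-len(" is not set")]
            options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def diff_kconfig(working, failing):
    """Return {option: (value_in_working, value_in_failing)} where they differ."""
    a, b = parse_kconfig(working), parse_kconfig(failing)
    return {
        name: (a.get(name, "n"), b.get(name, "n"))
        for name in sorted(set(a) | set(b))
        if a.get(name, "n") != b.get(name, "n")
    }


if __name__ == "__main__":
    # Made-up fragments for illustration; not the configs from this thread.
    working = "CONFIG_SMP=y\n# CONFIG_PREEMPT is not set\nCONFIG_XFS_FS=y\n"
    failing = "CONFIG_SMP=y\nCONFIG_PREEMPT=y\nCONFIG_XFS_FS=m\n"
    for name, (w, f) in diff_kconfig(working, failing).items():
        print(f"{name}: works={w} fails={f}")
```

Since every machine here builds its own kernel, a short list of differing options between a good and a bad DFI PM-12 box would narrow the search considerably.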





Re: SATA ahci Bug in 2.6.19.x

2007-01-25 Thread Stefan Priebe - FH

Hello

Is there nobody here who cares?

Stefan

Stephen Evanchik wrote:

On 1/22/07, Stefan Priebe - FH <[EMAIL PROTECTED]> wrote:


I have an Asus A8V mainboard which works wonderfully with a 2.6.18.x kernel,
but I cannot use the SATA controller with a 2.6.19.x kernel.



I also have an Asus A8V motherboard that cannot boot a newer kernel
because the SATA controller does not come up properly. I have tried
kernels 2.6.19.2 and 2.6.20-rc5 with no luck. It looks like later
kernels don't recognize the proper IRQ of the device as compared to
the 2.6.18 boot logs.


"ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21"
"ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode"
"ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part "
"ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 1277"
"ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 1277"
"ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 1277"
"ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 1277"



Similar output as above.


Does anyone have any ideas?


Stephen
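
The boot lines quoted above show the controller routed to IRQ 21 by ACPI, while every ata port reports irq 1277. A rough Python sketch for spotting such a discrepancy in a boot log follows. The regular expressions are written against the lines quoted in this thread only, the function names are hypothetical, and wrapped log lines have to be rejoined before parsing. A mismatch is merely a hint worth investigating, not a confirmed diagnosis.

```python
import re

# Patterns written against the log lines quoted in this thread (an assumption;
# other kernel versions may format these messages differently).
ACPI_RE = re.compile(r"ACPI: PCI Interrupt (\S+)\[[A-D]\] -> GSI \d+ .*-> IRQ (\d+)")
ATA_RE = re.compile(r"(ata\d+): SATA .* irq (\d+)")

SAMPLE = """\
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21
ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 1277
ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 1277
"""


def irq_report(log):
    """Return ({pci_device: irq}, {ata_port: irq}) parsed from a boot log."""
    acpi = {dev: int(irq) for dev, irq in ACPI_RE.findall(log)}
    ata = {port: int(irq) for port, irq in ATA_RE.findall(log)}
    return acpi, ata


def mismatched_ports(log):
    """List ata ports whose irq matches none of the ACPI-routed IRQs."""
    acpi, ata = irq_report(log)
    routed = set(acpi.values())
    return sorted(port for port, irq in ata.items() if irq not in routed)


if __name__ == "__main__":
    print(irq_report(SAMPLE))
    print(mismatched_ports(SAMPLE))
```

Running the same check on the 2.6.18 boot log of the same board would show whether the ports reported the ACPI-routed IRQ there.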




Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi Chuck,
   hi Eric,

Because you both asked me nearly the same thing, I will answer you both in one mail.


> What is different about these servers?
All 300 machines are mostly different. We have Dual Opteron, single P4 
with HT, single P4 without HT, Dual Xeon, Athlon 64 X2, and many more... 
different mainboards etc.


The only thing I found out is that all the servers where the problem
exists use a DFI PM-12 mainboard with a VIA chipset.


> Are you building different kernels for them, or is it just different
> drivers loaded?
No, every machine builds its own kernel.

Stefan

Chuck Ebbert wrote:

Stefan Priebe - FH wrote:


Hi!

Hmm, are you sure? I mean, I have this problem on 5 servers, all with
the same mainboard. I cannot believe that all 5 servers have a
hardware problem that starts on the same day.

The other thing is that they all work fine with 2.6.16.x and all
earlier kernels. I mean, some of them have been running 2.6.x for
two years without any problem...



OK it's probably not hardware, but a bit is flipped somehow.

What is different about these servers?

Are you building different kernels for them, or is it just different
drivers loaded?






Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Hmm, are you sure? I mean, I have this problem on 5 servers, all with
the same mainboard. I cannot believe that all 5 servers have a hardware
problem that starts on the same day.


The other thing is that they all work fine with 2.6.16.x and all earlier
kernels. I mean, some of them have been running 2.6.x for two years
without any problem...


Stefan


Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> Sorry, that is not possible because it is a production machine.
>>
>> But I've caught the error and the files from another machine -
>> perhaps this helps.
>>
>> "BUG: unable to handle kernel NULL pointer dereference at virtual address 0288"
>> " printing eip:"
>> "c0142ff7"
>> "*pde = "
>> "Oops:  [#1]"
>> "SMP "
>> "Modules linked in: iptable_filter ip_tables x_tables"
>> "CPU:0"
>> "EIP:0060:[<c0142ff7>]Not tainted VLI"
>> "EFLAGS: 00010246   (2.6.18.6 #1) "
>> "EIP is at generic_file_buffered_write+0x390/0x6cf"
>> "eax:    ebx: 01ec   ecx: ea029a40   edx: 8002"
>> "esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18"
>> "ds: 007b   es: 007b   ss: 0068"
>> "Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
>> "Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0 0010 "
>> "   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc "
>> "    01ec dd04beac 00d420b1   dd04bd80 45b1fa67 "
>> "Call Trace:"
>> " [<c036d793>] sock_def_readable+0x7f/0x81"
>> " [<c017a03a>] file_update_time+0xad/0xcb"
>> " [<c0232015>] xfs_iunlock+0x55/0x9f"
>> " [<c0262eeb>] xfs_write+0xa74/0xc61"
>> " [<c036a253>] sock_aio_read+0x95/0x99"
>> " [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
>> " [<c015fb94>] do_sync_write+0xc9/0x10f"
>> " [<c0133ad6>] autoremove_wake_function+0x0/0x57"
>> " [<c015f3d5>] generic_file_llseek+0x95/0xbc"
>> " [<c015facb>] do_sync_write+0x0/0x10f"
>> " [<c015fc80>] vfs_write+0xa6/0x179"
>> " [<c015fe24>] sys_write+0x51/0x80"
>> " [<c0102d3f>] syscall_call+0x7/0xb"
>> "Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
>> "EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP 0068:dd04bd18"
>>
>> Files:
>> http://server113-han.de-nserver.de/filemap.s
>> http://server113-han.de-nserver.de/filemap.o
>
> You seem to have some kind of hardware/memory problem.
>
> Disassembly of the failing instruction from the oops:
>
>    8b 7c 24 28          mov    0x28(%esp),%edi
>    8b 85 9c 00 00 00    mov    0x9c(%ebp),%eax   <=
>
> Dump of the object code:
>
>    8b 7c 24 28          mov    0x28(%esp),%edi
>    8b 87 9c 00 00 00    mov    0x9c(%edi),%eax
>
> Looks like a bit is flipped.
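
The single-bit claim can be checked directly from the two instruction encodings quoted above. The following is a minimal Python sketch, not part of the original thread: the helper names `decode_modrm` and `differing_bits` are made up for illustration, and the only inputs are the two ModRM bytes (0x85 from the oops "Code:" line, 0x87 from the object file).

```python
# ModRM byte layout on x86: mod (2 bits) | reg (3 bits) | rm (3 bits).
# With mod=10 the rm field names a base register used with a 32-bit
# displacement (rm=100 would instead select a SIB byte; not the case here).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(8) if diff & (1 << bit)]


executed = 0x85  # from the oops:        8b 85 9c 00 00 00
compiled = 0x87  # from the object file: 8b 87 9c 00 00 00

for label, byte in (("executed", executed), ("compiled", compiled)):
    mod, reg, rm = decode_modrm(byte)
    print(f"{label}: mod={mod:02b} dest=%{REG32[reg]} base=%{REG32[rm]}")

print("differing bit positions:", differing_bits(executed, compiled))
```

The two bytes decode to the same destination register (%eax) and differ only in the base register (%ebp versus %edi), and they differ in exactly one bit, which is consistent with the "bit is flipped" reading.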





Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

Sorry, that is not possible because it is a production machine.

But I've caught the error and the files from another machine - perhaps 
this helps.


"BUG: unable to handle kernel NULL pointer dereference at virtual address 0288"
" printing eip:"
"c0142ff7"
"*pde = "
"Oops:  [#1]"
"SMP "
"Modules linked in: iptable_filter ip_tables x_tables"
"CPU:0"
"EIP:0060:[<c0142ff7>]Not tainted VLI"
"EFLAGS: 00010246   (2.6.18.6 #1) "
"EIP is at generic_file_buffered_write+0x390/0x6cf"
"eax:    ebx: 01ec   ecx: ea029a40   edx: 8002"
"esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18"
"ds: 007b   es: 007b   ss: 0068"
"Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
"Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0 0010 "
"   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc "
"    01ec dd04beac 00d420b1   dd04bd80 45b1fa67 "
"Call Trace:"
" [<c036d793>] sock_def_readable+0x7f/0x81"
" [<c017a03a>] file_update_time+0xad/0xcb"
" [<c0232015>] xfs_iunlock+0x55/0x9f"
" [<c0262eeb>] xfs_write+0xa74/0xc61"
" [<c036a253>] sock_aio_read+0x95/0x99"
" [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
" [<c015fb94>] do_sync_write+0xc9/0x10f"
" [<c0133ad6>] autoremove_wake_function+0x0/0x57"
" [<c015f3d5>] generic_file_llseek+0x95/0xbc"
" [<c015facb>] do_sync_write+0x0/0x10f"
" [<c015fc80>] vfs_write+0xa6/0x179"
" [<c015fe24>] sys_write+0x51/0x80"
" [<c0102d3f>] syscall_call+0x7/0xb"
"Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
"EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP 0068:dd04bd18"


Files:
http://server113-han.de-nserver.de/filemap.s
http://server113-han.de-nserver.de/filemap.o

Stefan

Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> It could be that the options are now different, because my first
>> attempt was to change the kernel options. When that did not help, I
>> switched back to 2.6.16.37.
>>
>> Any idea what I can do?
>>
>> Chuck Ebbert wrote:
>>> That doesn't match your oops at all.  Did you use a different compiler
>>> and/or different kernel build options?
>
> If you don't know what changed you can try different options until the
> filemap.s is the same.  You should see
>
>     movl    156(%ebp),%eax
>     testb   $16, 48(%eax)
>
> in generic_file_buffered_write.  And you need to regenerate filemap.s
> manually each time.
>
> (Did you test the kernel that you posted these pieces from? If you can
> get it to oops the same way, just post that instead.)
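
Checking each regenerated filemap.s by eye is tedious, so the search for the instruction pair named above can be scripted. This is a minimal Python sketch under stated assumptions: `find_pair` is a hypothetical helper, the sample lines are invented for the demonstration, and the check assumes the two instructions are adjacent, which a different compiler or option set may not guarantee.

```python
import re

# The two instructions Chuck expects to find in generic_file_buffered_write,
# written in the AT&T syntax that gcc emits into a .s file.
MOV = re.compile(r"^\s*movl\s+156\(%ebp\),\s*%eax\b")
TEST = re.compile(r"^\s*testb\s+\$16,\s*48\(%eax\)")


def find_pair(asm_lines):
    """Return 1-based line numbers where MOV is immediately followed by TEST."""
    hits = []
    for i in range(len(asm_lines) - 1):
        if MOV.match(asm_lines[i]) and TEST.match(asm_lines[i + 1]):
            hits.append(i + 1)
    return hits


# Invented sample input; a real run would read the generated file instead,
# e.g. open("mm/filemap.s").read().splitlines()
sample = [
    "\tmovl\t40(%esp), %edi",
    "\tmovl\t156(%ebp), %eax",
    "\ttestb\t$16, 48(%eax)",
    "\tjne\t.L1",
]

print(find_pair(sample))
```

An empty result means the build still differs from the kernel that produced the oops, so another compiler or option combination has to be tried.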





Re: XFS or Kernel Problem / Bug

2007-01-24 Thread Stefan Priebe - FH

Hi!

It could be that the options are now different, because my first attempt 
was to change the kernel options. When that did not help, I switched back 
to 2.6.16.37.


Any idea what I can do?

Stefan

Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> Hi!
>>
>> I will do whatever you like :-) if we can find the bug.
>>
>> So here are the files (2.6.18.6):
>> http://server055.de-nserver.de/filemap.o
>> http://server055.de-nserver.de/filemap.s
>>
>>> If you can, post the file mm/filemap.o from your build directory to
>>> some website.
>>> And do 'make mm/filemap.s' and post that file too.
>
> That doesn't match your oops at all.  Did you use a different compiler
> and/or different kernel build options?






Re: XFS or Kernel Problem / Bug

2007-01-23 Thread Stefan Priebe - FH

Hi!

I will do whatever you like :-) if we can find the bug.

So here are the files (2.6.18.6):
http://server055.de-nserver.de/filemap.o
http://server055.de-nserver.de/filemap.s

Stefan


Chuck Ebbert wrote:
> Stefan Priebe - FH wrote:
>> I have 3 servers which work wonderfully with 2.6.16.X (also tested
>> with the latest, 2.6.16.37),
>>
>> but with 2.6.18.6 I get these errors:
>>
>> "general protection fault:  [#1]"
>> "Modules linked in:"
>> "CPU:0"
>> "EIP:0060:[<c01c8fd2>]Not tainted VLI"
>> "EFLAGS: 00010246   (2.6.18.6 #1) "
>> "EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
>> "eax:    ebx: fffe0007   ecx: 0071a4cd   edx: "
>> "esi:    edi:    ebp: 0015   esp: ce35f8f0"
>> "ds:    es: 007b   ss: 0068"
>> "Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)"
>> "Stack: 0232  0233    000c "
>> "   0007  eca90250 eca90278 0001 eca90200 03c3 "
>> "    010003c3 ffc0 ce35fa58 ce35fa58 0001 "
>> "Call Trace:"
>> " [<c01b6c58>] xfs_trans_dqresv+0x3f9/0x405"
>> " [<c01c6485>] xfs_bmap_add_extent+0x163/0x377"
>> " [<c01cd2c3>] xfs_bmapi+0xa4e/0x1109"
>> " [<c01ebbe3>] xfs_iomap_write_delay+0x233/0x2fa"
>> " [<c01eaa31>] xfs_imap_to_bmap+0x29/0x1d6"
>> " [<c01eae1a>] xfs_iomap+0x23c/0x3e1"
>> " [<c01eaebe>] xfs_iomap+0x2e0/0x3e1"
>> " [<c020a71a>] xfs_bmap+0x1a/0x1e"
>> " [<c020471e>] __xfs_get_blocks+0x5d/0x195"
>
> Without the "Code:" line it's hard to tell what happened...
>
>> and sometimes this one:
>>
>> "BUG: unable to handle kernel NULL pointer dereference at virtual address 0288"
>> " printing eip:"
>> "c0142ff7"
>> "*pde = "
>> "Oops:  [#1]"
>> "SMP "
>> "Modules linked in: iptable_filter ip_tables x_tables"
>> "CPU:0"
>> "EIP:0060:[<c0142ff7>]Not tainted VLI"
>> "EFLAGS: 00010246   (2.6.18.6 #1) "
>> "EIP is at generic_file_buffered_write+0x390/0x6cf"
>> "eax:    ebx: 01ec   ecx: ea029a40   edx: 8002"
>> "esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18"
>> "ds: 007b   es: 007b   ss: 0068"
>> "Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)"
>> "Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0 0010 "
>> "   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc "
>> "    01ec dd04beac 00d420b1   dd04bd80 45b1fa67 "
>> "Call Trace:"
>> " [<c036d793>] sock_def_readable+0x7f/0x81"
>> " [<c017a03a>] file_update_time+0xad/0xcb"
>> " [<c0232015>] xfs_iunlock+0x55/0x9f"
>> " [<c0262eeb>] xfs_write+0xa74/0xc61"
>> " [<c036a253>] sock_aio_read+0x95/0x99"
>> " [<c025d9fb>] xfs_file_aio_write+0x8f/0xa0"
>> " [<c015fb94>] do_sync_write+0xc9/0x10f"
>> " [<c0133ad6>] autoremove_wake_function+0x0/0x57"
>> " [<c015f3d5>] generic_file_llseek+0x95/0xbc"
>> " [<c015facb>] do_sync_write+0x0/0x10f"
>> " [<c015fc80>] vfs_write+0xa6/0x179"
>> " [<c015fe24>] sys_write+0x51/0x80"
>> " [<c0102d3f>] syscall_call+0x7/0xb"
>> "Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f 88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85 9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a "
>> "EIP: [<c0142ff7>] generic_file_buffered_write+0x390/0x6cf SS:ESP 0068:dd04bd18"
>
> Well that's strange. It's here in mm/filemap.c line 2201:
>
>         /*
>          * For now, when the user asks for O_SYNC, we'll actually give O_DSYNC
>          */
>         if (likely(status >= 0)) {
>                 if (unlikely((file->f_flags & O_SYNC) || IS_SYNC(inode))) {   <===
>                         if (!a_ops->writepage || !is_sync_kiocb(iocb))
>                                 status = generic_osync_inode(inode, mapping,
>                                                 OSYNC_METADATA|OSYNC_DATA);
>                 }
>         }
>
> ebp holds the value of 'inode' and it's obviously wrong (it's also the same
> as 'written', which is in ebx.) So when it tries to read inode->i_sb, it
> dies.
>
> If you can, post the file mm/filemap.o from your build directory to some
> website.
> And do 'make mm/filemap.s' and post that file too.
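
The link between the bad ebp value and the faulting address can be verified with one addition. A minimal Python sketch, assuming the values printed in the oops above are read as 0x1ec (ebp) and 0x288 (fault address), since the archive does not show their leading zero digits, and taking the displacement 0x9c from the bytes `<8b> 85 9c 00 00 00` in the "Code:" line. The helper name `effective_address` is made up for illustration.

```python
def effective_address(base, displacement):
    """Address formed by a `disp(%base)` operand, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


ebp = 0x1EC            # register dump: "ebp: 01ec" (assumed to mean 0x000001ec)
displacement = 0x9C    # disp32 following the opcode and ModRM byte: 9c 00 00 00
fault_address = 0x288  # "virtual address 0288" (assumed to mean 0x00000288)

computed = effective_address(ebp, displacement)
print(hex(computed), computed == fault_address)
```

The sum matches the reported fault address, which supports the reading that the load at offset 0x9c was performed relative to the bogus value in ebp.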





Re: XFS or Kernel Problem / Bug

2007-01-23 Thread Stefan Priebe - FH

Hi!

I can give you an idea of the workload :-) I have the same problem on a 
nearly idle server. It runs only a few cron jobs (the normal Debian system 
crons).

The load on this system was not higher than 0.01 during the last 3 days, 
and this morning it crashed with the same error.

I have not tested 2.6.19.x because it has some problems with the SATA AHCI 
driver, which we need. But I can manually update just this system to 
2.6.19.x and wait a few days.


There were no other messages in the log.

Cheers,
   Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 09:07:23AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> The update of the IDE layer was in 2.6.19. I don't think it is a
>> hardware bug, because all 5 of these machines have run fine for a few
>> years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last
>> week and all machines began to crash periodically. On Friday last week
>> we downgraded them all to 2.6.16.37 and all 5 machines run fine again.
>> So I don't believe it is a hardware problem. Do you really think it
>> could be?
>
> I was thinking more of a driver change that is being triggered on
> that particular hardware. FWIW, did you test 2.6.19?
>
> I really need a better idea of the workload these servers are running
> and, ideally, a reproducible test case to track something like
> this down. At the moment I have no idea what is going on and no
> real information on which to even base a guess.
>
> Were there any other messages in the log?
>
> On Mon, Jan 22, 2007 at 10:42:36AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I have another idea... could it be a barrier problem? Barriers have
>> been enabled by default since 2.6.17...
>
> You could try turning it off. If it does fix the problem, then I'd be
> pointing once again at hardware ;)
>
> Cheers,
>
> Dave.




Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

I have another idea... could it be a barrier problem? Barriers have 
been enabled by default since 2.6.17...


Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I'm not sure, but perhaps it isn't an XFS bug.
>>
>> Here is what I found out:
>>
>> We have about 300 servers at the moment, and 5 of them are "old" Intel
>> Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It
>> only happens on THESE machines.
>
> Hmmm - that points more to a hardware problem than a software problem;
> crashes in generic_file_buffered_write() are relatively uncommon, and
> to have them all isolated to a specific type of hardware is suspicious.
>
> Wasn't there a major update of the IDE layer in 2.6.18? Or was that
> 2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
> boxes to rule out dodgy memory?
>
> Cheers,
>
> Dave.





Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

The update of the IDE layer was in 2.6.19. I don't think it is a 
hardware bug, because all 5 of these machines have run fine for a few 
years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last 
week and all machines began to crash periodically. On Friday last week we 
downgraded them all to 2.6.16.37 and all 5 machines run fine again. So I 
don't believe it is a hardware problem. Do you really think it could be?


Stefan

David Chinner wrote:
> On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:
>> Hi!
>>
>> I'm not sure, but perhaps it isn't an XFS bug.
>>
>> Here is what I found out:
>>
>> We have about 300 servers at the moment, and 5 of them are "old" Intel
>> Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It
>> only happens on THESE machines.
>
> Hmmm - that points more to a hardware problem than a software problem;
> crashes in generic_file_buffered_write() are relatively uncommon, and
> to have them all isolated to a specific type of hardware is suspicious.
>
> Wasn't there a major update of the IDE layer in 2.6.18? Or was that
> 2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
> boxes to rule out dodgy memory?
>
> Cheers,
>
> Dave.




SATA ahci Bug in 2.6.19.x

2007-01-22 Thread Stefan Priebe - FH

Hello!

I have an Asus A8V mainboard which works wonderfully with a 2.6.18.X kernel. 
But I cannot use the SATA controller with a 2.6.19.x kernel.


dmesg output from 2.6.18.3 where it works perfectly:
libata version 2.00 loaded.
ahci :00:0f.0: version 2.0
GSI 19 sharing vector 0xD9 and IRQ 19
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 217
ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part
ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 225
ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 225
ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 225
ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 225
scsi0 : ahci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA7, 312581808 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ahci
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7, max UDMA7, 312581808 sectors: LBA48 NCQ (depth 0/32)
ata2.00: ata2: dev 0 multi count 16
ata2.00: configured for UDMA/133
scsi2 : ahci
ata3: SATA link down (SStatus 0 SControl 300)
scsi3 : ahci
ata4: SATA link down (SStatus 0 SControl 300)
  Vendor: ATA   Model: SAMSUNG HD160JJ   Rev: ZM10
  Type:   Direct-Access   ANSI SCSI revision: 05
  Vendor: ATA   Model: SAMSUNG HD160JJ   Rev: ZM10
  Type:   Direct-Access   ANSI SCSI revision: 05
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 < sda5 sda6 sda7 >
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: sdb1 < sdb5 sdb6 sdb7 >
sd 1:0:0:0: Attached scsi disk sdb
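
The SStatus value printed in the "SATA link up/down" lines packs three 4-bit fields, and libata prints it in hexadecimal. A minimal Python sketch of the decoding follows; the field meanings are taken from the Serial ATA SStatus register definition as best recalled and should be checked against the specification, and the helper names `split_sstatus` and `describe_sstatus` are made up for illustration.

```python
# SStatus layout: bits 3:0 = DET, bits 7:4 = SPD, bits 11:8 = IPM.
DET = {
    0: "no device detected, PHY communication not established",
    1: "device detected, PHY communication not established",
    3: "device detected, PHY communication established",
    4: "PHY offline",
}
SPD = {0: "no negotiated speed", 1: "Gen1, 1.5 Gbps", 2: "Gen2, 3.0 Gbps"}
IPM = {0: "device not present", 1: "active", 2: "partial", 6: "slumber"}


def split_sstatus(value):
    """Return the (DET, SPD, IPM) fields of an SStatus register value."""
    return value & 0xF, (value >> 4) & 0xF, (value >> 8) & 0xF


def describe_sstatus(value):
    """Render the three fields with their textual meaning."""
    det, spd, ipm = split_sstatus(value)
    return (f"DET={det} ({DET.get(det, 'reserved')}), "
            f"SPD={spd} ({SPD.get(spd, 'reserved')}), "
            f"IPM={ipm} ({IPM.get(ipm, 'reserved')})")


print(describe_sstatus(0x123))  # ports ata1 and ata2 in the log above
print(describe_sstatus(0x0))    # ports ata3 and ata4 in the log above
```

Read this way, 0x123 describes an established, active 3.0 Gbps link, which agrees with the "link up 3.0 Gbps" text the kernel prints next to it.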


Output with 2.6.19.2 (logged via netconsole because the system can't boot):

"ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21"
"ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode"
"ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part "
"ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 1277"
"ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 1277"
"ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 1277"
"ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 1277"
"scsi0 : ahci"
"ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"
"ata1.00: qc timeout (cmd 0xec)"
"ata1.00: failed to IDENTIFY (I/O error, err_mask=0x104)"
"ata1: port is slow to respond, please be patient (Status 0x80)"
"ata1: port failed to respond (30 secs, Status 0x80)"
"ata1: COMRESET failed (device not ready)"
"ata1: hardreset failed, retrying in 5 secs"
"ata1: port is slow to respond, please be patient (Status 0x80)"
"ata1: port failed to respond (30 secs, Status 0x80)"
"ata1: COMRESET failed (device not ready)"
"ata1: hardreset failed, retrying in 5 secs"
"ata1: port is slow to respond, please be patient (Status 0x80)"
"ata1: port failed to respond (30 secs, Status 0x80)"
"ata1: COMRESET failed (device not ready)"
"ata1: reset failed, giving up"
"scsi1 : ahci"
"ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)"
"ata2.00: qc timeout (cmd 0xec)"
"ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104)"
"ata2: port is slow to respond, please be patient (Status 0x80)"
"ata2: port failed to respond (30 secs, Status 0x80)"
"ata2: COMRESET failed (device not ready)"
"ata2: hardreset failed, retrying in 5 secs"
"ata2: port is slow to respond, please be patient (Status 0x80)"
"ata2: port failed to respond (30 secs, Status 0x80)"
"ata2: COMRESET failed (device not ready)"
"ata2: hardreset failed, retrying in 5 secs"
"ata2: port is slow to respond, please be patient (Status 0x80)"
"ata2: port failed to respond (30 secs, Status 0x80)"
"ata2: COMRESET failed (device not ready)"
"ata2: reset failed, giving up"
"scsi2 : ahci"
"ata3: SATA link down (SStatus 0 SControl 300)"
"scsi3 : ahci"
"ata4: SATA link down (SStatus 0 SControl 300)"


Stefan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


SATA ahci Bug in 2.6.19.x

2007-01-22 Thread Stefan Priebe - FH

Hello!

I have an Asus A8V mainboard which works wonderfully with a 2.6.18.x kernel, 
but I cannot use the SATA controller with a 2.6.19.x kernel.


dmesg output from 2.6.18.3 where it works perfectly:
libata version 2.00 loaded.
ahci :00:0f.0: version 2.0
GSI 19 sharing vector 0xD9 and IRQ 19
ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 217
ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part
ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 225
ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 225
ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 225
ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 225
scsi0 : ahci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: ATA-7, max UDMA7, 312581808 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : ahci
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: ATA-7, max UDMA7, 312581808 sectors: LBA48 NCQ (depth 0/32)
ata2.00: ata2: dev 0 multi count 16
ata2.00: configured for UDMA/133
scsi2 : ahci
ata3: SATA link down (SStatus 0 SControl 300)
scsi3 : ahci
ata4: SATA link down (SStatus 0 SControl 300)
  Vendor: ATA       Model: SAMSUNG HD160JJ   Rev: ZM10
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: SAMSUNG HD160JJ   Rev: ZM10
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
 sda: sda1 < sda5 sda6 sda7 >
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
SCSI device sdb: 312581808 512-byte hdwr sectors (160042 MB)
sdb: Write Protect is off
sdb: Mode Sense: 00 3a 00 00
SCSI device sdb: drive cache: write back
 sdb: sdb1 < sdb5 sdb6 sdb7 >
sd 1:0:0:0: Attached scsi disk sdb
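The "SStatus 123" and "SStatus 0" values in the link messages can be read field by field. The following is a minimal sketch, assuming libata prints these registers in hexadecimal and that they follow the usual SATA layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The helper name decode_sata_reg is made up for this illustration.

```python
def decode_sata_reg(value: int) -> dict:
    """Split an SStatus/SControl value into its three low 4-bit fields."""
    return {
        "det": value & 0xF,          # device detection
        "spd": (value >> 4) & 0xF,   # speed (negotiated / allowed)
        "ipm": (value >> 8) & 0xF,   # interface power management
    }

# SPD encoding in SStatus: 0 = no negotiated speed, 1 = Gen1, 2 = Gen2.
SPEED = {0: "none", 1: "1.5 Gbps", 2: "3.0 Gbps"}

# Values copied from the log, parsed as hex (assumption, see above).
sstatus_up = decode_sata_reg(int("123", 16))    # ata1 / ata2
sstatus_down = decode_sata_reg(int("0", 16))    # ata3 / ata4
scontrol = decode_sata_reg(int("300", 16))

for label, fields in [
    ("SStatus 123", sstatus_up),
    ("SStatus 0", sstatus_down),
    ("SControl 300", scontrol),
]:
    print(label, fields)
print("negotiated speed on ata1/ata2:", SPEED[sstatus_up["spd"]])
```

Under those assumptions, DET=3 means a device was detected and the PHY link is established, SPD=2 matches the "3.0 Gbps" in the message, and IPM=1 means the interface is active. The failing 2.6.19.2 boot reports the same SStatus 123 for ata1 and ata2, so the link itself comes up there as well.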


Output with 2.6.19.2 (logged via netconsole because the system can't boot):

ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21
ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part 
ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 1277
ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 1277
ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 1277
ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 1277
scsi0 : ahci
ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x104)
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ata1: COMRESET failed (device not ready)
ata1: hardreset failed, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ata1: COMRESET failed (device not ready)
ata1: hardreset failed, retrying in 5 secs
ata1: port is slow to respond, please be patient (Status 0x80)
ata1: port failed to respond (30 secs, Status 0x80)
ata1: COMRESET failed (device not ready)
ata1: reset failed, giving up
scsi1 : ahci
ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata2.00: qc timeout (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x104)
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: hardreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: hardreset failed, retrying in 5 secs
ata2: port is slow to respond, please be patient (Status 0x80)
ata2: port failed to respond (30 secs, Status 0x80)
ata2: COMRESET failed (device not ready)
ata2: reset failed, giving up
scsi2 : ahci
ata3: SATA link down (SStatus 0 SControl 300)
scsi3 : ahci
ata4: SATA link down (SStatus 0 SControl 300)
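The hex values in the failing log can be broken into named bits. This is only a sketch: the AC_ERR_* names are the bit assignments as I recall them from enum ata_completion_errors in include/linux/libata.h of that era (an assumption, please check against your tree), and the status bits are the classic ATA status register layout. Command 0xec is ATA IDENTIFY DEVICE. The helper set_bits is hypothetical.

```python
# Assumed layout of enum ata_completion_errors (libata.h, ~2.6.19).
AC_ERR_BITS = {
    0: "AC_ERR_DEV",
    1: "AC_ERR_HSM",
    2: "AC_ERR_TIMEOUT",
    3: "AC_ERR_MEDIA",
    4: "AC_ERR_ATA_BUS",
    5: "AC_ERR_HOST_BUS",
    6: "AC_ERR_SYSTEM",
    7: "AC_ERR_INVALID",
    8: "AC_ERR_OTHER",
}

# Classic ATA status register bits.
ATA_STATUS_BITS = {
    7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
    3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR",
}

def set_bits(value: int, names: dict) -> list:
    """Return the names of all bits set in value, lowest bit first."""
    return [
        names.get(bit, f"bit{bit}")
        for bit in range(value.bit_length())
        if value & (1 << bit)
    ]

print("err_mask=0x104 ->", set_bits(0x104, AC_ERR_BITS))
print("Status 0x80    ->", set_bits(0x80, ATA_STATUS_BITS))
```

Read this way, the IDENTIFY command timed out (plus a generic "other" error), and the port then stays busy (BSY) through every reset attempt. That describes the symptom only. It does not say why the command times out on 2.6.19.2.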


Stefan


Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

The update of the IDE layer was in 2.6.19. I don't think it is a 
hardware bug, because all 5 of these machines have run fine for a few 
years with 2.6.16.X and earlier. We switched to 2.6.18.6 on Monday last 
week and all of the machines began to crash periodically. On Friday last 
week we downgraded them all to 2.6.16.37 and all 5 machines run fine 
again. So I don't believe it is a hardware problem. Do you really think 
it could be?


Stefan

David Chinner wrote:

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment and 5 of them are old Intel 
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
happens on THESE machines.


Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious.

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.




Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH

Hi!

I have another idea: could it be a barrier problem? Barriers have been 
enabled by default from 2.6.17 on.


Stefan

David Chinner wrote:

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment and 5 of them are old Intel 
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
happens on THESE machines.


Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious.

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.





Re: XFS or Kernel Problem / Bug

2007-01-21 Thread Stefan Priebe - FH

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment and 5 of them are "old" Intel 
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only 
happens on THESE machines. Other P4 machines with a Tyan or a Gigabyte 
mainboard are not affected. All 300 machines run the same Debian 3.0 
with a self-built kernel. Some of these 5 use a 3ware controller and 
some of them use the onboard controller. All systems are using IDE.


But I cannot say what is happening on these machines at the time of 
failure. Sometimes these servers crash after only a few minutes. 
Sometimes they run for about 2-3 days. I have now downgraded all servers 
to 2.6.16.37, because they are production machines, but I have one 
machine we can test on if you need something.


Here is the output (the machine is running 2.6.16.37 at the moment):
xfs_growfs -n /

meta-data=/dev/root              isize=256    agcount=16, agsize=603855 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=9661680, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=4717, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
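As a quick sanity check on the geometry above, the numbers can be multiplied out. This is just arithmetic on the values printed by xfs_growfs, assuming that "blocks" and "agsize" are counted in filesystem blocks of bsize bytes. The variable names are mine.

```python
# Values copied from the xfs_growfs -n output.
agcount, agsize = 16, 603855        # allocation groups, blocks per group
data_blocks, bsize = 9661680, 4096  # data section size, block size in bytes
log_blocks = 4717                   # internal log size in blocks

# The last allocation group may be shorter than agsize, so the data
# section must fit in agcount groups but need more than agcount - 1.
assert (agcount - 1) * agsize < data_blocks <= agcount * agsize

data_bytes = data_blocks * bsize
log_bytes = log_blocks * bsize

print(f"data section: {data_bytes} bytes ({data_bytes / 2**30:.2f} GiB)")
print(f"internal log: {log_bytes} bytes ({log_bytes / 2**20:.2f} MiB)")
print("all groups full size:", agcount * agsize == data_blocks)
```

Here all 16 allocation groups are exactly full size, and the root filesystem comes out at a little under 37 GiB with a log of about 18 MiB, so the geometry itself looks unremarkable.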

Stefan

David Chinner wrote:

On Sun, Jan 21, 2007 at 01:30:15PM +0100, Stefan Priebe - FH wrote:

Hello!

I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
latest, 2.6.16.37),

but with 2.6.18.6 I get these errors:


[ EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b ]
[ EIP is at generic_file_buffered_write+0x390/0x6cf ]

Do you have a reproducible test case for these? If not,
do you have any idea what is going on in the system at the time
of the failure?

Can you describe the storage subsystem you are using and post the
output of xfs_growfs -n <mntpt> on the filesystem that is causing
problems?

Cheers,

Dave.




XFS or Kernel Problem / Bug

2007-01-21 Thread Stefan Priebe - FH

Hello!

I have 3 servers which work wonderfully with 2.6.16.X (I also tested the
latest, 2.6.16.37),

but with 2.6.18.6 I get these errors:

general protection fault:  [#1]
Modules linked in:
CPU:0
EIP:0060:[c01c8fd2]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at xfs_bmap_add_extent_hole_delay+0x58d/0x59b
eax:    ebx: fffe0007   ecx: 0071a4cd   edx: 
esi:    edi:    ebp: 0015   esp: ce35f8f0
ds:    es: 007b   ss: 0068
Process mysqld (pid: 1836, ti=ce35e000 task=ee618550 task.ti=ce35e000)
Stack: 0232  0233    000c
 
   0007  eca90250 eca90278 0001 eca90200 
03c3 
    010003c3 ffc0 ce35fa58 ce35fa58 0001 
 
Call Trace:
 [c01b6c58] xfs_trans_dqresv+0x3f9/0x405
 [c01c6485] xfs_bmap_add_extent+0x163/0x377
 [c01cd2c3] xfs_bmapi+0xa4e/0x1109
 [c01ebbe3] xfs_iomap_write_delay+0x233/0x2fa
 [c01eaa31] xfs_imap_to_bmap+0x29/0x1d6
 [c01eae1a] xfs_iomap+0x23c/0x3e1
 [c01eaebe] xfs_iomap+0x2e0/0x3e1
 [c020a71a] xfs_bmap+0x1a/0x1e
 [c020471e] __xfs_get_blocks+0x5d/0x195
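For readers not used to the symbol+offset/size notation in the oops: the first hex number is the offset into the function and the second is the function's total size. A small sketch that recovers the function start from the EIP value printed above follows. The helper name function_start is hypothetical and the regular expression only covers the simple form seen in this trace.

```python
import re

# Matches entries such as "xfs_bmapi+0xa4e/0x1109".
SYMBOL = re.compile(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

def function_start(address: int, entry: str):
    """Return (name, start address, bytes left in the function)."""
    match = SYMBOL.fullmatch(entry)
    if match is None:
        raise ValueError(f"unrecognised trace entry: {entry!r}")
    name = match.group(1)
    offset = int(match.group(2), 16)
    size = int(match.group(3), 16)
    return name, address - offset, size - offset

name, start, remaining = function_start(
    0xC01C8FD2, "xfs_bmap_add_extent_hole_delay+0x58d/0x59b"
)
print(f"{name} starts at {start:#x}, fault is {remaining} bytes before its end")
```

For the first oops this gives a function start of 0xc01c8a45 with the fault only 14 bytes before the end of the function. Note that the entries under "Call Trace:" are return addresses found on the stack, so the same subtraction there gives the start of the calling function, not the call site.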


and sometimes this one:

BUG: unable to handle kernel NULL pointer dereference at virtual address 0288
 printing eip:
c0142ff7
*pde = 
Oops:  [#1]
SMP 
Modules linked in: iptable_filter ip_tables x_tables
CPU:0
EIP:0060:[c0142ff7]Not tainted VLI
EFLAGS: 00010246   (2.6.18.6 #1) 
EIP is at generic_file_buffered_write+0x390/0x6cf
eax:    ebx: 01ec   ecx: ea029a40   edx: 8002
esi:    edi: e3b28c9c   ebp: 01ec   esp: dd04bd18
ds: 007b   es: 007b   ss: 0068
Process proftpd (pid: 3615, ti=dd04a000 task=eba88a70 task.ti=dd04a000)
Stack: e3b28d44 0001 0010 01fc c036d793 01fc c14765c0
0010 
   080d404c 01ec e3b28c9c c03e78c0 e3b28d44 ea029a40 01fc
 
    01ec dd04beac 00d420b1   dd04bd80
45b1fa67 
Call Trace:
 [c036d793] sock_def_readable+0x7f/0x81
 [c017a03a] file_update_time+0xad/0xcb
 [c0232015] xfs_iunlock+0x55/0x9f
 [c0262eeb] xfs_write+0xa74/0xc61
 [c036a253] sock_aio_read+0x95/0x99
 [c025d9fb] xfs_file_aio_write+0x8f/0xa0
 [c015fb94] do_sync_write+0xc9/0x10f
 [c0133ad6] autoremove_wake_function+0x0/0x57
 [c015f3d5] generic_file_llseek+0x95/0xbc
 [c015facb] do_sync_write+0x0/0x10f
 [c015fc80] vfs_write+0xa6/0x179
 [c015fe24] sys_write+0x51/0x80
 [c0102d3f] syscall_call+0x7/0xb

Code: 04 89 10 8b 44 24 40 85 c0 0f 85 db 00 00 00 8b 5c 24 24 85 db 0f
88 c3 00 00 00 8b 4c 24 34 8b 51 18 f6 c6 10 75 73 8b 7c 24 28 <8b> 85
9c 00 00 00 f6 40 30 10 75 63 f6 87 48 01 00 00 01 75 5a 

EIP: [c0142ff7] generic_file_buffered_write+0x390/0x6cf SS:ESP 0068:dd04bd18
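The faulting address in the second oops can be cross-checked against the code bytes and the register dump. This is a sketch under two assumptions: that the instruction at EIP is the byte sequence 8b 85 9c 00 00 00 (oops dumps normally put angle brackets around the first byte at EIP, which may not have survived the archive), and that the archive dropped leading zeros, so "ebp: 01ec" means 0x000001ec and "address 0288" means 0x00000288. The helper decode_modrm is written only for this illustration and is not a general x86 decoder.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(byte: int):
    """Split an x86 ModRM byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0x3, (byte >> 3) & 0x7, byte & 0x7

insn = bytes.fromhex("8b859c000000")

# Opcode 0x8b is "MOV r32, r/m32".
assert insn[0] == 0x8B
mod, reg, rm = decode_modrm(insn[1])

# mod == 2 with rm != 4 means [register + 32-bit displacement], no SIB byte.
assert mod == 0b10 and rm != 0b100
displacement = int.from_bytes(insn[2:6], "little", signed=True)

ebp = 0x01EC  # from the register dump, leading zeros assumed stripped
fault_address = ebp + displacement

print(f"mov {REG32[reg]}, [{REG32[rm]}+{displacement:#x}]")
print(f"effective address: {fault_address:#x}")
```

Under those assumptions the instruction is a load from [ebp+0x9c], and 0x1ec + 0x9c = 0x288, which matches the reported fault address. So the kernel appears to be reading a structure member at offset 0x9c through a pointer whose value is the small number 0x1ec instead of a valid kernel address. Which pointer that is would need the disassembly of generic_file_buffered_write from the exact kernel build.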

Stefan


