Re: Disk Errors during boot and run time.

2009-03-13 Thread Konstantin Svist
Bryn M. Reeves wrote:
 On Fri, 2009-03-06 at 12:22 +1300, Paul Ward wrote:
   
 # ls /boot
 ls: reading directory /boot: Input/output error
 

 What's in dmesg at this time?

   
 I have been told that the disks use multipath but I have no experience
 of this to date.
 I know the disks are on a SAN but as yet have not been able to locate
 them using the IBM SAN manager.

 

   
 Linux version 2.6.18-53.1.21.el5PAE
 

 So, RHEL5.1?

   
 (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20070626
 (Red Hat 4.1.2-14)) #1 SMP Wed May 7 08:56:33 EDT 2008
 

   
   Vendor: IBM   Model: 1814  FAStT   Rev: 0916
   Type:   Direct-Access  ANSI SCSI revision: 05
 

 So it's an IBM FAStT SAN? These are active/passive storage arrays that
 require a multipath hardware handler to manage switching between the
 active and passive paths and to prevent I/O being sent to a controller
 that cannot handle it.

 The I/O errors that you see are a result of things trying to access the
 passive paths (e.g. partition scanning, lvm label scanning, udev/hal
 probes etc.).

 RHEL5.1 included the old device-mapper hardware handlers. These will
 only take effect once multipath has configured the devices and only
 handle path switching in the event of a path failure (i.e. you'll still
 see I/O errors if something tries to access one of the underlying paths
 directly rather than via the multipath device map).

 RHEL5.3 introduces the scsi device handler framework as a replacement
 for the device-mapper hardware handlers (this appeared upstream in
 2.6.26).

 Whether you decide to update or not, it's probably worth carefully
 checking the current multipath configuration on the system as this is a
 very common area for configuration mistakes.

 Regards,
 Bryn.


   

I don't think this is hardware-specific. I've seen this problem on
desktop-grade hardware, using either IDE or SATA drives (a single 300GB
Seagate). Mine happened while I was using the cloning software CloneZilla
(I don't remember which version right now).
I'll post more details if/when I run into the problem again...

-- 
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines


Re: Disk Errors during boot and run time.

2009-03-11 Thread Bryn M. Reeves
On Fri, 2009-03-06 at 12:22 +1300, Paul Ward wrote:
 # ls /boot
 ls: reading directory /boot: Input/output error

What's in dmesg at this time?

 I have been told that the disks use multipath but I have no experience
 of this to date.
 I know the disks are on a SAN but as yet have not been able to locate
 them using the IBM SAN manager.
 

 Linux version 2.6.18-53.1.21.el5PAE

So, RHEL5.1?
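
The guess above comes from the kernel build number: each RHEL 5 minor release shipped with its own 2.6.18-N base kernel, and errata kernels keep that N (53 here). A minimal Python sketch of the lookup, for illustration only; the function name and the table are my own, and the table only covers 5.0 through 5.3:

```python
import re

# Base kernel build shipped with each RHEL 5 minor release (5.0 to 5.3 only).
RHEL5_BASE_BUILDS = {8: "5.0", 53: "5.1", 92: "5.2", 128: "5.3"}


def rhel5_minor(kernel_release):
    """Return the RHEL 5 minor release for a kernel release string, or None."""
    # Capture the first number after "2.6.18-"; errata suffixes such as
    # ".1.21" are skipped before the ".el5" tag.
    m = re.match(r"2\.6\.18-(\d+)[\d.]*\.el5", kernel_release)
    if m is None:
        return None
    return RHEL5_BASE_BUILDS.get(int(m.group(1)))


print(rhel5_minor("2.6.18-53.1.21.el5PAE"))
```

Anything not in the table (or not an el5 kernel) comes back as None rather than a guess.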

 (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20070626
 (Red Hat 4.1.2-14)) #1 SMP Wed May 7 08:56:33 EDT 2008

   Vendor: IBM   Model: 1814  FAStT   Rev: 0916
   Type:   Direct-Access  ANSI SCSI revision: 05

So it's an IBM FAStT SAN? These are active/passive storage arrays that
require a multipath hardware handler to manage switching between the
active and passive paths and to prevent I/O being sent to a controller
that cannot handle it.
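
The array was identified from the SCSI inquiry lines in dmesg. As a rough illustration, the Python sketch below pulls the vendor, model and revision out of such lines and flags FAStT devices. The regular expression assumes the "Vendor: ... Model: ... Rev: ..." layout quoted above, and the helper names are hypothetical:

```python
import re

# Matches the 2.6.18-era SCSI inquiry line as quoted in this thread.
INQUIRY = re.compile(
    r"Vendor:\s*(?P<vendor>.+?)\s+Model:\s*(?P<model>.+?)\s+Rev:\s*(?P<rev>\S+)"
)


def scsi_devices(dmesg_text):
    """Yield (vendor, model, rev) for each SCSI inquiry line in dmesg output."""
    for line in dmesg_text.splitlines():
        m = INQUIRY.search(line)
        if m:
            # Collapse the padding the kernel leaves inside the model field.
            model = " ".join(m.group("model").split())
            yield m.group("vendor"), model, m.group("rev")


def is_fastt(vendor, model):
    """True for the IBM FAStT family discussed in this thread."""
    return vendor == "IBM" and "FAStT" in model


sample = (
    "  Vendor: IBM   Model: 1814  FAStT   Rev: 0916\n"
    "  Type:   Direct-Access  ANSI SCSI revision: 05\n"
)
devices = list(scsi_devices(sample))
print([d for d in devices if is_fastt(d[0], d[1])])
```

On a live system the input would be the full dmesg text rather than the two-line sample.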

The I/O errors that you see are a result of things trying to access the
passive paths (e.g. partition scanning, lvm label scanning, udev/hal
probes etc.).

RHEL5.1 included the old device-mapper hardware handlers. These will
only take effect once multipath has configured the devices and only
handle path switching in the event of a path failure (i.e. you'll still
see I/O errors if something tries to access one of the underlying paths
directly rather than via the multipath device map).

RHEL5.3 introduces the scsi device handler framework as a replacement
for the device-mapper hardware handlers (this appeared upstream in
2.6.26).

Whether you decide to update or not, it's probably worth carefully
checking the current multipath configuration on the system as this is a
very common area for configuration mistakes.
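
One way to start that check is to see whether any device stanza for the array in /etc/multipath.conf leaves the hardware handler unset. The Python sketch below is a simplification under stated assumptions: the sample stanza is illustrative and not taken from the system in this thread, vendor and product are compared as exact strings although multipath treats them as regular expressions, and built-in defaults (which apply when no stanza exists) are not considered. The function names are hypothetical:

```python
import re


def device_stanzas(conf_text):
    """Return one dict of settings per 'device { ... }' block in conf_text."""
    # Drop comments (this would also cut a '#' inside a quoted value).
    text = re.sub(r"#.*", "", conf_text)
    stanzas = []
    for body in re.findall(r"\bdevice\s*\{([^{}]*)\}", text):
        settings = {}
        for line in body.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:
                settings[parts[0]] = parts[1].strip().strip('"')
        stanzas.append(settings)
    return stanzas


def missing_handler(stanzas, vendor, product):
    """Return matching stanzas with no hardware handler (unset or "0")."""
    return [
        s for s in stanzas
        if s.get("vendor") == vendor
        and s.get("product") == product
        and s.get("hardware_handler", "0") == "0"
    ]


# Illustrative stanza only; rdac is the handler used for this array family.
SAMPLE = '''
devices {
    device {
        vendor "IBM"
        product "1814"
        hardware_handler "1 rdac"
        path_checker rdac
        path_grouping_policy group_by_prio
    }
}
'''

stanzas = device_stanzas(SAMPLE)
print(missing_handler(stanzas, "IBM", "1814"))
```

An empty list means every matching stanza names a handler; it does not prove the rest of the configuration is right.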

Regards,
Bryn.






Re: Disk Errors during boot and run time.

2009-03-10 Thread Konstantin Svist
Paul Ward wrote:
 Hi,

 I have been asked to look at a server that has some disk issues.

 If I try to run ls /boot, I get the following:
 # ls /boot
 ls: reading directory /boot: Input/output error

 I have been told that the disks use multipath but I have no experience
 of this to date.
 I know the disks are on a SAN but as yet have not been able to locate
 them using the IBM SAN manager.

 Can someone help me unravel what I can do to get this server healthy again?



Re: Disk Errors during boot and run time.

2009-03-10 Thread Patrick O'Callaghan
On Tue, 2009-03-10 at 14:57 -0700, Konstantin Svist wrote:
 Bump for this thread. I'd very much like to know the answer, too.

You didn't need to quote all 1142 lines of context to say "me too".

poc



Disk Errors during boot and run time.

2009-03-05 Thread Paul Ward
Hi,

I have been asked to look at a server that has some disk issues.

If I try to run ls /boot, I get the following:
# ls /boot
ls: reading directory /boot: Input/output error

I have been told that the disks use multipath but I have no experience
of this to date.
I know the disks are on a SAN but as yet have not been able to locate
them using the IBM SAN manager.

Can someone help me unravel what I can do to get this server healthy again?


This is the dmesg output from boot:


Linux version 2.6.18-53.1.21.el5PAE (brewbuil...@ls20-bc2-13.build.redhat.com) (gcc version 4.1.2 20070626 (Red Hat 4.1.2-14)) #1 SMP Wed May 7 08:56:33 EDT 2008
BIOS-provided physical RAM map:
 BIOS-e820:  - 0009d800 (usable)
 BIOS-e820: 0009d800 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - cffbce40 (usable)
 BIOS-e820: cffbce40 - cffd (ACPI data)
 BIOS-e820: cffd - d000 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - 0001 (reserved)
 BIOS-e820: 0001 - 00013000 (usable)
3968MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 0009d940
Using x86 segment limits to approximate NX protection
On node 0 totalpages: 1245184
  DMA zone: 4096 pages, LIFO batch:0
  Normal zone: 225280 pages, LIFO batch:31
  HighMem zone: 1015808 pages, LIFO batch:31
DMI 2.4 present.
Using APIC driver default
ACPI: RSDP (v002 IBM   ) @ 0x000fdfd0
ACPI: XSDT (v001 IBMSERBLADE 0x1001 IBM  0x45444f43) @ 0xcffcff00
ACPI: FADT (v002 IBMSERBLADE 0x1001 IBM  0x45444f43) @ 0xcffcfe40
ACPI: MADT (v001 IBMSERBLADE 0x1001 IBM  0x45444f43) @ 0xcffcfdc0
ACPI: MCFG (v001 IBMSERBLADE 0x1001 IBM  0x45444f43) @ 0xcffcfd80
ACPI: DSDT (v002 IBMSERBLADE 0x1000 INTL 0x20060707) @ 0x
ACPI: PM-Timer IO Port: 0x588
ACPI: Local APIC address 0xfee0
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 6:15 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
Processor #1 6:15 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 14, version 32, address 0xfec0, GSI 0-23
ACPI: IOAPIC (id[0x0d] address[0xfec8] gsi_base[24])
IOAPIC[1]: apic_id 13, version 32, address 0xfec8, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Enabling APIC mode:  Flat.  Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at d100 (gap: d000:1000)
Detected 3000.364 MHz processor.
Built 1 zonelists.  Total pages: 1245184
Kernel command line: ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=1...@16m
mapped APIC to d000 (fee0)
mapped IOAPIC to c000 (fec0)
mapped IOAPIC to b000 (fec8)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Initializing CPU#0
CPU 0 irqstacks, hard=c074 soft=c072
PID hash table entries: 4096 (order: 12, 16384 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 4014432k/4980736k available (2078k kernel code, 178348k reserved, 859k data, 220k init, 3276528k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6002.55 BogoMIPS (lpj=3001278)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 2000  0004e3bd  0001
CPU: After vendor identify, caps: bfebfbff 2000  0004e3bd  0001
monitor/mwait feature present.
using mwait in idle threads.
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU: After all inits, caps: bfebf3ff 2000  0940 0004e3bd  0001
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
ACPI: Core revision 20060707
CPU0: Intel(R) Xeon(R) CPU5160  @ 3.00GHz stepping 0b
SMP alternatives: switching to SMP code
Booting processor 1/1 eip 3000
CPU 1 irqstacks, hard=c0741000 soft=c0721000
Initializing CPU#1
Calibrating delay using timer specific routine..