Re: RAID5 not being reassembled correctly after device swap

2007-07-02 Thread Michael Frotscher
On Monday 02 July 2007 00:12:14 Neil Brown wrote:

> Kernel logs from the boot would help here.
> Logs would help.

Sure. The interesting part from dmesg is this:

hdg: max request size: 512KiB
hdg: 398297088 sectors (203928 MB) w/8192KiB Cache, CHS=24792/255/63, 
UDMA(100)
hdg: cache flushes supported
 hdg: hdg1 hdg2 hdg3 hdg4
hda: max request size: 512KiB
hda: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63, 
UDMA(100)
hda: cache flushes supported
 hda: hda1 hda2 hda3 hda4
hdb: max request size: 512KiB
hdb: 490234752 sectors (251000 MB) w/7936KiB Cache, CHS=30515/255/63, 
UDMA(100)
hdb: cache flushes supported
 hdb: hdb1 hdb2 hdb3 hdb4
md: md3 stopped.
md: bind<hdb3>
md: bind<hda3>
raid5: device hda3 operational as raid disk 0
raid5: device hdb3 operational as raid disk 1
raid5: allocated 3163kB for md3
raid5: raid level 5 set md3 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:2 fd:1
 disk 0, o:1, dev:hda3
 disk 1, o:1, dev:hdb3

What I really don't understand is the output of /proc/mdstat after a reboot:

Personalities : [raid6] [raid5] [raid4]
md4 : active raid5 hdg4[1] hda4[2]
  368643328 blocks level 5, 4k chunk, algorithm 2 [3/2] [_UU]

md2 : active raid5 hda2[0] hdg2[2]
  1027968 blocks level 5, 4k chunk, algorithm 2 [3/2] [U_U]

md3 : active raid5 hda3[0] hdb3[1]
  20980736 blocks level 5, 4k chunk, algorithm 2 [3/2] [UU_]

All arrays are degraded, but different disks are missing. md3 (the root 
partition) is missing its hdg member, as the log shows. md2 and md4, on the 
other hand, are missing their hdb members:

md: md2 stopped.
md: bind<hdg2>
md: bind<hda2>
raid5: device hda2 operational as raid disk 0
raid5: device hdg2 operational as raid disk 2
raid5: allocated 3163kB for md2
raid5: raid level 5 set md2 active with 2 out of 3 devices, algorithm 2

Btw, is it significant that the order is different? In md4, the hdg disk is 
raid disk 1, whereas it is raid disk 2 in md2. 
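
For reference, one way to check which member each array considers stale is to
compare the superblocks of all three partitions and look at the event counters
(a sketch, assuming the partition layout above; re-adding would then trigger a
resync):

mdadm --examine /dev/hda3 /dev/hdb3 /dev/hdg3   # compare the 'Events' counters and device roles
mdadm --detail /dev/md3                         # shows which slot the array thinks is missing
mdadm /dev/md3 --add /dev/hdg3                  # re-add the stale member and let it rebuild

The same pattern would apply to md2 and md4 with their hdb partitions.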

> Maybe /etc/mdadm/mdadm.conf lists device= where it shouldn't.

Should be irrelevant, as the root-fs, where mdadm.conf resides, is on a raid 
itself.
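
In case it matters, a minimal mdadm.conf for this layout would look roughly
like this (a sketch; the UUIDs are placeholders and would come from
'mdadm --detail --brief /dev/mdX'):

DEVICE partitions
ARRAY /dev/md2 level=raid5 num-devices=3 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md3 level=raid5 num-devices=3 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx
ARRAY /dev/md4 level=raid5 num-devices=3 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx

Presumably the point of Neil's remark is that explicit devices=hda3,hdb3,...
entries pin an array to particular device names, while 'DEVICE partitions'
plus UUIDs lets mdadm find the members wherever they turn up.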

> Maybe the other IDE controller uses a module that is loaded late.

Hmm, I'd need to check that after I rebuild the arrays. Maybe the driver for 
the other IDE controller is not in the initrd. That wouldn't explain the 
missing hdb, though.
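
If it comes to that, the initrd contents can be checked without a reboot
(a sketch, assuming a gzipped-cpio initramfs; the image path and the module
name for the hdg controller are placeholders):

zcat /boot/initrd.img-$(uname -r) | cpio -it | grep -i ide   # is the controller driver in there at all?
dmesg | grep -iE 'hd[abg]|^md:'                              # did hdg show up before md assembly started?
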
-- 
YT,
Michael


Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-07-02 Thread Tejun Heo
David Greaves wrote:
> Tejun Heo wrote:
>> It's really weird tho.  The PHY RDY status changed events are coming
>> from the device which is NOT used while resuming
>
> There is an obvious problem there though Tejun (the errors even when sda
> isn't involved in the OS boot) - can I start another thread about that
> issue/bug later? I need to reshuffle partitions so I'd rather get the
> hibernate working first and then go back to it if that's OK?

Yeah, sure.  The problem is that we don't know whether or how those two
are related.  It would be great if there's a way to verify memory image
read from hibernation is intact.  Rafael, any ideas?

Thanks.

-- 
tejun


Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-07-02 Thread Rafael J. Wysocki
On Monday, 2 July 2007 12:56, Tejun Heo wrote:
> David Greaves wrote:
>> Tejun Heo wrote:
>>> It's really weird tho.  The PHY RDY status changed events are coming
>>> from the device which is NOT used while resuming
>>
>> There is an obvious problem there though Tejun (the errors even when sda
>> isn't involved in the OS boot) - can I start another thread about that
>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>> hibernate working first and then go back to it if that's OK?
>
> Yeah, sure.  The problem is that we don't know whether or how those two
> are related.  It would be great if there's a way to verify memory image
> read from hibernation is intact.  Rafael, any ideas?

Well, s2disk has an option to compute an MD5 checksum of the image during
the hibernation and verify it while reading the image.  Still, s2disk/resume
aren't very easy to install  and configure ...

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-07-02 Thread Rafael J. Wysocki
On Monday, 2 July 2007 16:32, David Greaves wrote:
> Rafael J. Wysocki wrote:
>> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
>>> David Greaves wrote:
>>>> Tejun Heo wrote:
>>>>> It's really weird tho.  The PHY RDY status changed events are coming
>>>>> from the device which is NOT used while resuming
>>>> There is an obvious problem there though Tejun (the errors even when sda
>>>> isn't involved in the OS boot) - can I start another thread about that
>>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>>>> hibernate working first and then go back to it if that's OK?
>>> Yeah, sure.  The problem is that we don't know whether or how those two
>>> are related.  It would be great if there's a way to verify memory image
>>> read from hibernation is intact.  Rafael, any ideas?
>>
>> Well, s2disk has an option to compute an MD5 checksum of the image during
>> the hibernation and verify it while reading the image.
> (Assuming you mean the mainline version)
>
> Sounds like a good thing to try next...
> Couldn't see anything on this in ../Documentation/power/*
> How do I enable it?

Add 'compute checksum = y' to the s2disk's configuration file.
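
For context, that file is typically /etc/suspend.conf (Debian ships it as
/etc/uswsusp.conf); a minimal sketch with checksumming enabled might look like:

snapshot device = /dev/snapshot
resume device = /dev/hda2        # placeholder: whichever swap partition holds the image
compute checksum = y

s2disk reads this at suspend time, and the matching resume binary verifies the
MD5 while loading the image back.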

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

2007-07-02 Thread Rafael J. Wysocki
On Monday, 2 July 2007 18:36, David Greaves wrote:
> Rafael J. Wysocki wrote:
>> On Monday, 2 July 2007 16:32, David Greaves wrote:
>>> Rafael J. Wysocki wrote:
>>>> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
>>>>> David Greaves wrote:
>>>>>> Tejun Heo wrote:
>>>>>>> It's really weird tho.  The PHY RDY status changed events are coming
>>>>>>> from the device which is NOT used while resuming
>>>>>> There is an obvious problem there though Tejun (the errors even when sda
>>>>>> isn't involved in the OS boot) - can I start another thread about that
>>>>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>>>>>> hibernate working first and then go back to it if that's OK?
>>>>> Yeah, sure.  The problem is that we don't know whether or how those two
>>>>> are related.  It would be great if there's a way to verify memory image
>>>>> read from hibernation is intact.  Rafael, any ideas?
>>>> Well, s2disk has an option to compute an MD5 checksum of the image during
>>>> the hibernation and verify it while reading the image.
>>> (Assuming you mean the mainline version)
>>>
>>> Sounds like a good thing to try next...
>>> Couldn't see anything on this in ../Documentation/power/*
>>> How do I enable it?
>>
>> Add 'compute checksum = y' to the s2disk's configuration file.
>
> Ah, right - that's uswsusp isn't it? Which isn't what I'm having problems with, AFAIK?
>
> My suspend procedure is:
>
> xfs_freeze -f /scratch
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> xfs_freeze -u /scratch
>
> Which should work (actually it should work without the sync/xfs_freeze too).
>
> So to debug the problem I'd like to minimally extend this process rather than
> replace it with another approach.

Well, this is not entirely another approach.  Only the saving of the image is
done differently; the rest is the same.
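
In other words the sequence stays almost the same, with s2disk replacing the
two echo lines (a sketch, assuming the uswsusp tools are installed and
configured as above):

xfs_freeze -f /scratch
sync
s2disk                   # writes the image itself and, with 'compute checksum = y', checksums it
xfs_freeze -u /scratch   # runs after resume, as before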

> I take it there isn't an 'echo y > /sys/power/do_image_checksum'?

No, there is not anything like that.

Greetings,
Rafael


-- 
Premature optimization is the root of all evil. - Donald Knuth


Re: Fastest Chunk Size w/XFS For MD Software RAID = 1024k

2007-07-02 Thread Mr. James W. Laferriere

Hello Justin (& all),

On Thu, 28 Jun 2007, Justin Piszcz wrote:

On Thu, 28 Jun 2007, Peter Rabbitson wrote:


Justin Piszcz wrote:


On Thu, 28 Jun 2007, Peter Rabbitson wrote:

Interesting, I came up with the same results (1M chunk being superior) 
with a completely different raid set with XFS on top:


...

Could it be attributed to XFS itself?

Peter



Good question.  By the way, how much cache do the drives have that you are 
testing with?




I believe 8MB, but I am not sure I am looking at the right number:

[EMAIL PROTECTED]:~# hdparm -i /dev/sda

/dev/sda:

Model=aMtxro7 2Y050M  , FwRev=AY5RH10W, 
SerialNo=6YB6Z7E4

Config={ Fixed }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=?0?
CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes:  pio0 pio1 pio2 pio3 pio4
DMA modes:  mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5
AdvancedPM=yes: disabled (255) WriteCache=enabled
Drive conforms to: ATA/ATAPI-7 T13 1532D revision 0:  ATA/ATAPI-1 
ATA/ATAPI-2 ATA/ATAPI-3 ATA/ATAPI-4 ATA/ATAPI-5 ATA/ATAPI-6 ATA/ATAPI-7


* signifies the current active mode

[EMAIL PROTECTED]:~#

1M chunk consistently delivered best performance with:

o A plain dumb dd run
o bonnie
o two bonnie threads
o iozone with 4 threads

My RA is set at 256 for the drives and 16384 for the array (128k and 8M 
respectively)




8MB yup: BuffSize=7936kB.

My read ahead is set to 64 megabytes and 16384 for the stripe_size_cache.
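
For reference, these knobs are normally set along these lines (a sketch;
device names and values are placeholders, and note the md sysfs attribute is
actually spelled stripe_cache_size):

blockdev --setra 256 /dev/sda                      # per-drive readahead in 512-byte sectors (256 = 128k)
blockdev --setra 16384 /dev/md0                    # array readahead (16384 = 8M)
echo 16384 > /sys/block/md0/md/stripe_cache_size   # raid5 stripe cache; memory used is roughly value x 4k x number of member disks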


	Might you know of a tool for acquiring these (*) parameters for a SCSI 
drive?  hdparm really doesn't like real SCSI drives, so that doesn't seem to 
work for me.


(*) BuffType=DualPortCache, BuffSize=7936kB, stolen from above.
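
Not a full answer, but a couple of tools that can query SCSI drives where
hdparm falls short (a sketch; sdparm and sginfo come from the sdparm and
sg3_utils packages, and I'm not certain the cache size itself is exposed,
only the caching mode page):

sdparm --page=ca /dev/sda      # caching mode page (write cache / read-ahead settings)
sginfo -c /dev/sda             # the same page via sg3_utils
sdparm --inquiry /dev/sda      # vendor, model and revision strings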


Tia ,  JimL
--
+-+
| James   W.   Laferriere | System   Techniques | Give me VMS |
| NetworkEngineer | 663  Beaumont  Blvd |  Give me Linux  |
| [EMAIL PROTECTED] | Pacifica, CA. 94044 |   only  on  AXP |
+-+