Re: problem with software raid1 on 2.6.22.10: check/rebuild hangs

2007-12-02 Thread Neil Brown
On Monday December 3, [EMAIL PROTECTED] wrote:
> Hello,
> 
> with kernel 2.6.22.10, checking or rebuilding a raid1 does not work on one
> of our machines. After a short time the rebuild/check does not make progress
> anymore. Processes which then access the filesystems on those raids are
> blocked.
> 
> Nothing gets logged. Access to other filesystems works fine.
> 
> If we boot 2.6.17.10 (the kernel we used before upgrading to 2.6.22), the
> check/rebuild of the raids completes without any problems.
> 

Sounds like a driver problem.
Your symptoms are completely consistent with a request being submitted
to the underlying device, and that request never completing.

What controller runs your drives?  You should probably report
the problem to the relevant maintainer.

Do you compile your own kernels?  Would you be comfortable using "git
bisect" to narrow down exactly which change breaks things?  It shouldn't
take more than a dozen or so tests.
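
A minimal bisect session might look roughly like this (a sketch only,
assuming a clone of the mainline tree and that 2.6.17 is the last known
good kernel and 2.6.22 the first known bad one; adjust to whatever you
have actually tested):

   git bisect start
   git bisect bad v2.6.22        # first version where the check hangs
   git bisect good v2.6.17       # last version known to work
   # build and boot the kernel git checks out, run a raid check, then:
   git bisect good               # or "git bisect bad" if it hangs
   # repeat until git names the offending commit, then clean up:
   git bisect reset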

NeilBrown


Re: Spontaneous rebuild

2007-12-02 Thread Oliver Martin
Neil Brown schrieb:
> 
> This isn't a resync, it is a data check.  "Dec  2" is the first Sunday
> of the month.  You probably have a crontab entries that does
>echo check > /sys/block/mdX/md/sync_action
> 
> early on the first Sunday of the month.  I know that Debian does this.
> 
> It is good to do this occasionally to catch sleeping bad blocks.
> 
Duh, thanks for clearing this up. I guess what set off the alarm was
seeing what looked like a rebuild while stress testing. Yes, I'm
running Debian and I have exactly this entry in my crontab... Perhaps
they should add a short log entry like "starting periodic RAID check" so
that people know there is nothing to worry about.

Or maybe I should just RTFC (read the fine crontab) ;-)

Oliver


problem with software raid1 on 2.6.22.10: check/rebuild hangs

2007-12-02 Thread Wolfgang Walter
Hello,

with kernel 2.6.22.10, checking or rebuilding a raid1 does not work on one
of our machines. After a short time the rebuild/check does not make progress
anymore. Processes which then access the filesystems on those raids are
blocked.

Nothing gets logged. Access to other filesystems works fine.

If we boot 2.6.17.10 (the kernel we used before upgrading to 2.6.22), the
check/rebuild of the raids completes without any problems.

The filesystems on the raid are exported via nfs.

The machine has 2 xeons with hyperthreading.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: Spontaneous rebuild

2007-12-02 Thread Richard Scobie

Justin Piszcz wrote:

While we are on the subject of bad blocks, is it possible to do what 
3ware raid controllers do without an external card?


They know when a block is bad and they remap it to another part of the 
array etc, whereas with software raid you never know this is happening 
until the disk is dead.


Are you sure the 3ware software is remapping the bad blocks, or is it 
just reporting the bad blocks were remapped?


As I understand it, bad block remapping (reallocated sectors) is done 
internally at the drive level.


Perhaps all 3ware is doing is running the SMART command for reallocated 
sectors on all drives on a periodic basis and reporting any changes?
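
Something along those lines can be had without 3ware hardware using
smartmontools: smartd polls the drives and can mail you when attributes
such as Reallocated_Sector_Ct change. A rough sketch (device names and
the mail address are just placeholders, and I'm quoting the smartd.conf
syntax from memory):

   # one-off check of the current reallocation count
   smartctl -A /dev/sda | grep -i reallocated

   # /etc/smartd.conf: monitor, track attribute changes, mail on trouble
   /dev/sda -a -m admin@example.com
   /dev/sdb -a -m admin@example.com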


Regards,

Richard


Re: Spontaneous rebuild

2007-12-02 Thread Justin Piszcz



On Mon, 3 Dec 2007, Neil Brown wrote:


On Sunday December 2, [EMAIL PROTECTED] wrote:


Anyway, the problems are back: To test my theory that everything is
alright with the CPU running within its specs, I removed one of the
drives while copying some large files yesterday. Initially, everything
seemed to work out nicely, and by the morning, the rebuild had finished.
Again, I unmounted the filesystem and ran badblocks -svn on the LVM. It
ran without gripes for some hours, but just now I saw md had started to
rebuild the array again out of the blue:

Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device
using ehci_hcd and address 4
Dec  2 01:06:02 quassel kernel: md: data-check of RAID array md0

 ^^

Dec  2 01:06:02 quassel kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Dec  2 01:06:02 quassel kernel: md: using maximum available idle IO
bandwidth (but not more than 20 KB/sec) for data-check.

 ^^

Dec  2 01:06:02 quassel kernel: md: using 128k window, over a total of
488383936 blocks.
Dec  2 03:57:24 quassel kernel: usb 4-5.2: reset high speed USB device
using ehci_hcd and address 4



This isn't a resync, it is a data check.  "Dec  2" is the first Sunday
of the month.  You probably have a crontab entry that does
  echo check > /sys/block/mdX/md/sync_action

early on the first Sunday of the month.  I know that Debian does this.

It is good to do this occasionally to catch sleeping bad blocks.


While we are on the subject of bad blocks, is it possible to do what 3ware 
raid controllers do without an external card?


They know when a block is bad and they remap it to another part of the 
array etc, whereas with software raid you never know this is happening 
until the disk is dead.


For example, with 3dm2, if you have e-mail alerts set to 2 (warn), it 
will e-mail you every time there is a sector re-allocation. Is this 
possible with software raid, or does it *require* HW raid/an external 
controller?


Justin.


Re: Reading takes 100% precedence over writes for mdadm+raid5?

2007-12-02 Thread Justin Piszcz



On Mon, 3 Dec 2007, Neil Brown wrote:


On Sunday December 2, [EMAIL PROTECTED] wrote:


I was curious: when running 10 dd's (which are writing to the RAID 5)
fine, with no issues, why do they suddenly all go into D-state and give
the read 100% priority?


So are you saying that the writes completely stalled while the read
was progressing?  How exactly did you measure that?

Yes, 100%.



What kernel version are you running?

2.6.23.9





Is this normal?


It shouldn't be.

NeilBrown



I checked again with du -sb; it is still writing, just VERY 
slowly:


Before reading dd:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  3104  46088  8 766941600 0 102832 2683 21132  0 34 43 22
 0  2104  49140  8 766672400 0 137800 2662 6690  0 30 45 25
 0  4104  47344  8 766888400 0 93312 2637 19454  0 22 40 38
 0  6104  51292  8 766468800 0 89404 2538 7901  0 18 31 51
 0  1104  55476  8 766042400 0 172852 2669 13607  0 39 47 14
 0  3104  50428  8 766503600 0 135916 2711 22523  0 27 52 22
 0  5104  51836  8 766415200 0 101504 2491 2784  0 18 42 40
 0  5104 113468  8 760301600 0 63788 2568 7528  0 24 24 52
 0  2104  45780  8 766936400  1116 177604 2617 13521  0 34 33 33

After reading dd launched:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  4104  45076 2379348 527358800  7584 17753  548  301  0 17 45 39
 0  5104  46632 2617352 504311600 237908 0 2949 2647  0 10 35 54
 1  5104  45656 2846728 481490000 229376 0 2768 2360  0 10 36 54
 1  4104  46128 3104932 455140800 258308  2748 2918 2559  0 11 36 53
 0  5104  43804 3338248 432399600 233212 0 2815 2631  0 10 33 57
 0  5104  46580 3534856 412584800 196608 0 2736 2273  0  9 36 55
 0  5104  46164 3797000 386293600 262144  1396 2900 2834  0 11 37 51
 1  4104  46076 4026376 363374000 229376 0 2978 2586  0 11 37 53
 0  5104  46252 4288520 337172400 262144 0 2878 2316  0 11 37 53
 0  5104  46520 4517896 314237600 229440 0 2912 2406  0 10 35 56
 0  5104  47408 4747272 291315600 229376 0 2903 2619  0 10 36 54
 1  4104  46800 4976648 268356000 229376 0 2726 2346  0 10 37 53
 0  5104  45284 5206024 245624800 229376 0 2856 2482  0 10 36 54
 0  5104  46524 5468168 219213600 262144 0 2956 2750  0 11 36 54
 0  5104  47284 5697544 196255600 229376 0 2894 2589  0 10 37 53

It takes a while before it writes anything...

l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r# du -sb .
1250921771135   .
l1:/r#

.. 5 minutes later ..

l1:/r# du -sb .
1251764138111   .
l1:/r#

l1:/r# du -sb .
1251885887615   .
l1:/r#

l1:/r# ps auxww | grep dd
root      2206  4.5  0.0  10356  1672 ?        D    Dec02  11:46 dd if=/dev/zero of=1.out bs=1M
root      2207  4.5  0.0  10356  1672 ?        D    Dec02  11:47 dd if=/dev/zero of=2.out bs=1M
root      2208  4.4  0.0  10356  1676 ?        D    Dec02  11:42 dd if=/dev/zero of=3.out bs=1M
root      2209  4.5  0.0  10356  1676 ?        D    Dec02  11:53 dd if=/dev/zero of=4.out bs=1M
root      2210  4.4  0.0  10356  1672 ?        D    Dec02  11:43 dd if=/dev/zero of=5.out bs=1M
root      2211  4.4  0.0  10356  1676 ?        D    Dec02  11:43 dd if=/dev/zero of=6.out bs=1M
root      2212  4.4  0.0  10356  1676 ?        D    Dec02  11:38 dd if=/dev/zero of=7.out bs=1M
root      2213  4.5  0.0  10356  1672 ?        D    Dec02  11:50 dd if=/dev/zero of=8.out bs=1M
root      2214  4.5  0.0  10356  1672 ?        D    Dec02  11:47 dd if=/dev/zero of=9.out bs=1M
root      2215  4.4  0.0  10356  1676 ?        D    Dec02  11:44 dd if=/dev/zero of=10.out bs=1M
root      3251 25.0  0.0  10356  1676 pts/2    D    02:21   0:14 dd if=/dev/md3 of=/dev/null bs=1M
root      3282  0.0  0.0   5172   780 pts/2    S+   02:22   0:00 grep dd
l1:/r#

HP raid controllers (CCISS) allow a percentage utilization split for 
reads vs. writes. Does Linux/mdadm's implementation offer anything like 
this in a sysfs or proc tunable?
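
The closest tunables I'm aware of are these (a sketch; paths assume
/dev/md3 on a recent 2.6 kernel), though none of them is a true
read/write split:

   # raid5 stripe cache; larger values can help writes keep flowing
   echo 8192 > /sys/block/md3/md/stripe_cache_size

   # resync/check throttling (KB/sec) - affects rebuilds, not normal I/O
   cat /proc/sys/dev/raid/speed_limit_min
   cat /proc/sys/dev/raid/speed_limit_max

   # the elevator on the member disks also affects the read/write mix
   cat /sys/block/sdc/queue/scheduler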


Justin.



Re: Reading takes 100% precedence over writes for mdadm+raid5?

2007-12-02 Thread Neil Brown
On Sunday December 2, [EMAIL PROTECTED] wrote:
> 
> I was curious: when running 10 dd's (which are writing to the RAID 5)
> fine, with no issues, why do they suddenly all go into D-state and give
> the read 100% priority?

So are you saying that the writes completely stalled while the read
was progressing?  How exactly did you measure that?

What kernel version are you running?

> 
> Is this normal?

It shouldn't be.

NeilBrown


Re: Spontaneous rebuild

2007-12-02 Thread Neil Brown
On Sunday December 2, [EMAIL PROTECTED] wrote:
> 
> Anyway, the problems are back: To test my theory that everything is
> alright with the CPU running within its specs, I removed one of the
> drives while copying some large files yesterday. Initially, everything
> seemed to work out nicely, and by the morning, the rebuild had finished.
> Again, I unmounted the filesystem and ran badblocks -svn on the LVM. It
> ran without gripes for some hours, but just now I saw md had started to
> rebuild the array again out of the blue:
> 
> Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device
> using ehci_hcd and address 4
> Dec  2 01:06:02 quassel kernel: md: data-check of RAID array md0
  ^^
> Dec  2 01:06:02 quassel kernel: md: minimum _guaranteed_  speed: 1000
> KB/sec/disk.
> Dec  2 01:06:02 quassel kernel: md: using maximum available idle IO
> bandwidth (but not more than 20 KB/sec) for data-check.
  ^^
> Dec  2 01:06:02 quassel kernel: md: using 128k window, over a total of
> 488383936 blocks.
> Dec  2 03:57:24 quassel kernel: usb 4-5.2: reset high speed USB device
> using ehci_hcd and address 4
> 

This isn't a resync, it is a data check.  "Dec  2" is the first Sunday
of the month.  You probably have a crontab entry that does
   echo check > /sys/block/mdX/md/sync_action

early on the first Sunday of the month.  I know that Debian does this.

It is good to do this occasionally to catch sleeping bad blocks.
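
If you want to see whether such a check is running, or drive one by
hand, something like this should work (mdX being whichever array you
care about):

   cat /proc/mdstat                           # shows "check" progress
   cat /sys/block/mdX/md/sync_action          # check / resync / idle ...
   cat /sys/block/mdX/md/mismatch_cnt         # non-zero after a check
                                              # means mismatched blocks
   echo idle > /sys/block/mdX/md/sync_action  # abort a running check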

NeilBrown


Reading takes 100% precedence over writes for mdadm+raid5?

2007-12-02 Thread Justin Piszcz
root      2206     1  4 Dec02 ?        00:10:37 dd if=/dev/zero of=1.out bs=1M
root      2207     1  4 Dec02 ?        00:10:38 dd if=/dev/zero of=2.out bs=1M
root      2208     1  4 Dec02 ?        00:10:35 dd if=/dev/zero of=3.out bs=1M
root      2209     1  4 Dec02 ?        00:10:45 dd if=/dev/zero of=4.out bs=1M
root      2210     1  4 Dec02 ?        00:10:35 dd if=/dev/zero of=5.out bs=1M
root      2211     1  4 Dec02 ?        00:10:35 dd if=/dev/zero of=6.out bs=1M
root      2212     1  4 Dec02 ?        00:10:30 dd if=/dev/zero of=7.out bs=1M
root      2213     1  4 Dec02 ?        00:10:42 dd if=/dev/zero of=8.out bs=1M
root      2214     1  4 Dec02 ?        00:10:35 dd if=/dev/zero of=9.out bs=1M
root      2215     1  4 Dec02 ?        00:10:37 dd if=/dev/zero of=10.out bs=1M
root      3080 24.6  0.0  10356  1672 ?        D    01:22   5:51 dd if=/dev/md3 of=/dev/null bs=1M


I was curious: when running 10 dd's (which are writing to the RAID 5)
fine, with no issues, why do they suddenly all go into D-state and give
the read 100% priority?


Is this normal?

# du -sb . ; sleep 300; du -sb .
1115590287487   .
1115590287487   .

Here is my raid5 config:

# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
  Creation Time : Sun Dec  2 12:15:20 2007
 Raid Level : raid5
 Array Size : 1465143296 (1397.27 GiB 1500.31 GB)
  Used Dev Size : 732571648 (698.63 GiB 750.15 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 3
Persistence : Superblock is persistent

Update Time : Sun Dec  2 22:00:54 2007
  State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

 Layout : left-symmetric
 Chunk Size : 1024K

   UUID : fea48e85:ddd2c33f:d19da839:74e9c858 (local to host box1)
 Events : 0.15

Number   Major   Minor   RaidDevice State
   0   8   330  active sync   /dev/sdc1
   1   8   491  active sync   /dev/sdd1
   2   8   652  active sync   /dev/sde1



Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-02 Thread Justin Piszcz



On Mon, 3 Dec 2007, Michael Tokarev wrote:


Justin Piszcz said: (by the date of Sun, 2 Dec 2007 04:11:59 -0500 (EST))


The badblocks did not do anything; however, when I built a software raid 5
and then performed a dd:

/usr/bin/time dd if=/dev/zero of=fill_disk bs=1M

I saw this somewhere along the way:

[42332.936706] ata5.00: spurious completions during NCQ issue=0x0
SAct=0x7000 FIS=004040a1:0800
[42333.240054] ata5: soft resetting port


There's some (probably timing-related) bug with spurious completions
during NCQ.  A lot of people are seeing this same effect with different
drives and controllers.  Tejun is working on it.  It's difficult to
reproduce.

Search for "spurious completion" - there are many hits...

/mjt



Thanks, will check it out.


Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-02 Thread Michael Tokarev
> Justin Piszcz said: (by the date of Sun, 2 Dec 2007 04:11:59 -0500 (EST))
> 
>> The badblocks did not do anything; however, when I built a software raid 5 
>> and then performed a dd:
>>
>> /usr/bin/time dd if=/dev/zero of=fill_disk bs=1M
>>
>> I saw this somewhere along the way:
>>
>> [42332.936706] ata5.00: spurious completions during NCQ issue=0x0 
>> SAct=0x7000 FIS=004040a1:0800
>> [42333.240054] ata5: soft resetting port

There's some (probably timing-related) bug with spurious completions
during NCQ.  A lot of people are seeing this same effect with different
drives and controllers.  Tejun is working on it.  It's difficult to
reproduce.

Search for "spurious completion" - there are many hits...

/mjt


Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-02 Thread Justin Piszcz



On Sun, 2 Dec 2007, Janek Kozicki wrote:


Justin Piszcz said: (by the date of Sun, 2 Dec 2007 04:11:59 -0500 (EST))


The badblocks did not do anything; however, when I built a software raid 5
and then performed a dd:

/usr/bin/time dd if=/dev/zero of=fill_disk bs=1M

I saw this somewhere along the way:

[42332.936706] ata5.00: spurious completions during NCQ issue=0x0
SAct=0x7000 FIS=004040a1:0800
[42333.240054] ata5: soft resetting port


I know nothing about NCQ ;) But I find it interesting that *slower*
access worked fine while *fast* access didn't.

If I understand you correctly:

- badblocks is slower, and you said that it worked flawlessly, right?
- getting from /dev/zero is the fastest thing you can do, and it fails...

I'd check the jumpers on the HDD and, if there are any, set it to 1.5 Gb/s
instead of the default 3.0 Gb/s, or something along those lines. I remember
seeing such a jumper on one of my HDDs (I don't remember the exact speed
numbers, though).

I also remember a forum post about problems occurring when an HDD was
working at maximum speed, faster than the IO controller
could handle.

I dunno. It's just what came to my mind...
--
Janek Kozicki |



Thanks for the suggestions. BTW, NCQ off (on Raptors, anyway) is 30 to 
50 megabytes per second faster in a RAID 5 configuration; NCQ slows 
things down for those disks.


There are no jumpers (by default) on the 750GB WD Caviars, btw.

So far, with NCQ off, I've been pounding the disks and have not been able to 
reproduce the error, but with NCQ on and some dd's or some raid creations 
it is reproducible (or appears to be); it happened twice.


Justin.


Re: Spontaneous rebuild

2007-12-02 Thread Janek Kozicki
> Justin Piszcz schrieb:
> >
> > Naturally, when it is reset, the device is disconnected and then
> > re-appears; when MD sees this, it rebuilds the array.

The least you can do is add an internal bitmap to your raid; this will
make rebuilds faster :-/
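
If I remember the mdadm syntax right, it can be added to (and removed
from) an existing array without recreating it, roughly:

   mdadm --grow --bitmap=internal /dev/md0
   # and later, to drop it again:
   mdadm --grow --bitmap=none /dev/md0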

-- 
Janek Kozicki |


Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-02 Thread Janek Kozicki
Justin Piszcz said: (by the date of Sun, 2 Dec 2007 04:11:59 -0500 (EST))

> The badblocks did not do anything; however, when I built a software raid 5 
> and then performed a dd:
> 
> /usr/bin/time dd if=/dev/zero of=fill_disk bs=1M
> 
> I saw this somewhere along the way:
> 
> [42332.936706] ata5.00: spurious completions during NCQ issue=0x0 
> SAct=0x7000 FIS=004040a1:0800
> [42333.240054] ata5: soft resetting port

I know nothing about NCQ ;) But I find it interesting that *slower*
access worked fine while *fast* access didn't.

If I understand you correctly:

- badblocks is slower, and you said that it worked flawlessly, right?
- getting from /dev/zero is the fastest thing you can do, and it fails...

I'd check the jumpers on the HDD and, if there are any, set it to 1.5 Gb/s
instead of the default 3.0 Gb/s, or something along those lines. I remember
seeing such a jumper on one of my HDDs (I don't remember the exact speed
numbers, though).

I also remember a forum post about problems occurring when an HDD was
working at maximum speed, faster than the IO controller
could handle.

I dunno. It's just what came to my mind...
-- 
Janek Kozicki |


RE: Abysmal write performance on HW RAID5

2007-12-02 Thread Daniel Korstad


> -Original Message-
> From: ChristopherD [mailto:[EMAIL PROTECTED]
> Sent: Sunday, December 02, 2007 4:03 AM
> To: linux-raid@vger.kernel.org
> Subject: Abysmal write performance on HW RAID5
> 
> 
> In the process of upgrading my RAID5 array, I've run into a brick wall
> (< 4MB/sec avg write perf!) that I could use some help figuring out.
> I'll start with the quick backstory and setup.
> 
> Common Setup:
> 
> Dell Dimension XPS T800, salvaged from Mom (i440BX chipset, Pentium3 @ 800MHZ)
> 768MB DDR SDRAM @ 100MHZ FSB  (3x256MB DIMM)
> PCI vid card (ATI Rage 128)
> PCI 10/100 NIC (3Com 905)
> PCI RAID controller (LSI MegaRAID i4 - 4 channel PATA)
> 4 x 250GB (WD2500) UltraATA drives, each connected to separate channels on the controller
> Ubuntu Feisty Fawn
> 
> In the LSI BIOS config, I set up the full capacity of all four drives as a
> single logical disk using RAID5 @ 64K stripe size.  I installed the OS from
> the CD, allowing it to create a 4GB swap partition (sda2) and use the rest
> as a single ext3 partition (sda1) with roughly 700GB space.
> 
> This setup ran fine for months as my home fileserver.  Being new to RAID at
> the time, I didn't know or think about tuning or benchmarking, etc, etc.  I
> do know that I often moved ISO images to this machine from my gaming rig
> using both SAMBA and FTP, with xfer limited by the 100MBit LAN (~11MB/sec).

That sounds about right; 11MB * 8 (bit/Byte) = 88Mbit on your 100M LAN.

> 
> About a month or so ago, I hit capacity on the partition.  I dumped some
> movies off to a USB drive (500GB PATA) and started watching the drive
> aisle at Fry's.  Last week, I saw what I'd been waiting for: Maxtor 500GB
> drives @ $99 each.  So, I bought three of them and started this adventure.
> 
> 
> I'll skip the details on the pain in the butt of moving 700GB of data onto
> various drives of various sizes...the end result was the following change
> to my setup:
> 
> 3 x Maxtor 500GB PATA drives (7200rpm, 16MB cache)
> 1 x IBM/Hitachi Deskstar 500GB PATA (7200rpm, 8MB cache)
> 
> Each drive still on a separate controller channel, this time configured
> into two logical drives:
> Logical Disk 1:  RAID0, 16GB, 64K stripe size (sda)
> Logical Disk 2:  RAID5, 1.5TB, 128K stripe size (sdb)
> 
> 
> I also took this opportunity to upgrade to the newest Ubuntu 7.10 (Gutsy),
> and having done some reading, planned to make some tweaks to the partition
> formats.  After fighting with the standard CD, which refused to install
> the OS without also formatting the root partition (but not offering any
> control of the formatting), I downloaded the "alternate CD" and used the
> textmode installer.
> 
> I set up the partitions like this:
> sda1: 14.5GB ext3, 256MB journal (mounted data_ordered), 4K block size,
> stride=16, sparse superblocks, no resize_inode, 1GB reserved for root
> sda2: 1.5GB linux swap
> sdb1: 1.5TB ext2, largefile4 (4MB per inode), stride=32, sparse
> superblocks, no resize_inode, 0 reserved for root
> 
> The format command was my first hint of a problem.  The block group
> creation counter spun very rapidly up to 9800/11600 and then paused and I
> heard the drives thrash.  The block groups completed at a slower pace, and
> then the final creation process took several minutes.
> 
> But the real shocker was transferring my data onto this new partition.
> FOUR MEGABYTES PER SECOND?!?!
> 
> My initial plan was to plug a single old data drive into the motherboard's
> ATA port, thinking the transfer speed within a single machine would be the
> fastest possible mechanism.  Wrong.  I ended up mounting the drives using
> USB enclosures to my laptop (RedHat EL 5.1) and sharing them via NFS.
> 
> So, deciding the partition was disposable (still unused), I fired up dd to
> run some block device tests:
> dd if=/dev/zero of=/dev/sdb bs=1M count=25
> 
> This ran silently and showed 108MB/sec??  OK, that beats 4...let's try
> again!  Now I hear drive activity, and the result says 26MB/sec.  Running
> it a third time immediately brought the rate down to 4MB/sec.  Apparently,
> the first 64MB or so runs nice and fast (cache? the i4 only has 16MB
> onboard).
> 
> I also ran iostat -dx in the background during a 26GB directory copy
> operation, reporting on 60-sec intervals.  This is a typical output:
> 
> Device:  rrqm/s  wrqm/s   r/s    w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  svctm   %util
> sda        0.00    0.18  0.00   0.48    0.00   0.00     11.03      0.01    21.66  16.73    0.61
> sdb        0.00    0.72  0.03  64.28    0.00   3.95    125.43    137.57  2180.23  15.85  100.02
> 
> 
> So, the RAID5 device has a huge queue of write requests with an average
> wait time of more than 2 seconds @ 100% utilization?  Or is this a bug in
> iostat?
> 
> At this point, I'm all ears...I don't even know where to start.  Is ext2
> not a good format for volumes of this size?  Then h

Re: Spontaneous rebuild

2007-12-02 Thread Oliver Martin
Justin Piszcz schrieb:
> 
> It rebuilds the array because 'something' is causing device
> resets/timeouts on your USB device:
> 
> Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device
> using ehci_hcd and address 4
> 
> Naturally, when it is reset, the device is disconnected and then
> re-appears; when MD sees this, it rebuilds the array.
> 
> Why it is timing out/resetting the device, that is what you need to find
> out.
> 
> Justin.
> 

Thanks for your answer, I'll investigate the USB resets. Still, it seems
strange that the rebuild only started five hours after the reset. Is
this normal?
The reason I said the resets don't seem to hurt is that I also get them
for a second disk (not in a raid), and file transfers aren't
interrupted, I haven't (yet?) seen any data corruption, and other than
the message, the kernel doesn't seem to mind at all.

BTW, this time, badblocks ran through without any errors. The only
strange thing remaining is the rebuild.


Oliver


Re: Spontaneous rebuild

2007-12-02 Thread Justin Piszcz



On Sun, 2 Dec 2007, Oliver Martin wrote:


[Please CC me on replies as I'm not subscribed]

Hello!

I've been experimenting with software RAID a bit lately, using two
external 500GB drives. One is connected via USB, one via Firewire. It is
set up as a RAID5 with LVM on top so that I can easily add more drives
when I run out of space.
About a day after the initial setup, things went belly up. First, EXT3
reported strange errors:
EXT3-fs error (device dm-0): ext3_new_block: Allocating block in system
zone - blocks from 106561536, length 1
EXT3-fs error (device dm-0): ext3_new_block: Allocating block in system
zone - blocks from 106561537, length 1
...

There were literally hundreds of these, and they came back immediately
when I reformatted the array. So I tried ReiserFS, which worked fine for
about a day. Then I got errors like these:
ReiserFS: warning: is_tree_node: node level 0 does not match to the
expected one 2
ReiserFS: dm-0: warning: vs-5150: search_by_key: invalid format found in
block 69839092. Fsck?
ReiserFS: dm-0: warning: vs-13070: reiserfs_read_locked_inode: i/o
failure occurred trying to find stat data of [6 10 0x0 SD]

Again, hundreds. So I ran badblocks on the LVM volume, and it reported
some bad blocks near the end. Running badblocks on the md array worked,
so I recreated the LVM stuff and attributed the failures to undervolting
experiments I had been doing (this is my old laptop running as a server).

Anyway, the problems are back: To test my theory that everything is
alright with the CPU running within its specs, I removed one of the
drives while copying some large files yesterday. Initially, everything
seemed to work out nicely, and by the morning, the rebuild had finished.
Again, I unmounted the filesystem and ran badblocks -svn on the LVM. It
ran without gripes for some hours, but just now I saw md had started to
rebuild the array again out of the blue:

Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device
using ehci_hcd and address 4
Dec  2 01:06:02 quassel kernel: md: data-check of RAID array md0
Dec  2 01:06:02 quassel kernel: md: minimum _guaranteed_  speed: 1000
KB/sec/disk.
Dec  2 01:06:02 quassel kernel: md: using maximum available idle IO
bandwidth (but not more than 20 KB/sec) for data-check.
Dec  2 01:06:02 quassel kernel: md: using 128k window, over a total of
488383936 blocks.
Dec  2 03:57:24 quassel kernel: usb 4-5.2: reset high speed USB device
using ehci_hcd and address 4

I'm not sure the USB resets are related to the problem - device 4-5.2 is
part of the array, but I get these sometimes at random intervals and
they don't seem to hurt normally. Besides, the first one was long before
the rebuild started, and the second one long afterwards.

Any ideas why md is rebuilding the array? And could this be related to
the bad blocks problem I had first? badblocks is still running, I'll
post an update when it is finished.
In the meantime, mdadm --detail /dev/md0 and mdadm --examine
/dev/sd[bc]1 don't give me any clues as to what went wrong, both disks
are marked as "active sync", and the whole array is "active, recovering".

Before I forget, I'm running 2.6.23.1 with this config:
http://stud4.tuwien.ac.at/~e0626486/config-2.6.23.1-hrt3-fw

Thanks,
Oliver



It rebuilds the array because 'something' is causing device 
resets/timeouts on your USB device:


Dec  1 20:04:49 quassel kernel: usb 4-5.2: reset high speed USB device
using ehci_hcd and address 4

Naturally, when it is reset, the device is disconnected and then 
re-appears; when MD sees this, it rebuilds the array.


Why it is timing out/resetting the device, that is what you need to find 
out.


Justin.


Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)

2007-12-02 Thread Justin Piszcz



On Sat, 1 Dec 2007, Justin Piszcz wrote:




On Sat, 1 Dec 2007, Janek Kozicki wrote:

Justin Piszcz said: (by the date of Sat, 1 Dec 2007 07:23:41 -0500 
(EST))



dd if=/dev/zero of=/dev/sdc


The purpose is that with any new disk it's good to write to all the blocks and
let the drive do all of the re-mapping before you put 'real' data on it.
Let it crap out or fail before I put my data on it.


better use badblocks. It writes data, then reads it afterwards:
In this example the data is semi random (quicker than /dev/urandom ;)

badblocks -c 10240 -s -w -t random -v /dev/sdc

--
Janek Kozicki |



Will give this a shot and see if I can reproduce the error, thanks.



The badblocks did not do anything; however, when I built a software raid 5 
and then performed a dd:


/usr/bin/time dd if=/dev/zero of=fill_disk bs=1M

I saw this somewhere along the way:

[30189.967531] RAID5 conf printout:
[30189.967576]  --- rd:3 wd:3
[30189.967617]  disk 0, o:1, dev:sdc1
[30189.967660]  disk 1, o:1, dev:sdd1
[30189.967716]  disk 2, o:1, dev:sde1
[42332.936615] ata5.00: exception Emask 0x2 SAct 0x7000 SErr 0x0 action 
0x2 frozen
[42332.936706] ata5.00: spurious completions during NCQ issue=0x0 
SAct=0x7000 FIS=004040a1:0800
[42332.936804] ata5.00: cmd 61/08:60:6f:4d:2a/00:00:27:00:00/40 tag 12 cdb 
0x0 data 4096 out
[42332.936805]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 
(HSM violation)
[42332.936977] ata5.00: cmd 61/08:68:77:4d:2a/00:00:27:00:00/40 tag 13 cdb 
0x0 data 4096 out
[42332.936981]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 
(HSM violation)
[42332.937162] ata5.00: cmd 61/00:70:0f:49:2a/04:00:27:00:00/40 tag 14 cdb 
0x0 data 524288 out
[42332.937163]  res 40/00:74:0f:49:2a/00:00:27:00:00/40 Emask 0x2 
(HSM violation)

[42333.240054] ata5: soft resetting port
[42333.494462] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[42333.506592] ata5.00: configured for UDMA/133
[42333.506652] ata5: EH complete
[42333.506741] sd 4:0:0:0: [sde] 1465149168 512-byte hardware sectors 
(750156 MB)

[42333.506834] sd 4:0:0:0: [sde] Write Protect is off
[42333.506887] sd 4:0:0:0: [sde] Mode Sense: 00 3a 00 00
[42333.506905] sd 4:0:0:0: [sde] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA


Next test, I will turn off NCQ and try to make the problem re-occur.
If anyone else has any thoughts here..?
I ran long smart tests on all 3 disks, they all ran successfully.

Perhaps these drives need to be NCQ BLACKLISTED with the P35 chipset?
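
For what it's worth, NCQ can also be switched off per disk at runtime by
dropping the queue depth to 1; I believe that is the usual way to test
whether NCQ is the culprit before resorting to a blacklist entry (sde
shown here, repeat for the other members):

   cat /sys/block/sde/device/queue_depth        # 31 usually means NCQ active
   echo 1 > /sys/block/sde/device/queue_depth   # disable NCQ on this disk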

Justin.