Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-26 Thread Bill Davidsen

Mr. James W. Laferriere wrote:


Hello Gabor ,

On Tue, 20 Jun 2006, Gabor Gombas wrote:


On Tue, Jun 20, 2006 at 03:08:59PM +0200, Niccolo Rigacci wrote:


Do you know if it is possible to switch the scheduler at runtime?


echo cfq > /sys/block/<disk>/queue/scheduler



At least one can do an ls of the /sys/block area and then do an automated
echo cfq down the tree.  Does anyone know of a method to set a default
scheduler?  Scanning down a list or manually maintaining a list seems
to be a bug in the waiting.  Tia,  JimL


Thought I posted this... it can be set in the kernel build or in the boot
parameters from grub/lilo.
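For reference, the boot-parameter route is the `elevator=` option described in Documentation/kernel-parameters.txt; a grub menu.lst stanza might look like the sketch below (the title, kernel version and root device are placeholders, not taken from this thread):

```
title  Linux (cfq elevator)
root   (hd0,0)
kernel /vmlinuz-2.6.17 root=/dev/md3 ro elevator=cfq
```

For lilo the equivalent would be an `append="elevator=cfq"` line in the image stanza.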


2nd thought: set it to cfq by default, then at the END of rc.local, if 
there are no arrays rebuilding, change to something else if you like.
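That end-of-rc.local check could be sketched as below. The helper name, the choice of the anticipatory scheduler as the thing to fall back to, and the exact /proc/mdstat wording matched are assumptions; the mdstat path is a parameter only so the sketch can be exercised against an ordinary file:

```shell
#!/bin/sh
# resync_in_progress [MDSTAT] -- succeed if any md array is resyncing
# or recovering; MDSTAT defaults to the real /proc/mdstat.
resync_in_progress() {
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}

# End-of-rc.local idea: boot with cfq as the default, then relax it
# once no array is rebuilding (needs root; scheduler name assumed):
# if ! resync_in_progress; then
#     for sched in /sys/block/*/queue/scheduler; do
#         [ -w "$sched" ] && echo anticipatory > "$sched"
#     done
# fi
```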


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979

-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-20 Thread Niccolo Rigacci
On Mon, Jun 19, 2006 at 05:05:56PM +0200, Gabor Gombas wrote:
 
 IMHO a much better fix is to use the cfq I/O scheduler during the
 rebuild.

Yes, changing the default I/O scheduler to DEFAULT_CFQ solves the problem
very well; I get over 40 MB/s resync speed with no lock-up at
all!

Thank you very much, I think we can draft a new FAQ entry.

Do you know if it is possible to switch the scheduler at runtime?

-- 
Niccolo Rigacci
Firenze - Italy

Iraq, missione di pace: 38475 morti - www.iraqbodycount.net


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-20 Thread Mr. James W. Laferriere

Hello Gabor ,

On Tue, 20 Jun 2006, Gabor Gombas wrote:

On Tue, Jun 20, 2006 at 03:08:59PM +0200, Niccolo Rigacci wrote:

Do you know if it is possible to switch the scheduler at runtime?

echo cfq > /sys/block/<disk>/queue/scheduler


At least one can do an ls of the /sys/block area and then do an automated
echo cfq down the tree.  Does anyone know of a method to set a default
scheduler?  Scanning down a list or manually maintaining a list seems
to be a bug in the waiting.  Tia,  JimL
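The automated walk down /sys/block that James describes might look like the sketch below. The function name is made up, and the sysfs root is a parameter only so the loop can be tried against a scratch directory instead of the live /sys:

```shell
#!/bin/sh
# set_elevator NAME [SYSFS] -- write NAME into every queue/scheduler
# file under the given sysfs root (default /sys; needs root for real).
set_elevator() {
    name=$1
    root=${2:-/sys}
    for sched in "$root"/block/*/queue/scheduler; do
        [ -w "$sched" ] || continue   # skip devices without a writable queue
        echo "$name" > "$sched"
    done
}

# Typical invocation on a live 2.6 system:
#   set_elevator cfq
```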
--
+--+
| James   W.   Laferriere |   SystemTechniques   | Give me VMS |
| NetworkEngineer | 3600 14th Ave SE #20-103 |  Give me Linux  |
| [EMAIL PROTECTED] |  Olympia ,  WA.   98501  |   only  on  AXP |
+--+


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-20 Thread Niccolo Rigacci
   At least one can do an ls of the /sys/block area and then do an
   automated echo cfq down the tree.  Does anyone know of a method to
   set a default scheduler?

Maybe I didn't understand the question...

You decide which schedulers are available at kernel compile time, and
also at compile time you choose which is the default I/O
scheduler.
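In 2.6 kconfig terms that is the "Default I/O scheduler" choice under the block layer; an illustrative .config fragment (symbol names as of 2.6.17-era kernels, not quoted from this thread) would be:

```
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_AS is not set
CONFIG_DEFAULT_IOSCHED="cfq"
```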

-- 
Niccolo Rigacci
Firenze - Italy

Iraq, missione di pace: 38475 morti - www.iraqbodycount.net


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-20 Thread Gabor Gombas
On Tue, Jun 20, 2006 at 08:00:13AM -0700, Mr. James W. Laferriere wrote:

   At least one can do an ls of the /sys/block area and then do an
   automated echo cfq down the tree.  Does anyone know of a method to
   set a default scheduler?

RTFM: Documentation/kernel-parameters.txt in the kernel source.

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-19 Thread Gabor Gombas
On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:

 I was able to work around this by lowering
 /proc/sys/dev/raid/speed_limit_max to a value below my disk
 throughput (~ 50 MB/s) as follows:

IMHO a much better fix is to use the cfq I/O scheduler during the
rebuild. The default anticipatory scheduler gives horrible latencies
and can cause the machine to appear as 'locked up' if there is heavy
I/O load like a RAID reconstruct or heavy database usage.

The price of cfq is lower throughput (higher RAID rebuild time) than
with the anticipatory I/O scheduler.

Gabor

-- 
 -
 MTA SZTAKI Computer and Automation Research Institute
Hungarian Academy of Sciences
 -


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-17 Thread Niccolo Rigacci
On Thursday 15 June 2006 12:13, you wrote:
 If this is causing a lockup, then there is something else wrong, just
 as any single process should not - by writing constantly to disks - be
 able to clog up the whole system.

 Maybe if you could get the result of
   alt-sysrq-P

I tried some kernel changes, enabling HyperThreading on the (single) P4
processor and setting CONFIG_PREEMPT_VOLUNTARY=y, but with no success.

During the lockup, Alt-SysRq-P constantly says that:

  EIP is at mwait_idle+0x1a/0x2e

While Alt-SysRq-T shows - among other processes - the MD resync and the
locked-up bash; these are the hand-copied call traces:

md3_resync
  device_barrier
  default_wake_function
  sync_request
  __generic_unplug_device
  md_do_sync
  schedule
  md_thread
  md_thread
  kthread
  kthread
  kernel_thread_helper

bash
  io_schedule
  sync_buffer
  sync_buffer
  __wait_on_bit_lock
  sync_buffer
  out_of_line_wait_on_bit_lock
  wake_bit_function
  __lock_buffer
  do_get_write_access
  __ext3_get_inode_loc
  journal_get_write_access
  ext3_reserve_inode_write
  ext3_mark_inode_dirty
  ext3_dirty_inode
  __mark_inode_dirty
  update_atime
  vfs_readdir
  sys_getdents64
  filldir64
  syscall_call


This is also the output of top, which keeps running regularly during the lockup:

top - 11:40:41 up 7 min,  2 users,  load average: 8.70, 4.92, 2.04
Tasks:  70 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2% us,  0.7% sy,  0.0% ni, 98.7% id,  0.0% wa,  0.0% hi,  0.5% si
Mem:    906212k total,    58620k used,   847592k free,     3420k buffers
Swap:  1951736k total,        0k used,  1951736k free,    23848k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  829 root      10  -5     0    0    0 S    1  0.0   0:01.70 md3_raid1
 2823 root      10  -5     0    0    0 D    1  0.0   0:01.62 md3_resync
    1 root      16   0  1956  656  560 S    0  0.1   0:00.52 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
    6 root      34  19     0    0    0 S    0  0.0   0:00.00 ksoftirqd/1
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    8 root      10  -5     0    0    0 S    0  0.0   0:00.01 events/0
    9 root      10  -5     0    0    0 S    0  0.0   0:00.01 events/1
   10 root      10  -5     0    0    0 S    0  0.0   0:00.00 khelper
   11 root      10  -5     0    0    0 S    0  0.0   0:00.00 kthread
   14 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/0
   15 root      10  -5     0    0    0 S    0  0.0   0:00.00 kblockd/1
   16 root      11  -5     0    0    0 S    0  0.0   0:00.00 kacpid
  152 root      20   0     0    0    0 S    0  0.0   0:00.00 pdflush
  153 root      15   0     0    0    0 D    0  0.0   0:00.00 pdflush
  154 root      17   0     0    0    0 S    0  0.0   0:00.00 kswapd0
  155 root      11  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  156 root      11  -5     0    0    0 S    0  0.0   0:00.00 aio/1
  755 root      10  -5     0    0    0 S    0  0.0   0:00.00 kseriod
  796 root      10  -5     0    0    0 S    0  0.0   0:00.00 ata/0
  797 root      11  -5     0    0    0 S    0  0.0   0:00.00 ata/1
  799 root      11  -5     0    0    0 S    0  0.0   0:00.00 scsi_eh_0
  800 root      11  -5     0    0    0 S    0  0.0   0:00.00 scsi_eh_1
  825 root      15   0     0    0    0 S    0  0.0   0:00.00 kirqd
  831 root      10  -5     0    0    0 D    0  0.0   0:00.00 md2_raid1
  833 root      10  -5     0    0    0 S    0  0.0   0:00.00 md1_raid1
  834 root      10  -5     0    0    0 D    0  0.0   0:00.00 md0_raid1
  835 root      15   0     0    0    0 D    0  0.0   0:00.00 kjournald
  932 root      18  -4  2192  584  368 S    0  0.1   0:00.19 udevd
 1698 root      10  -5     0    0    0 S    0  0.0   0:00.00 khubd
 2031 root      22   0     0    0    0 S    0  0.0   0:00.00 kjournald
 2032 root      15   0     0    0    0 D    0  0.0   0:00.00 kjournald
 2142 daemon    16   0  1708  364  272 S    0  0.0   0:00.00 portmap
 2464 root      16   0  2588  932  796 S    0  0.1   0:00.01 syslogd

-- 
Niccolo Rigacci
Firenze - Italy

War against Iraq? Not in my name!


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-15 Thread Niccolo Rigacci
On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:
 Niccolo Rigacci wrote:
 
 When the sync is complete, the machine starts to respond again
 perfectly.
 
 I was able to work around this by lowering
 /proc/sys/dev/raid/speed_limit_max to a value below my disk
 throughput (~ 50 MB/s) as follows:
 
 $ echo 45000 > /proc/sys/dev/raid/speed_limit_max

Thanks!

This hack seems to solve my problem too. So it seems that the 
RAID subsystem does not detect a proper speed to throttle the 
sync.

Can you please send me some details of your system?

- SATA chipset (or motherboard model)?
- Disks make/model?
- Do you have the config file of the kernel that you were running
  (look at the /boot/config-<version> file)?

I wonder if kernel preemption is to blame for this, or whether the
burst speed of the disks fools the throttle calculation.

-- 
Niccolo Rigacci
Firenze - Italy

Iraq, missione di pace: 38355 morti - www.iraqbodycount.net


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-15 Thread Neil Brown
On Thursday June 15, [EMAIL PROTECTED] wrote:
 On Wed, Jun 14, 2006 at 10:46:09AM -0500, Bill Cizek wrote:
  Niccolo Rigacci wrote:
  
  When the sync is complete, the machine starts to respond again
  perfectly.
  
  I was able to work around this by lowering
  /proc/sys/dev/raid/speed_limit_max to a value below my disk
  throughput (~ 50 MB/s) as follows:
  
  $ echo 45000 > /proc/sys/dev/raid/speed_limit_max
 
 Thanks!
 
 This hack seems to solve my problem too. So it seems that the 
 RAID subsystem does not detect a proper speed to throttle the 
 sync.

The RAID subsystem doesn't try to detect a 'proper' speed.
When there is nothing else happening, it just drives the disks as fast
as they will go.
If this is causing a lockup, then there is something else wrong, just
as any single process should not - by writing constantly to disks - be
able to clog up the whole system.

Maybe if you could get the result of 
  alt-sysrq-P
or even
  alt-sysrq-T
while the system seems to hang.

NeilBrown


Re: IBM xSeries stop responding during RAID1 reconstruction

2006-06-14 Thread Bill Cizek

Niccolo Rigacci wrote:


Hi to all,

I have a new IBM xSeries 206m with two SATA drives; I installed
Debian Testing (Etch) and configured a software RAID as shown:


Personalities : [raid1]
md1 : active raid1 sdb5[1] sda5[0]
 1951744 blocks [2/2] [UU]

md2 : active raid1 sdb6[1] sda6[0]
 2931712 blocks [2/2] [UU]

md3 : active raid1 sdb7[1] sda7[0]
 39061952 blocks [2/2] [UU]

md0 : active raid1 sdb1[1] sda1[0]
 582 blocks [2/2] [UU]

I experience this problem: whenever a volume is reconstructing
(syncing), the system stops responding. The machine is alive, because
it responds to ping and the console is responsive, but I cannot get
past the login prompt. It seems that every disk activity is delayed
and blocking.


When the sync is complete, the machine starts to respond again
perfectly.


Any hints on how to start debugging?
 



I ran into a similar problem using kernel 2.6.16.14 on an ASUS
motherboard: when I mirrored two SATA drives it seemed to block all
other disk I/O until the sync was complete.

My symptoms were the same: all consoles were non-responsive and when I
tried to log in it just sat there until the sync was complete.

I was able to work around this by lowering
/proc/sys/dev/raid/speed_limit_max to a value below my disk throughput
(~ 50 MB/s) as follows:

$ echo 45000 > /proc/sys/dev/raid/speed_limit_max

That kept my system usable but didn't address the underlying problem
of the raid resync not being appropriately throttled.  I ended up
configuring my system differently, so this became a moot point for me.
Hope this helps,
Bill
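Bill's workaround amounts to a single write; wrapped as a tiny helper for clarity below. The function name is made up, values are in KB/s, and the proc directory is a parameter only so the sketch can be exercised against a scratch directory (the real write needs root):

```shell
#!/bin/sh
# throttle_resync MAX_KBPS [DIR] -- lower md's resync speed ceiling.
# DIR defaults to the real /proc/sys/dev/raid directory.
throttle_resync() {
    echo "$1" > "${2:-/proc/sys/dev/raid}/speed_limit_max"
}

# Bill's ~45 MB/s cap, just under his disks' ~50 MB/s streaming speed:
#   throttle_resync 45000
```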



