Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ming Zhang wrote:


As Ross pointed out, many I/O patterns only have one outstanding I/O at any
time, so there is only one worker thread actively serving it; it cannot
exploit the multiple cores here.


Do you see 100% with nullio or fileio? With disk, most of the time should be
spent in iowait and CPU utilization should not be high at all.


With both nullio and fileio...


Time to deprecate old RAID formats?

2007-10-19 Thread John Stoffel

So, 

Is it time to start thinking about deprecating the old 0.9, 1.0 and
1.1 formats to just standardize on the 1.2 format?  What are the
issues surrounding this?

It's certainly easy enough to change mdadm to default to the 1.2
format and to require a --force switch to  allow use of the older
formats.
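For what it's worth, the metadata version can already be selected explicitly
at create time; a minimal sketch (device names are placeholders):

   mdadm --create /dev/md0 --metadata=1.2 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

so the discussion is really only about what happens when --metadata is omitted.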

I keep seeing that we support these old formats, and it's never been
clear to me why we have four different ones available?  Why can't we
start defining the canonical format for Linux RAID metadata?

Thanks,
John
[EMAIL PROTECTED]


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ming Zhang wrote:

On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:

Ross S. W. Walker wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:
I can format several times (mkfs.ext3) a 1.5 TB volume over iSCSI
without any trouble. I can read and write on this virtual disk without
any trouble.

Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

I'm waiting for a crash. None so far as I write these lines. I suspect
an interaction between raid and iscsi.

I simultaneously run :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.

The speed can definitely be improved. Look at your network setup
and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140

--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms

gershwin:[~]  ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms

--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~] 

	Both initiator and target are alone on a gigabit NIC (Tigon3). On the
target server, istd1 takes 100% of a CPU (and only one CPU, even though my
T1000 can simultaneously run 32 threads). I think the limitation comes
from istd1.


Usually istdX will not take 100% CPU with a 1G network, especially when
using disk as the backing storage; some kind of profiling might be helpful
to tell what happened...

Forgot to ask: what is your sparc64 platform's CPU spec?


Root gershwin:[/mnt/solaris]  cat /proc/cpuinfo
cpu : UltraSparc T1 (Niagara)
fpu : UltraSparc T1 integrated FPU
prom: OBP 4.23.4 2006/08/04 20:45
type: sun4v
ncpus probed: 24
ncpus active: 24
D$ parity tl1   : 0
I$ parity tl1   : 0

Both servers are built with 1 GHz T1 processors (6 cores, 24 threads).

Regards,

JKB


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, Doug Ledford wrote:


On Fri, 2007-10-19 at 13:05 -0400, Justin Piszcz wrote:


I'm sure an internal bitmap would.  On RAID1 arrays, reads/writes are
never split up by a chunk size for stripes.  A 2mb read is a single
read, whereas on a raid4/5/6 array, a 2mb read will end up hitting a
series of stripes across all disks.  That means that on raid1 arrays,
total disk seeks roughly equal total reads/writes, whereas on a raid4/5/6,
total disk seeks usually exceed total reads/writes.  That in turn implies
that in a raid1 setup, disk seek time is important to performance, but not
necessarily paramount.  For raid456, disk seek time is paramount because
of how many more seeks that format uses.  When you then use an internal
bitmap, you are adding writes to every member of the raid456 array,
which adds more seeks.  The same is true for raid1, but since raid1
doesn't have the same level of dependency on seek rates that raid456
has, it doesn't show the same performance hit that raid456 does.



Got it, so for RAID1 it would make sense if LILO supported it (the
later versions of the md superblock)


Lilo doesn't know anything about the superblock format; however, lilo
expects the raid1 device to start at the beginning of the physical
partition.  In other words, format 1.0 would work with lilo.

Did not work when I tried 1.x with LILO; I switched back to 00.90.03 and it
worked fine.





 (for those who use LILO) but for
RAID4/5/6, keep the bitmaps away :)


I still use an internal bitmap regardless ;-)  To help mitigate the cost
of seeks on raid456, you can specify a huge chunk size (like 256k to 2MB
or somewhere in that range).  As long as you can get 90%+ of your
reads/writes to fall into the space of a single chunk, then you start
performing more like a raid1 device without the extra seek overhead.  Of
course, this comes at the expense of peak throughput on the device.
Let's say you were building a mondo movie server, where you were
streaming out digital movie files.  In that case, you very well may care
more about throughput than seek performance since I suspect you wouldn't
have many small, random reads.  Then I would use a small chunk size,
sacrifice the seek performance, and get the throughput bonus of parallel
reads from the same stripe on multiple disks.  On the other hand, if I
was setting up a mail server then I would go with a large chunk size
because the filesystem activities themselves are going to produce lots
of random seeks, and you don't want your raid setup to make that problem
worse.  Plus, most mail doesn't come in or go out at any sort of massive
streaming speed, so you don't need the parallel reads from multiple
disks to perform well.  It all depends on your particular use scenario.
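As a rough illustration of the large-chunk approach described above (a sketch
only; device names and the exact chunk value are placeholders, not a
recommendation):

   mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=1024 \
         --bitmap=internal /dev/sd[b-e]1

--chunk is given in kilobytes, so 1024 here means a 1MB chunk, in the middle
of the 256k-2MB range mentioned above.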

--
Doug Ledford [EMAIL PROTECTED]
 GPG KeyID: CFBFF194
 http://people.redhat.com/dledford

Infiniband specific RPMs available at
 http://people.redhat.com/dledford/Infiniband




Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël

Bill Davidsen wrote:

Dan Williams wrote:

On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
  

I run for 12 hours some dd's (read and write in nullio)
between
initiator and target without any disconnection. Thus iSCSI code seems
to
be robust. Both initiator and target are alone on a single gigabit
ethernet link (without any switch). I'm investigating...



Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.


Given that offload engines are far less tested code, I think this is a 
very good thing to try!


	I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one CPU
when I rebuild my raid1 array. 1% of this array has now been resynchronized
without any hang.


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  []  recovery =  1.0% (15705536/1464725632) 
finish=1103.9min speed=21875K/sec


Regards,

JKB


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread BERTRAND Joël

BERTRAND Joël wrote:

Bill Davidsen wrote:

Dan Williams wrote:

On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
 

I run for 12 hours some dd's (read and write in nullio)
between
initiator and target without any disconnection. Thus iSCSI code seems
to
be robust. Both initiator and target are alone on a single gigabit
ethernet link (without any switch). I'm investigating...



Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.


Given that offload engines are far less tested code, I think this is a 
very good thing to try!


I'm trying without CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one
CPU when I rebuild my raid1 array. 1% of this array has now been
resynchronized without any hang.


Root gershwin:[/usr/scripts]  cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md7 : active raid1 sdi1[2] md_d0p1[0]
  1464725632 blocks [2/1] [U_]
  []  recovery =  1.0% (15705536/1464725632) 
finish=1103.9min speed=21875K/sec


Same result...

connection2:0: iscsi: detected conn error (1011)

 session2: iscsi: session recovery timed out after 120 secs
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery

Regards,

JKB


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Iustin Pop
On Fri, Oct 19, 2007 at 02:39:47PM -0400, John Stoffel wrote:
 And if putting the superblock at the end is problematic, why is it the
 default?  Shouldn't version 1.1 be the default?  

In my opinion, having the superblock *only* at the end (e.g. the 0.90
format) is the best option.

It allows one to mount the disk separately (in case of RAID 1), if the
MD superblock is corrupt or you just want to get easily at the raw data.

As to the people who complained exactly because of this feature, LVM has
two mechanisms to protect against accessing PVs on the raw disks (the
ignore-raid-components option and the filter - I always set filters when
using LVM on top of MD).
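For example, an lvm.conf fragment along these lines (only a sketch; the exact
filter patterns depend on your device naming):

   # /etc/lvm/lvm.conf
   devices {
       md_component_detection = 1
       filter = [ "a|^/dev/md|", "r|.*|" ]
   }

This accepts PVs only on md devices and rejects everything else, so a bare
RAID1 member never gets picked up as a PV.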

regards,
iustin


Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ming Zhang
On Fri, 2007-10-19 at 23:04 +0200, BERTRAND Joël wrote:
 BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I run for 12 hours some dd's (read and write in nullio)
  between
  initiator and target without any disconnection. Thus iSCSI code seems
  to
  be robust. Both initiator and target are alone on a single gigabit
  ethernet link (without any switch). I'm investigating...
  
 
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
 
  Given that offload engines are far less tested code, I think this is 
  a very good thing to try!
 
  I'm trying wihtout CONFIG_DMA_ENGINE=y. istd1 only uses 40% of one 
  CPU when I rebuild my raid1 array. 1% of this array was now 
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
1464725632 blocks [2/1] [U_]
[]  recovery =  1.0% (15705536/1464725632) 
  finish=1103.9min speed=21875K/sec
  
  Same result...
  
  connection2:0: iscsi: detected conn error (1011)
  
   session2: iscsi: session recovery timed out after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
 
   Sorry for this last mail. I have found another mistake, but I don't 
 know if this bug comes from iscsi-target or raid5 itself. iSCSI target 
 is disconnected because istd1 and md_d0_raid5 kernel threads use 100% of 
 CPU each !
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si, 
 0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
 Swap:  7815536k total,0k used,  7815536k free,64808k cached
 
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 
   5824 root  15  -5 000 R  100  0.0  10:34.25 istd1 
 
   5599 root  15  -5 000 R  100  0.0   7:25.43 
 md_d0_raid5
 

I would rather use oprofile to check where the CPU cycles went.
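For example, a minimal session with the legacy oprofile tools (a sketch; the
vmlinux path is a placeholder and must match the running kernel):

   opcontrol --init
   opcontrol --vmlinux=/path/to/vmlinux
   opcontrol --start
   # ... reproduce the raid1-over-iSCSI load ...
   opcontrol --stop
   opreport --symbols | head -30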


   Regards,
 
   JKB
 
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881




RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ross S. W. Walker
BERTRAND Joël wrote:
 
 Ross S. W. Walker wrote:
  BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I run for 12 hours some dd's (read and write in nullio)
  between
  initiator and target without any disconnection. Thus 
  iSCSI code seems
  to
  be robust. Both initiator and target are alone on a 
  single gigabit
  ethernet link (without any switch). I'm investigating...
  
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, 
  but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' 
  will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
  Given that offload engines are far less tested code, I 
  think this is a 
  very good thing to try!
  I'm trying wihtout CONFIG_DMA_ENGINE=y. istd1 only uses 
  40% of one 
  CPU when I rebuild my raid1 array. 1% of this array was now 
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
1464725632 blocks [2/1] [U_]
[]  recovery =  1.0% 
  (15705536/1464725632) 
  finish=1103.9min speed=21875K/sec
 Same result...
 
  connection2:0: iscsi: detected conn error (1011)
   
session2: iscsi: session recovery timed out 
 after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  
  I am unsure why you would want to set up an iSCSI RAID1, but before
  doing so I would try to verify that each independent iSCSI session
  is bulletproof.
 
   I use one and only one iSCSI session. The RAID1 array is built between a
 local volume and an iSCSI volume.

Oh, in that case you will be much better served with DRBD, which
would provide you with what you want without creating a Frankenstein
setup...

-Ross




Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Dan Williams
On Fri, 2007-10-19 at 14:04 -0700, BERTRAND Joël wrote:
 
 Sorry for this last mail. I have found another mistake, but I
 don't
 know if this bug comes from iscsi-target or raid5 itself. iSCSI target
 is disconnected because istd1 and md_d0_raid5 kernel threads use 100%
 of
 CPU each !
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si,
 0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,10136k
 buffers
 Swap:  7815536k total,0k used,  7815536k free,64808k
 cached
 
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 
   5824 root  15  -5 000 R  100  0.0  10:34.25 istd1
 
   5599 root  15  -5 000 R  100  0.0   7:25.43
 md_d0_raid5

What is the output of:
cat /proc/5824/wchan
cat /proc/5599/wchan

Thanks,
Dan


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Bill Davidsen

BERTRAND Joël wrote:


Sorry for this last mail. I have found another mistake, but I 
don't know if this bug comes from iscsi-target or raid5 itself. iSCSI 
target is disconnected because istd1 and md_d0_raid5 kernel threads 
use 100% of CPU each !


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  0.0%si, 
0.0%st

Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
Swap:  7815536k total,0k used,  7815536k free,64808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
 5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the CPU 
is being used. When you can't trust what your tools tell you it gets 
really hard to make decisions based on the data.
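For a per-CPU view (assuming the sysstat package is available on the target),
something like:

   mpstat -P ALL 1 5

reports each of the 24 hardware threads individually instead of the
machine-wide average shown in top's summary line.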


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979




Re: mdadm 2.6.x regression, fails creation of raid1 w/ v1.0 sb and internal bitmap

2007-10-19 Thread Mike Snitzer
On 10/19/07, Neil Brown [EMAIL PROTECTED] wrote:
 On Friday October 19, [EMAIL PROTECTED] wrote:

  I'm using a stock 2.6.19.7 that I then backported various MD fixes to
  from 2.6.20 - 2.6.23...  this kernel has worked great until I
  attempted v1.0 sb w/ bitmap=internal using mdadm 2.6.x.
 
  But would you like me to try a stock 2.6.22 or 2.6.23 kernel?

 Yes please.
 I'm suspecting the code in write_sb_page where it tests if the bitmap
 overlaps the data or metadata.  The only way I can see you getting the
 exact error that you do get is for that to fail.
 That test was introduced in 2.6.22.  Did you backport that?  Any
 chance it got mucked up a bit?

I believe you're referring to commit
f0d76d70bc77b9b11256a3a23e98e80878be1578.  That change actually made
it into 2.6.23 AFAIK; but yes I actually did backport that fix (which
depended on ab6085c795a71b6a21afe7469d30a365338add7a).

If I back-out f0d76d70bc77b9b11256a3a23e98e80878be1578 I can create a
raid1 w/ v1.0 sb and an internal bitmap.  But clearly that is just
because I removed the negative checks that you introduced ;)

For me this begs the question: what else would
f0d76d70bc77b9b11256a3a23e98e80878be1578 depend on that I missed?  I
included 505fa2c4a2f125a70951926dfb22b9cf273994f1 and
ab6085c795a71b6a21afe7469d30a365338add7a too.
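One way to sanity-check the backport (hypothetical commands, assuming a clone
of the mainline tree; the file path is my guess at what that commit touches):

   git describe --contains f0d76d70bc77b9b11256a3a23e98e80878be1578
   git show f0d76d70bc77b9b11256a3a23e98e80878be1578 -- drivers/md/bitmap.c > upstream.diff
   # then compare upstream.diff against what was actually applied to the 2.6.19.7 tree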

*shrug*...

Mike


Re: [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Bill Davidsen

Bill Davidsen wrote:

BERTRAND Joël wrote:


Sorry for this last mail. I have found another mistake, but I 
don't know if this bug comes from iscsi-target or raid5 itself. iSCSI 
target is disconnected because istd1 and md_d0_raid5 kernel threads 
use 100% of CPU each !


Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi,  
0.0%si, 0.0%st

Mem:   4139032k total,   218424k used,  3920608k free,10136k buffers
Swap:  7815536k total,0k used,  7815536k free,64808k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5824 root      15  -5     0    0    0 R  100  0.0  10:34.25 istd1
 5599 root      15  -5     0    0    0 R  100  0.0   7:25.43 md_d0_raid5


Given that the summary shows 87.4% idle, something is not right. You 
might try another tool, like vmstat, to at least verify the way the 
CPU is being used. When you can't trust what your tools tell you it 
gets really hard to make decisions based on the data.


ALSO: you have zombie processes. Looking at machines up for 45, 54, and 
470 days, zombies are *not* something you just have to expect. Do you 
get these just about the same time things go to hell? Better you than 
me, I suspect there are still many ways to have a learning experience 
with iSCSI.


Hope that and the summary confusion result in some useful data.

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979




Re: Time to deprecate old RAID formats?

2007-10-19 Thread Doug Ledford
On Fri, 2007-10-19 at 23:23 +0200, Iustin Pop wrote:
 On Fri, Oct 19, 2007 at 02:39:47PM -0400, John Stoffel wrote:
  And if putting the superblock at the end is problematic, why is it the
  default?  Shouldn't version 1.1 be the default?  
 
 In my opinion, having the superblock *only* at the end (e.g. the 0.90
 format) is the best option.
 
 It allows one to mount the disk separately (in case of RAID 1), if the
 MD superblock is corrupt or you just want to get easily at the raw data.

Bad reasoning.  It's the reason that the default is at the end of the
device, but that was a bad decision made by Ingo long, long ago in a
galaxy far, far away.

The simple fact of the matter is there are only two types of raid devices
for the purpose of this issue: those that fragment data (raid0/4/5/6/10)
and those that don't (raid1, linear).

For the purposes of this issue, there are only two states we care about:
the raid array works or doesn't work.

If the raid array works, then you *only* want the system to access the
data via the raid array.  If the raid array doesn't work, then for the
fragmented case you *never* want the system to see any of the data from
the raid array (such as an ext3 superblock) or a subsequent fsck could
see a valid superblock and actually start a filesystem scan on the raw
device, and end up hosing the filesystem beyond all repair after it hits
the first chunk size break (although in practice this is usually a
situation where fsck declares the filesystem so corrupt that it refuses
to touch it, that's leaving an awful lot to chance, you really don't
want fsck to *ever* see that superblock).

If the raid array is raid1, then the raid array should *never* fail to
start unless all disks are missing (in which case there is no raw device
to access anyway).  The very few failure types that will cause the raid
array to not start automatically *and* still have an intact copy of the
data usually happen when the raid array is perfectly healthy, in which
case automatically finding a constituent device when the raid array
failed to start is exactly the *wrong* thing to do (for instance, you
enable SELinux on a machine and it hasn't been relabeled and the raid
array fails to start because /dev/mdblah can't be created because of
an SELinux denial...all the raid1 members are still there, but if you
touch a single one of them, then you run the risk of creating silent
data corruption).

It really boils down to this: for any reason that a raid array might
fail to start, you *never* want to touch the underlying data until
someone has taken manual measures to figure out why it didn't start and
corrected the problem.  Putting the superblock in front of the data does
not prevent manual measures (such as recreating superblocks) from
getting at the data.  But, putting superblocks at the end leaves the
door open for accidental access via constituent devices when you
*really* don't want that to happen.

So, no, the default should *not* be at the end of the device.

 As to the people who complained exactly because of this feature, LVM has
 two mechanisms to protect from accessing PVs on the raw disks (the
 ignore raid components option and the filter - I always set filters when
 using LVM ontop of MD).
 
 regards,
 iustin
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Scott Kaelin
[snip]
 
  I am unsure why you would want to set up an iSCSI RAID1, but before
  doing so I would try to verify that each independent iSCSI session
  is bulletproof.

 I use one and only one iSCSI session. Raid1 array is built between a
 local and iSCSI volume.

So this problem doesn't happen when you are doing I/O with only
the iSCSI session?

Wouldn't it be better to do the RAID1 on the target machine? Then you
don't need to mess around with weird timing behavior of remote/local
writing.

If you want to have the disks on 2 different machines and have them
mirrored, DRBD is the way to go.

@Ross: He is trying to mirror his local drive with an iSCSI LUN.


 JKB






-- 
Scott Kaelin
Sitrof Technologies
[EMAIL PROTECTED]


RE: [Iscsitarget-devel] [BUG] Raid1/5 over iSCSI trouble

2007-10-19 Thread Ross S. W. Walker
BERTRAND Joël wrote:
 
 BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  Bill Davidsen wrote:
  Dan Williams wrote:
  On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
   
  I run for 12 hours some dd's (read and write in nullio)
  between
  initiator and target without any disconnection. Thus 
 iSCSI code seems
  to
  be robust. Both initiator and target are alone on a 
 single gigabit
  ethernet link (without any switch). I'm investigating...
  
 
  Can you reproduce on 2.6.22?
 
  Also, I do not think this is the cause of your failure, 
 but you have
  CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' 
 will compile
  out the unneeded checks for offload engines in async_memcpy and
  async_xor.
 
  Given that offload engines are far less tested code, I 
 think this is 
  a very good thing to try!
 
  I'm trying wihtout CONFIG_DMA_ENGINE=y. istd1 only 
 uses 40% of one 
  CPU when I rebuild my raid1 array. 1% of this array was now 
  resynchronized without any hang.
 
  Root gershwin:[/usr/scripts]  cat /proc/mdstat
  Personalities : [raid1] [raid6] [raid5] [raid4]
  md7 : active raid1 sdi1[2] md_d0p1[0]
1464725632 blocks [2/1] [U_]
[]  recovery =  1.0% 
 (15705536/1464725632) 
  finish=1103.9min speed=21875K/sec
  
  Same result...
  
  connection2:0: iscsi: detected conn error (1011)
  
   session2: iscsi: session recovery timed out after 120 secs
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
  sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
 
   Sorry for this last mail. I have found another mistake, 
 but I don't 
 know if this bug comes from iscsi-target or raid5 itself. 
 iSCSI target 
 is disconnected because istd1 and md_d0_raid5 kernel threads 
 use 100% of 
 CPU each !
 
 Tasks: 235 total,   6 running, 227 sleeping,   0 stopped,   2 zombie
 Cpu(s):  0.1%us, 12.5%sy,  0.0%ni, 87.4%id,  0.0%wa,  0.0%hi, 
  0.0%si, 
 0.0%st
 Mem:   4139032k total,   218424k used,  3920608k free,
 10136k buffers
 Swap:  7815536k total,0k used,  7815536k free,
 64808k cached
 
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 
   5824 root  15  -5 000 R  100  0.0  10:34.25 istd1 
 
   5599 root  15  -5 000 R  100  0.0   7:25.43 
 md_d0_raid5
 
   Regards,
 
   JKB

If you have 2 iSCSI sessions mirrored then any failure along either
path will hose the setup. Plus, having iSCSI and MD RAID fight over the
same resources in the kernel is a recipe for a race condition.

How about exploring MPIO and DRBD?

-Ross




Re: Time to deprecate old RAID formats?

2007-10-19 Thread Doug Ledford
On Fri, 2007-10-19 at 12:38 -0400, John Stoffel wrote:


 1, 1.0, 1.1, 1.2
 
   Use the new version-1 format superblock.  This has few restrictions.
   The different sub-versions store the superblock at different locations
   on the device, either at the end (for 1.0), at the start (for 1.1) or
   4K from the start (for 1.2).
 
 
 It looks to me that the 1.1, combined with the 1.0 should be what we
 use, with the 1.2 format nuked.  Maybe call it 1.3?  *grin*

You're somewhat misreading the man page.  You *can't* combine 1.0 with
1.1.  All of the above options: 1, 1.0, 1.1, 1.2; specifically mean to
use a version 1 superblock.  1.0 means use a version 1 superblock at the
end of the disk.  1.1 means version 1 superblock at beginning of disk.
1.2 means version 1 at a 4k offset from the beginning of the disk.  There
really is no actual version 1.1, or 1.2, the .0, .1, and .2 part of the
version *only* means where to put the version 1 superblock on the disk.
If you just say version 1, then it goes to the default location for
version 1 superblocks, and last I checked that was the end of disk (aka,
1.0).
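A quick way to see which version (and hence placement) an existing member
carries, as a sketch (device name is a placeholder; the exact output wording
varies between mdadm releases):

   mdadm --examine /dev/sda2 | grep -i version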

-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband


signature.asc
Description: This is a digitally signed message part


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, Doug Ledford wrote:


On Fri, 2007-10-19 at 12:45 -0400, Justin Piszcz wrote:


On Fri, 19 Oct 2007, John Stoffel wrote:


Justin == Justin Piszcz [EMAIL PROTECTED] writes:


Justin Is a bitmap created by default with 1.x?  I remember seeing
Justin reports of 15-30% performance degradation using a bitmap on a
Justin RAID5 with 1.x.

Not according to the mdadm man page.  I'd probably give up that
performance if it meant that re-syncing an array went much faster
after a crash.

I certainly use it on my RAID1 setup on my home machine.

John



The performance AFTER a crash, yes, but in general usage I remember
someone here doing benchmarks where it had a negative effect on performance.


I'm sure an internal bitmap would.  On RAID1 arrays, reads/writes are
never split up by a chunk size for stripes.  A 2mb read is a single
read, whereas on a raid4/5/6 array, a 2mb read will end up hitting a
series of stripes across all disks.  That means that on raid1 arrays,
total disk seeks roughly equal total reads/writes, whereas on a raid4/5/6,
total disk seeks usually exceed total reads/writes.  That in turn implies
that in a raid1 setup, disk seek time is important to performance, but not
necessarily paramount.  For raid456, disk seek time is paramount because
of how many more seeks that format uses.  When you then use an internal
bitmap, you are adding writes to every member of the raid456 array,
which adds more seeks.  The same is true for raid1, but since raid1
doesn't have the same level of dependency on seek rates that raid456
has, it doesn't show the same performance hit that raid456 does.



Justin.

--
Doug Ledford [EMAIL PROTECTED]
 GPG KeyID: CFBFF194
 http://people.redhat.com/dledford

Infiniband specific RPMs available at
 http://people.redhat.com/dledford/Infiniband



Got it, so for RAID1 it would make sense if LILO supported it (the 
later versions of the md superblock) (for those who use LILO) but for

RAID4/5/6, keep the bitmaps away :)

Justin.


Re: Time to deprecate old RAID formats?

2007-10-19 Thread John Stoffel
 Justin == Justin Piszcz [EMAIL PROTECTED] writes:

Justin Is a bitmap created by default with 1.x?  I remember seeing
Justin reports of 15-30% performance degradation using a bitmap on a
Justin RAID5 with 1.x.

Not according to the mdadm man page.  I'd probably give up that
performance if it meant that re-syncing an array went much faster
after a crash.

I certainly use it on my RAID1 setup on my home machine.  

John


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, John Stoffel wrote:


Doug == Doug Ledford [EMAIL PROTECTED] writes:


Doug On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote:

Justin == Justin Piszcz [EMAIL PROTECTED] writes:



Justin On Fri, 19 Oct 2007, John Stoffel wrote:




So,

Is it time to start thinking about deprecating the old 0.9, 1.0 and
1.1 formats to just standardize on the 1.2 format?  What are the
issues surrounding this?


Doug 1.0, 1.1, and 1.2 are the same format, just in different positions on
Doug the disk.  Of the three, the 1.1 format is the safest to use since it
Doug won't allow you to accidentally have some sort of metadata between the
Doug beginning of the disk and the raid superblock (such as an lvm2
Doug superblock), and hence whenever the raid array isn't up, you won't be
Doug able to accidentally mount the lvm2 volumes, filesystem, etc.  (In worst
Doug case situations, I've seen lvm2 find a superblock on one RAID1 array
Doug member when the RAID1 array was down, the system came up, you used the
Doug system, the two copies of the raid array were made drastically
Doug inconsistent, then at the next reboot, the situation that prevented the
Doug RAID1 from starting was resolved, and it never knew it failed to start
Doug last time, and the two inconsistent members were put back into a clean
Doug array).  So, deprecating any of these is not really helpful.  And you
Doug need to keep the old 0.90 format around for back compatibility with
Doug thousands of existing raid arrays.

This is a great case for making the 1.1 format be the default.  So
what are the advantages of the 1.0 and 1.2 formats then?  Or should we
be thinking about making two copies of the data on each RAID member,
one at the beginning and one at the end, for resiliency?

I just hate seeing this in the man page:

   Declare the style of superblock (raid metadata) to be used.
   The default is 0.90 for --create, and to guess for other operations.
   The default can be overridden by setting the metadata value for the
   CREATE keyword in mdadm.conf.

   Options are:

   0, 0.90, default

 Use the original 0.90 format superblock.  This format limits arrays to
 28 component devices and limits component devices of levels 1 and
 greater to 2 terabytes.

   1, 1.0, 1.1, 1.2

 Use the new version-1 format superblock.  This has few restrictions.
 The different sub-versions store the superblock at different locations
 on the device, either at the end (for 1.0), at the start (for 1.1) or
 4K from the start (for 1.2).


It looks to me that the 1.1, combined with the 1.0 should be what we
use, with the 1.2 format nuked.  Maybe call it 1.3?  *grin*

So at this point I'm not arguing to get rid of the 0.9 format, though
I think it should NOT be the default any more, we should be using the
1.1 combined with 1.0 format.


Is a bitmap created by default with 1.x?  I remember seeing reports of 
15-30% performance degradation using a bitmap on a RAID5 with 1.x.




John




Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, Doug Ledford wrote:


On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote:

Justin == Justin Piszcz [EMAIL PROTECTED] writes:


Justin On Fri, 19 Oct 2007, John Stoffel wrote:



So,

Is it time to start thinking about deprecating the old 0.9, 1.0 and
1.1 formats to just standardize on the 1.2 format?  What are the
issues surrounding this?


1.0, 1.1, and 1.2 are the same format, just in different positions on
the disk.  Of the three, the 1.1 format is the safest to use since it
won't allow you to accidentally have some sort of metadata between the
beginning of the disk and the raid superblock (such as an lvm2
superblock), and hence whenever the raid array isn't up, you won't be
able to accidentally mount the lvm2 volumes, filesystem, etc.  (In worst
case situations, I've seen lvm2 find a superblock on one RAID1 array
member when the RAID1 array was down, the system came up, you used the
system, the two copies of the raid array were made drastically
inconsistent, then at the next reboot, the situation that prevented the
RAID1 from starting was resolved, and it never knew it failed to start
last time, and the two inconsistent members were put back into a clean
array).  So, deprecating any of these is not really helpful.  And you
need to keep the old 0.90 format around for back compatibility with
thousands of existing raid arrays.


Agree, what is the benefit in deprecating them?  Is there that much old 
code or?





It's certainly easy enough to change mdadm to default to the 1.2
format and to require a --force switch to  allow use of the older
formats.

I keep seeing that we support these old formats, and it's never been
clear to me why we have four different ones available?  Why can't we
start defining the canonical format for Linux RAID metadata?

Thanks,
John
[EMAIL PROTECTED]



Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of
Justin anything else!

Are you sure?  I find that GRUB is much easier to use and set up than
LILO these days.  But hey, just dropping down to support 00.90.03 and
1.2 formats would be fine too.  Let's just lessen the confusion if at
all possible.

John

--
Doug Ledford [EMAIL PROTECTED]
 GPG KeyID: CFBFF194
 http://people.redhat.com/dledford

Infiniband specific RPMs available at
 http://people.redhat.com/dledford/Infiniband




Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, John Stoffel wrote:


Justin == Justin Piszcz [EMAIL PROTECTED] writes:


Justin On Fri, 19 Oct 2007, John Stoffel wrote:



So,

Is it time to start thinking about deprecating the old 0.9, 1.0 and
1.1 formats to just standardize on the 1.2 format?  What are the
issues surrounding this?

It's certainly easy enough to change mdadm to default to the 1.2
format and to require a --force switch to  allow use of the older
formats.

I keep seeing that we support these old formats, and it's never been
clear to me why we have four different ones available?  Why can't we
start defining the canonical format for Linux RAID metadata?

Thanks,
John
[EMAIL PROTECTED]



Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of
Justin anything else!

Are you sure?  I find that GRUB is much easier to use and set up than
LILO these days.  But hey, just dropping down to support 00.90.03 and
1.2 formats would be fine too.  Let's just lessen the confusion if at
all possible.

John



I am sure; I submitted a bug report to the LILO developer, and he acknowledged
the bug, but I don't know if it was fixed.


I have not tried GRUB with a RAID1 setup yet.

Justin.


Re: Time to deprecate old RAID formats?

2007-10-19 Thread John Stoffel
 Justin == Justin Piszcz [EMAIL PROTECTED] writes:

Justin On Fri, 19 Oct 2007, John Stoffel wrote:

 
 So,
 
 Is it time to start thinking about deprecating the old 0.9, 1.0 and
 1.1 formats to just standardize on the 1.2 format?  What are the
 issues surrounding this?
 
 It's certainly easy enough to change mdadm to default to the 1.2
 format and to require a --force switch to  allow use of the older
 formats.
 
 I keep seeing that we support these old formats, and it's never been
 clear to me why we have four different ones available?  Why can't we
 start defining the canonical format for Linux RAID metadata?
 
 Thanks,
 John
 [EMAIL PROTECTED]
 

Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of
Justin anything else!

Are you sure?  I find that GRUB is much easier to use and set up than
LILO these days.  But hey, just dropping down to support 00.90.03 and
1.2 formats would be fine too.  Let's just lessen the confusion if at
all possible.

John


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, John Stoffel wrote:



So,

Is it time to start thinking about deprecating the old 0.9, 1.0 and
1.1 formats to just standardize on the 1.2 format?  What are the
issues surrounding this?

It's certainly easy enough to change mdadm to default to the 1.2
format and to require a --force switch to  allow use of the older
formats.

I keep seeing that we support these old formats, and it's never been
clear to me why we have four different ones available?  Why can't we
start defining the canonical format for Linux RAID metadata?

Thanks,
John
[EMAIL PROTECTED]



I hope 00.90.03 is not deprecated, LILO cannot boot off of anything else!


Justin.


Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, Alberto Alonso wrote:


On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:

Mike Accetta [EMAIL PROTECTED] writes:



What I would like to see is a timeout driven fallback mechanism. If
one mirror does not return the requested data within a certain time
(say 1 second) then the request should be duplicated on the other
mirror. If the first mirror later unchokes then it remains in the
raid, if it fails it gets removed. But (at least reads) should not
have to wait for that process.

Even better would be if some write delay could also be used. The still
working mirror would get an increase in its serial (so on reboot you
know one disk is newer). If the choking mirror unchokes then it can
write back all the delayed data and also increase its serial to
match. Otherwise it really gets failed. But you might have to use
bitmaps for this, or the cache size would limit its usefulness.

MfG
Goswin


I think a timeout on both reads and writes is a must. Basically, I
believe that all the problems I've encountered using software
raid would have been resolved by using a timeout within the md code.

This will keep a server from crashing/hanging when the underlying
driver doesn't properly handle hard drive problems. MD can be
smarter than the dumb drivers.

Just my thoughts though, as I've never got an answer as to whether or
not md can implement its own timeouts.

Alberto





I have a question about re-mapping sectors: can software raid be as
efficient or as good at remapping bad sectors as an external raid controller
for, e.g., raid 10 or raid5?


Justin.


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread Ming Zhang
On Fri, 2007-10-19 at 16:47 +0200, BERTRAND Joël wrote:
 Ming Zhang wrote:
  
  As Ross pointed out, many I/O patterns only have one outstanding I/O at any
  time, so there is only one worker thread actively serving it; it cannot
  exploit the multiple cores here.
  
  
  Do you see 100% with nullio or fileio? With disk, most of the time should be
  spent in iowait and CPU utilization should not be high at all.
 
   With both nullio and fileio...

It is weird. With fileio, run some I/O load, then run vmstat 1 and
post the output here. You are supposed to see some iowait instead of high sys CPU usage...
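Something like the following on the target while the load is running (just a
sketch):

   vmstat 1 10

watching the sy/id/wa columns - mostly 'wa' points at the disks, mostly 'sy'
points back at istd/md burning CPU.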




-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881




Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread Ming Zhang
On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:
 Ross S. W. Walker wrote:
  BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  I can format several times (mkfs.ext3) a 1.5 TB volume over iSCSI
  without any trouble. I can read and write on this virtual disk without
  any trouble.
 
  Now, I have configured ietd with :
 
  Lun 0 Sectors=1464725758,Type=nullio
 
  and I run on initiator side :
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
  479482+0 records in
  479482+0 records out
  3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
 
  I'm waiting for a crash. None so far as I write these lines. I suspect
  an interaction between raid and iscsi.
 I simultaneously run :
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
  8397210+0 records in
  8397210+0 records out
  68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s
 
  and
 
  Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
  739200+0 records in
  739199+0 records out
  6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s
 
 without any trouble.
  
  The speed can definitely be improved. Look at your network setup
  and use ping to try and get the network latency to a minimum.
  
  # ping -A -s 8192 172.16.24.140
  
  --- 172.16.24.140 ping statistics ---
  14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
  rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms
 
 gershwin:[~]  ping -A -s 8192 192.168.0.2
 PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
 8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
 8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
 8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
 8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
 8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
 8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
 8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
 8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
 8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
 8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
 8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
 8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
 8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms
 
 --- 192.168.0.2 ping statistics ---
 13 packets transmitted, 13 received, 0% packet loss, time 2400ms
 rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
 gershwin:[~] 
 
   Both initiator and target are alone on a gigabit NIC (Tigon3). On 
 target server, istd1 takes 100% of a CPU (and only one CPU, even my 
 T1000 can simultaneous run 32 threads). I think the limitation comes 
 from istd1.

usually istdx will not take 100% cpu with 1G network, especially when
using disk as back storage, some kind of profiling work might be helpful
to tell what happened...

forgot to ask, your sparc64 platform cpu spec.


 
  You want your avg ping time for 8192 byte payloads to be 300us or less.
  
  1000/.268 = 3731 IOPS @ 8k = 30 MB/s
  
  If you use apps that do overlapping asynchronous IO you can see better
  numbers.
 
   Regards,
 
   JKB
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881




Re: [BUG] Raid5 trouble

2007-10-19 Thread BERTRAND Joël

Bill Davidsen wrote:

Dan Williams wrote:

I found a problem which may lead to the operations count dropping
below zero.  If ops_complete_biofill() gets preempted in between the
following calls:

raid5.c:554 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack);
raid5.c:555 clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending);

...then get_stripe_work() can recount/re-acknowledge STRIPE_OP_BIOFILL
causing the assertion.  In fact, the 'pending' bit should always be
cleared first, but the other cases are protected by
spin_lock(&sh->lock).  Patch attached.
  


Once this patch has been vetted, can it be offered to -stable for 
2.6.23? Or to be pedantic, it *can*, will you make that happen?


	I never see any oops with this patch. But I cannot create a RAID1 array
with a local RAID5 volume and a foreign RAID5 array exported by iSCSI.
iSCSI seems to work fine, but RAID1 creation randomly aborts due to an
unknown SCSI task on the target side.


	I have stressed the iSCSI target with some simultaneous I/O without any
trouble (nullio, fileio and blockio), thus I suspect another bug in the raid
code (or an arch-specific bug). Over the last two days, I have run some
tests to isolate and reproduce this bug:


1/ iSCSI target and initiator seem to work when I export a raid5 array
over iSCSI;

2/ raid1 and raid5 seem to work with local disks;
3/ the iSCSI target is disconnected only when I create a raid1 volume over
iSCSI (blockio _and_ fileio), with the following message:


Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42 
57344 0 0
Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on 
tid:1 lun:0 by sid:630024457682948 (Unknown Task)


	I run for 12 hours some dd's (read and write in nullio) between 
initiator and target without any disconnection. Thus iSCSI code seems to 
be robust. Both initiator and target are alone on a single gigabit 
ethernet link (without any switch). I'm investigating...


Regards,

JKB


ANNOUNCE: mdadm 2.6.4 - A tool for managing Soft RAID under Linux

2007-10-19 Thread Neil Brown


I am pleased to announce the availability of
   mdadm version 2.6.4

It is available at the usual places:
   http://www.cse.unsw.edu.au/~neilb/source/mdadm/
and
   countrycode=xx.
   http://www.${countrycode}kernel.org/pub/linux/utils/raid/mdadm/
and via git at
   git://neil.brown.name/mdadm
   http://neil.brown.name/git?p=mdadm

mdadm is a tool for creating, managing and monitoring
device arrays using the md driver in Linux, also
known as Software RAID arrays.

Release 2.6.4 adds a few minor bug fixes to 2.6.3

Changelog Entries:
-   Make --create --auto=mdp work for non-standard device names.
-   Fix restarting of a 'reshape' if it was stopped in the middle.
-   Fix a segfault when using v1 superblock.
-   Make --write-mostly effective when re-adding a device to an array.
-   Various minor fixes

Development of mdadm is sponsored by
 SUSE Labs, Novell Inc.

NeilBrown  19th October 2007


Re: Time to deprecate old RAID formats?

2007-10-19 Thread John Stoffel
 Doug == Doug Ledford [EMAIL PROTECTED] writes:

Doug On Fri, 2007-10-19 at 12:38 -0400, John Stoffel wrote:
 1, 1.0, 1.1, 1.2
 
 Use the new version-1 format superblock.  This has few restrictions.
 The different sub-versions store the superblock at different locations
 on the device, either at the end (for 1.0), at the start (for 1.1) or
 4K from the start (for 1.2).
 
 
 It looks to me that the 1.1, combined with the 1.0 should be what we
 use, with the 1.2 format nuked.  Maybe call it 1.3?  *grin*

Doug You're somewhat misreading the man page. 

The man page is somewhat misleading then.  It's not clear from reading
it that the version 1 RAID superblock can be in one of three different
positions in the volume.  

Doug You *can't* combine 1.0 with 1.1.  All of the above options: 1,
Doug 1.0, 1.1, 1.2; specifically mean to use a version 1 superblock.
Doug 1.0 means use a version 1 superblock at the end of the disk.
Doug 1.1 means version 1 superblock at beginning of disk.  1.2 means
Doug version 1 at 4k offset from beginning of the disk.  There really
Doug is no actual version 1.1, or 1.2, the .0, .1, and .2 part of the
Doug version *only* means where to put the version 1 superblock on
Doug the disk.  If you just say version 1, then it goes to the
Doug default location for version 1 superblocks, and last I checked
Doug that was the end of disk (aka, 1.0).
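
For what it's worth, mdadm itself will report which variant a member carries
and where the superblock sits; a quick check along these lines (the device
name is just a placeholder, and the exact field labels vary a little between
mdadm releases) makes the .0/.1/.2 distinction concrete:

   mdadm --examine /dev/sda1 | egrep -i 'version|super offset'
   # a version-1.x member prints a "Super Offset" in sectors:
   # 0 for 1.1, 8 for 1.2 (4K from the start), near the end of the device for 1.0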

So why not get rid of (deprecate) the version 1.0 and version 1.2
blocks, and only support the 1.1 version?  

Why do we have three different positions for storing the superblock?  

And if putting the superblock at the end is problematic, why is it the
default?  Shouldn't version 1.1 be the default?  

Or, alternatively, update the code so that we support RAID superblocks
at BOTH the beginning and end 4k of the disk, for maximum redundancy.

I guess I need to go and read the code to figure out the placement of
0.90 and 1.0 blocks to see how they are different.  It's just not
clear to me why we have such a muddle of 1.x formats to choose from
and what the advantages and tradeoffs are between them.

John




Re: Time to deprecate old RAID formats?

2007-10-19 Thread Justin Piszcz



On Fri, 19 Oct 2007, John Stoffel wrote:


Justin == Justin Piszcz [EMAIL PROTECTED] writes:


Justin Is a bitmap created by default with 1.x?  I remember seeing
Justin reports of 15-30% performance degradation using a bitmap on a
Justin RAID5 with 1.x.

Not according to the mdadm man page.  I'd probably give up that
performance if it meant that re-syncing an array went much faster
after a crash.

I certainly use it on my RAID1 setup on my home machine.

John



The performance AFTER a crash, yes, but in general usage I remember 
someone here doing benchmarks that showed it had a negative effect on performance.
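
For anyone who wants a bitmap anyway, the usual mitigation is a much larger
bitmap chunk, which cuts the number of bitmap updates dramatically; a rough
sketch with placeholder device names (option spellings as in the mdadm
2.6-era man page, and the kernel must be new enough to add a bitmap to a
running array):

   # add an internal write-intent bitmap with a 64 MiB chunk (value is in KiB)
   mdadm --grow /dev/md0 --bitmap=internal --bitmap-chunk=65536

   # inspect it on a member, and drop it again if the hit is too big
   mdadm --examine-bitmap /dev/sda1
   mdadm --grow /dev/md0 --bitmap=none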


Justin.


Re: Time to deprecate old RAID formats?

2007-10-19 Thread John Stoffel
 Doug == Doug Ledford [EMAIL PROTECTED] writes:

Doug On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote:
  Justin == Justin Piszcz [EMAIL PROTECTED] writes:
 
Justin On Fri, 19 Oct 2007, John Stoffel wrote:
 
  
  So,
  
  Is it time to start thinking about deprecating the old 0.9, 1.0 and
  1.1 formats to just standardize on the 1.2 format?  What are the
  issues surrounding this?

Doug 1.0, 1.1, and 1.2 are the same format, just in different positions on
Doug the disk.  Of the three, the 1.1 format is the safest to use since it
Doug won't allow you to accidentally have some sort of metadata between the
Doug beginning of the disk and the raid superblock (such as an lvm2
Doug superblock), and hence whenever the raid array isn't up, you won't be
Doug able to accidentally mount the lvm2 volumes, filesystem, etc.  (In worse
Doug case situations, I've seen lvm2 find a superblock on one RAID1 array
Doug member when the RAID1 array was down, the system came up, you used the
Doug system, the two copies of the raid array were made drastically
Doug inconsistent, then at the next reboot, the situation that prevented the
Doug RAID1 from starting was resolved, and it never know it failed to start
Doug last time, and the two inconsistent members we put back into a clean
Doug array).  So, deprecating any of these is not really helpful.  And you
Doug need to keep the old 0.90 format around for back compatibility with
Doug thousands of existing raid arrays.

This is a great case for making the 1.1 format be the default.  So
what are the advantages of the 1.0 and 1.2 formats then?  Or should
we be thinking about making two copies of the metadata on each RAID
member, one at the beginning and one at the end, for resiliency?

I just hate seeing this in the man page:

Declare the style of superblock (raid metadata) to be used.
The default is 0.90 for --create, and to guess for other operations.
The default can be overridden by setting the metadata value for the
CREATE keyword in mdadm.conf.

Options are:

0, 0.90, default

  Use the original 0.90 format superblock.  This format limits arrays to
  28 component devices and limits component devices of levels 1 and
  greater to 2 terabytes.

1, 1.0, 1.1, 1.2

  Use the new version-1 format superblock.  This has few restrictions.
  The different sub-versions store the superblock at different locations
  on the device, either at the end (for 1.0), at the start (for 1.1) or
  4K from the start (for 1.2).


It looks to me that the 1.1 format, combined with the 1.0, should be what we
use, with the 1.2 format nuked.  Maybe call it 1.3?  *grin*

So at this point I'm not arguing to get rid of the 0.9 format, though
I think it should NOT be the default any more; we should be using the
1.1 format combined with 1.0.
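
If 1.1 did become the consensus choice, it could already be made the
effective default per site today, without waiting on a new compiled-in
default; a minimal sketch using the CREATE keyword quoted from the man page
above (config path and device names are illustrative):

   # /etc/mdadm.conf
   CREATE metadata=1.1

   # or spelled out per array:
   mdadm --create /dev/md0 --metadata=1.1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1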

John


Re: Time to deprecate old RAID formats?

2007-10-19 Thread Doug Ledford
On Fri, 2007-10-19 at 11:46 -0400, John Stoffel wrote:
  Justin == Justin Piszcz [EMAIL PROTECTED] writes:
 
 Justin On Fri, 19 Oct 2007, John Stoffel wrote:
 
  
  So,
  
  Is it time to start thinking about deprecating the old 0.9, 1.0 and
  1.1 formats to just standardize on the 1.2 format?  What are the
  issues surrounding this?

1.0, 1.1, and 1.2 are the same format, just in different positions on
the disk.  Of the three, the 1.1 format is the safest to use since it
won't allow you to accidentally have some sort of metadata between the
beginning of the disk and the raid superblock (such as an lvm2
superblock), and hence whenever the raid array isn't up, you won't be
able to accidentally mount the lvm2 volumes, filesystem, etc.  (In
worst-case situations, I've seen lvm2 find a superblock on one RAID1
array member when the RAID1 array was down, the system came up, you used
the system, the two copies of the raid array were made drastically
inconsistent, then at the next reboot, the situation that prevented the
RAID1 from starting was resolved, it never knew it had failed to start
last time, and the two inconsistent members were put back into a clean
array).  So, deprecating any of these is not really helpful.  And you
need to keep the old 0.90 format around for backward compatibility with
thousands of existing raid arrays.

  It's certainly easy enough to change mdadm to default to the 1.2
  format and to require a --force switch to  allow use of the older
  formats.
  
  I keep seeing that we support these old formats, and it's never been
  clear to me why we have four different ones available?  Why can't we
  start defining the canonical format for Linux RAID metadata?
  
  Thanks,
  John
  [EMAIL PROTECTED]
  
 
 Justin I hope 00.90.03 is not deprecated, LILO cannot boot off of
 Justin anything else!
 
 Are you sure?  I find that GRUB is much easier to use and setup than
 LILO these days.  But hey, just dropping down to support 00.09.03 and
 1.2 formats would be fine too.  Let's just lessen the confusion if at
 all possible.
 
 John
-- 
Doug Ledford [EMAIL PROTECTED]
  GPG KeyID: CFBFF194
  http://people.redhat.com/dledford

Infiniband specific RPMs available at
  http://people.redhat.com/dledford/Infiniband




Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread Ming Zhang
On Fri, 2007-10-19 at 16:30 +0200, BERTRAND Joël wrote:
 Ming Zhang wrote:
  On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:
  Ross S. W. Walker wrote:
  BERTRAND Joël wrote:
  BERTRAND Joël wrote:
  I can format serveral times (mkfs.ext3) a 1.5 TB volume 
  over iSCSI 
  without any trouble. I can read and write on this virtual 
  disk without 
  any trouble.
 
  Now, I have configured ietd with :
 
  Lun 0 Sectors=1464725758,Type=nullio
 
  and I run on initiator side :
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
  479482+0 records in
  479482+0 records out
  3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
 
  I'm waitinfor a crash. No one when I write these lines. 
 I suspect 
  an interaction between raid and iscsi.
   I simultanely run :
 
  Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
  8397210+0 records in
  8397210+0 records out
  68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s
 
  and
 
  Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
  739200+0 records in
  739199+0 records out
  6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s
 
   without any trouble.
  The speed can definitely be improved. Look at your network setup
  and use ping to try and get the network latency to a minimum.
 
  # ping -A -s 8192 172.16.24.140
  
  --- 172.16.24.140 ping statistics ---
  14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
  rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms
  gershwin:[~]  ping -A -s 8192 192.168.0.2
  PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
  8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
  8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
  8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
  8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
  8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
  8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
  8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
  8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
  8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
  8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
  8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
  8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
  8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms
 
  --- 192.168.0.2 ping statistics ---
  13 packets transmitted, 13 received, 0% packet loss, time 2400ms
  rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 
  ms
  gershwin:[~] 
 
 Both initiator and target are alone on a gigabit NIC (Tigon3). On the 
  target server, istd1 takes 100% of a CPU (and only one CPU, even though my 
  T1000 can simultaneously run 32 threads). I think the limitation comes 
  from istd1.
  
  usually istdx will not take 100% cpu with 1G network, especially when
  using disk as back storage, some kind of profiling work might be helpful
  to tell what happened...
  
  forgot to ask, your sparc64 platform cpu spec.
 
 Root gershwin:[/mnt/solaris]  cat /proc/cpuinfo
 cpu : UltraSparc T1 (Niagara)
 fpu : UltraSparc T1 integrated FPU
 prom: OBP 4.23.4 2006/08/04 20:45
 type: sun4v
 ncpus probed: 24
 ncpus active: 24
 D$ parity tl1   : 0
 I$ parity tl1   : 0
 
   Both servers are built with 1 GHz T1 processors (6 cores, 24 threads).
 

As Ross pointed out, many I/O patterns only have one outstanding I/O at any
time, so there is only one worker thread actively serving it, and it
cannot exploit the multiple cores here.
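
One way to test that on this setup is simply to keep several reads in flight
at once; a purely illustrative sketch reusing the /dev/sdj nullio LUN from
earlier in the thread (offsets and counts are arbitrary):

   # four concurrent sequential readers at different offsets (skip is in bs-sized blocks)
   for i in 0 1 2 3; do
       dd if=/dev/sdj of=/dev/null bs=8192 skip=$((i * 25000000)) count=1000000 &
   done
   wait

If the target can spread the work across threads, the aggregate should climb
well above the ~25 MB/s a single dd achieved above; if it stays flat, the
single outstanding I/O per session really is the bottleneck.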


Do you see 100% with nullio or fileio? With disk, most of the time should be
spent in iowait and CPU utilization should not be high at all.


 
   Regards,
 
   JKB
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881




RE: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread Ross S. W. Walker
Ming Zhang wrote:
 
 On Fri, 2007-10-19 at 16:30 +0200, BERTRAND Joël wrote:
  Ming Zhang wrote:
   On Fri, 2007-10-19 at 09:48 +0200, BERTRAND Joël wrote:
   Ross S. W. Walker wrote:
   BERTRAND Joël wrote:
   BERTRAND Joël wrote:
   I can format serveral times (mkfs.ext3) a 1.5 TB volume 
   over iSCSI 
   without any trouble. I can read and write on this virtual 
   disk without 
   any trouble.
  
   Now, I have configured ietd with :
  
   Lun 0 Sectors=1464725758,Type=nullio
  
   and I run on initiator side :
  
   Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
   479482+0 records in
   479482+0 records out
   3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s
  
   Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
  
   I'm waitinfor a crash. No one when I write these lines. 
  I suspect 
   an interaction between raid and iscsi.
  I simultanely run :
  
   Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
   8397210+0 records in
   8397210+0 records out
   68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s
  
   and
  
   Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
   739200+0 records in
   739199+0 records out
   6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s
  
  without any trouble.
   The speed can definitely be improved. Look at your network setup
   and use ping to try and get the network latency to a minimum.
  
   # ping -A -s 8192 172.16.24.140
   
   --- 172.16.24.140 ping statistics ---
   14058 packets transmitted, 14057 received, 0% packet 
 loss, time 9988ms
   rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, 
 ipg/ewma 0.710/0.260 ms
   gershwin:[~]  ping -A -s 8192 192.168.0.2
   PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
   8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
   8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
   8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
   8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
   8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
   8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
   8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
   8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
   8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
   8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
   8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
   8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
   8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms
  
   --- 192.168.0.2 ping statistics ---
   13 packets transmitted, 13 received, 0% packet loss, time 2400ms
   rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, 
 ipg/ewma 200.022/0.607 ms
   gershwin:[~] 
  
Both initiator and target are alone on a gigabit NIC 
 (Tigon3). On 
   target server, istd1 takes 100% of a CPU (and only one 
 CPU, even my 
   T1000 can simultaneous run 32 threads). I think the 
 limitation comes 
   from istd1.
   
   usually istdx will not take 100% cpu with 1G network, 
 especially when
   using disk as back storage, some kind of profiling work 
 might be helpful
   to tell what happened...
   
   forgot to ask, your sparc64 platform cpu spec.
  
  Root gershwin:[/mnt/solaris]  cat /proc/cpuinfo
  cpu : UltraSparc T1 (Niagara)
  fpu : UltraSparc T1 integrated FPU
  prom: OBP 4.23.4 2006/08/04 20:45
  type: sun4v
  ncpus probed: 24
  ncpus active: 24
  D$ parity tl1   : 0
  I$ parity tl1   : 0
  
  Both servers are built with 1 GHz T1 processors (6 
 cores, 24 threads).
  
 
 as Ross pointed out, many io pattern only have 1 outstanding io at any
 time, so there is only one work thread actively to serve it. so it can
 not exploit the multiple core here.
 
 
 you see 100% at nullio or fileio? with disk, most time should spend on
 iowait and cpu utilization should not high at all.

Maybe it has to do with the endian-ness fix?

Look at where the fix was implemented and whether there was a simpler way
of implementing it (if that is the cause).

The network is still slower than expected. I don't know what chipset
the Sparcs use for their interfaces; if it is e1000 then you can set
low-latency interrupt throttling with InterruptThrottleRate=1, which
works well. You can explore other interface module options around
interrupt throttling or coalescing.
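
A minimal sketch of both variants; the parameter names below are from that
era's drivers and are offered as starting points rather than gospel,
especially since the thread already established these NICs are Tigon3 (tg3),
not e1000:

   # e1000: low-latency interrupt throttling, as suggested above
   # (take the interface down first; to make it persistent, add
   #  "options e1000 InterruptThrottleRate=1" to /etc/modprobe.conf)
   modprobe -r e1000 && modprobe e1000 InterruptThrottleRate=1

   # tg3: interrupt coalescing is tuned through ethtool instead
   ethtool -C eth1 rx-usecs 0 rx-frames 1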

-Ross


async_tx: get best channel

2007-10-19 Thread Yuri Tikhonov

 Hello Dan,

 I have a suggestion regarding the async_tx_find_channel() procedure.

 First, a little introduction. Some processors (e.g. ppc440spe) have several DMA
engines (say DMA1 and DMA2) which are capable of performing the same type of 
operation, say XOR. The DMA2 engine may process the XOR operation faster than
the DMA1 engine, but DMA2 (which is faster) has some restrictions for the source
operand addresses, whereas there are no such restrictions for DMA1 (which is 
slower).
So the question is: how may ASYNC_TX select the DMA engine which will be the
most effective for the given tx operation?

 In the example just described this means: if the faster engine, DMA2, can
process the tx operation with the given source operand addresses, then we
select DMA2; if the given source operand addresses cannot be processed with
DMA2, then we select the slower engine, DMA1.

 I see the following way of introducing such functionality.

 We may introduce an additional method in struct dma_device (let's call it
device_estimate()) which would take the following as arguments:
--- the list of sources to be processed during the given tx,
--- the type of operation (XOR, COPY, ...),
--- perhaps something else,
 and then estimate the effectiveness of processing this tx on the given channel.
 The async_tx_find_channel() function should call the device_estimate() method
for each registered dma channel and then select the most effective one.
 The architecture-specific ADMA driver will be responsible for returning the
greatest value from the device_estimate() method for the channel which will be
the most effective for this given tx.

 What are your thoughts regarding this? Do you see any other effective ways for
enhancing ASYNC_TX with such functionality?

 Regards, Yuri

-- 
Yuri Tikhonov, Senior Software Engineer
Emcraft Systems, www.emcraft.com


Re: [Iscsitarget-devel] Abort Task ?

2007-10-19 Thread BERTRAND Joël

Ross S. W. Walker wrote:

BERTRAND Joël wrote:

BERTRAND Joël wrote:
I can format serveral times (mkfs.ext3) a 1.5 TB volume 
over iSCSI 
without any trouble. I can read and write on this virtual 
disk without 

any trouble.

Now, I have configured ietd with :

Lun 0 Sectors=1464725758,Type=nullio

and I run on initiator side :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
479482+0 records in
479482+0 records out
3927916544 bytes (3.9 GB) copied, 153.222 seconds, 25.6 MB/s

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192

I'm waitinfor a crash. No one when I write these lines. 
   I suspect 

an interaction between raid and iscsi.

I simultanely run :

Root gershwin:[/dev]  dd if=/dev/zero of=/dev/sdj bs=8192
8397210+0 records in
8397210+0 records out
68789944320 bytes (69 GB) copied, 2732.55 seconds, 25.2 MB/s

and

Root gershwin:[~]  dd if=/dev/sdj of=/dev/null bs=8192
739200+0 records in
739199+0 records out
6055518208 bytes (6.1 GB) copied, 447.178 seconds, 13.5 MB/s

without any trouble.


The speed can definitely be improved. Look at your network setup
and use ping to try and get the network latency to a minimum.

# ping -A -s 8192 172.16.24.140

--- 172.16.24.140 ping statistics ---
14058 packets transmitted, 14057 received, 0% packet loss, time 9988ms
rtt min/avg/max/mdev = 0.234/0.268/2.084/0.041 ms, ipg/ewma 0.710/0.260 ms


gershwin:[~]  ping -A -s 8192 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 8192(8220) bytes of data.
8200 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.693 ms
8200 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.595 ms
8200 bytes from 192.168.0.2: icmp_seq=3 ttl=64 time=0.583 ms
8200 bytes from 192.168.0.2: icmp_seq=4 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=5 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=6 ttl=64 time=0.594 ms
8200 bytes from 192.168.0.2: icmp_seq=7 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=8 ttl=64 time=0.592 ms
8200 bytes from 192.168.0.2: icmp_seq=9 ttl=64 time=0.589 ms
8200 bytes from 192.168.0.2: icmp_seq=10 ttl=64 time=0.571 ms
8200 bytes from 192.168.0.2: icmp_seq=11 ttl=64 time=0.588 ms
8200 bytes from 192.168.0.2: icmp_seq=12 ttl=64 time=0.580 ms
8200 bytes from 192.168.0.2: icmp_seq=13 ttl=64 time=0.587 ms

--- 192.168.0.2 ping statistics ---
13 packets transmitted, 13 received, 0% packet loss, time 2400ms
rtt min/avg/max/mdev = 0.571/0.593/0.693/0.044 ms, ipg/ewma 200.022/0.607 ms
gershwin:[~] 

	Both initiator and target are alone on a gigabit NIC (Tigon3). On the 
target server, istd1 takes 100% of a CPU (and only one CPU, even though my 
T1000 can simultaneously run 32 threads). I think the limitation comes 
from istd1.



You want your avg ping time for 8192 byte payloads to be 300us or less.

1000/.268 = 3731 IOPS @ 8k = 30 MB/s

If you use apps that do overlapping asynchronous IO you can see better
numbers.
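
That rule of thumb lines up with the numbers earlier in this thread; plugging
in the measured 0.593 ms average RTT from the ping output above (bc
arithmetic, decimal MB):

   echo 'scale=1; (1000 / 0.593) * 8192 / 1000000' | bc
   # about 1686 synchronous 8 KiB IOPS, i.e. roughly 13.8 MB/s,
   # which is right where the 13.5 MB/s dd read ended up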


Regards,

JKB


Re: Software RAID when it works and when it doesn't

2007-10-19 Thread Alberto Alonso
On Thu, 2007-10-18 at 17:26 +0200, Goswin von Brederlow wrote:
 Mike Accetta [EMAIL PROTECTED] writes:

 What I would like to see is a timeout driven fallback mechanism. If
 one mirror does not return the requested data within a certain time
 (say 1 second) then the request should be duplicated on the other
 mirror. If the first mirror later unchokes then it remains in the
 raid, if it fails it gets removed. But (at least reads) should not
 have to wait for that process.
 
 Even better would be if some write delay could also be used. The still
 working mirror would get an increase in its serial (so on reboot you
 know one disk is newer). If the choking mirror unchokes then it can
 write back all the delayed data and also increase its serial to
 match. Otherwise it gets really failed. But you might have to use
 bitmaps for this or the cache size would limit its usefullnes.
 
 MfG
 Goswin

I think a timeout on both reads and writes is a must. Basically I
believe that all of the problems I've encountered using software
raid would have been resolved by a timeout within the md code.

This will keep a server from crashing/hanging when the underlying 
driver doesn't properly handle hard drive problems. MD can be 
smarter than the dumb drivers.

Just my thoughts though, as I've never got an answer as to whether or
not md can implement its own timeouts.

Alberto




Re: [BUG] Raid5 trouble

2007-10-19 Thread Dan Williams
On Fri, 2007-10-19 at 01:04 -0700, BERTRAND Joël wrote:
 I never see any oops with this patch. But I cannot create a
 RAID1 array
 with a local RAID5 volume and a foreign RAID5 array exported by iSCSI.
 iSCSI seems to works fine, but RAID1 creation randomly aborts due to a
 unknown SCSI task on target side.

For now I am going to forward this patch to Neil for inclusion in
-stable and 2.6.24-rc.  I will add a Tested-by: Joël Bertrand
[EMAIL PROTECTED] unless you have an objection.

 I have stressed iSCSI target with some simultaneous I/O
 without any
 trouble (nullio, fileio and blockio), thus I suspect another bug in
 raid
 code (or an arch specific bug). The last two days, I have made some
 tests to isolate and reproduce this bug:
 1/ iSCSI target and initiator seem work when I export with iSCSI a
 raid5
 array;
 2/ raid1 and raid5 seem work with local disks;
 3/ iSCSI target is disconnected only when I create a raid1 volume over
 iSCSI (blockio _and_ fileio) with following message:
 
 Oct 18 10:43:52 poulenc kernel: iscsi_trgt: cmnd_abort(1156) 29 1 0 42
 57344 0 0
 Oct 18 10:43:52 poulenc kernel: iscsi_trgt: Abort Task (01) issued on
 tid:1 lun:0 by sid:630024457682948 (Unknown Task)
 
 I run for 12 hours some dd's (read and write in nullio)
 between
 initiator and target without any disconnection. Thus iSCSI code seems
 to
 be robust. Both initiator and target are alone on a single gigabit
 ethernet link (without any switch). I'm investigating...

Can you reproduce on 2.6.22?

Also, I do not think this is the cause of your failure, but you have
CONFIG_DMA_ENGINE=y in your config.  Setting this to 'n' will compile
out the unneeded checks for offload engines in async_memcpy and
async_xor.
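
For reference, a quick way to double-check what the running kernel was built
with (the /boot config copy is a common convention, not a given on every
distribution):

   grep -E 'CONFIG_DMA_ENGINE|CONFIG_ASYNC' /boot/config-$(uname -r)
   # or, in the source tree the kernel was built from:
   grep -E 'CONFIG_DMA_ENGINE|CONFIG_ASYNC' .config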
 
 Regards,
 JKB

Regards,
Dan
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


chunk size (was Re: Time to deprecate old RAID formats?)

2007-10-19 Thread Michal Soltys

Doug Ledford wrote:

course, this comes at the expense of peak throughput on the device.
Let's say you were building a mondo movie server, where you were
streaming out digital movie files.  In that case, you very well may care
more about throughput than seek performance since I suspect you wouldn't
have many small, random reads.  Then I would use a small chunk size,
sacrifice the seek performance, and get the throughput bonus of parallel
reads from the same stripe on multiple disks.  On the other hand, if I



Out of curiosity though - why wouldn't a large chunk work well here? If you 
stream video (I assume large files, so a good few MBs at least), the 
reads are parallel either way.


Yes, the amount of data read from each of the disks will be in less perfect 
proportion than in the small-chunk case, but that is pretty negligible. 
Benchmarks I've seen (like Justin's) seem not to care much about chunk 
size in sequential read/write scenarios (and often favor larger chunks). 
Some of my own tests from a few months ago confirmed that as well.
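
For anyone who wants to repeat that kind of comparison, a rough sketch of the
test loop involved; device names, sizes and flags are illustrative, and the
listed partitions get wiped, so only run it on scratch disks:

   for chunk in 64 256 1024; do
       mdadm --create /dev/md9 --run --assume-clean --level=5 \
             --raid-devices=4 --chunk=$chunk /dev/sd[b-e]1
       dd if=/dev/md9 of=/dev/null bs=1M count=4096 iflag=direct
       mdadm --stop /dev/md9
   done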
