Re: recommendations for stripe/chunk size

2008-02-07 Thread Wolfgang Denk
Dear Nail,

in message [EMAIL PROTECTED] you wrote:
 
 quote
 The second improvement is to remove a memory copy that is internal to the
 MD driver. The MD driver stages strip data ready to be written next to the
 I/O controller in a page size pre-allocated buffer. It is possible to
 bypass this memory copy for sequential writes thereby saving SDRAM access
 cycles.
 /quote
 
 I sure hope you've checked that the filesystem never (ever) changes a
 buffer while it is being written out.  Otherwise the data written to
 disk might be different from the data used in the parity calculation
 :-)

Sure. Note that usage scenarios for this implementation are not only
(actually not even primarily) focused on using such a setup as a
normal RAID server. Instead, processors like the 440SPe will likely
be used on RAID controller cards themselves, and the data may come
from iSCSI or over one of the PCIe buses, but not from a normal file
system.

 And what are the Second memcpy and First memcpy in the graph?
 I assume one is the memcpy mentioned above, but what is the other?

Avoiding the 1st memcpy means skipping the system block level caching,
i.e. trying to use the DIRECT_IO capability (the -dio option of the
xdd tool, which was used for these benchmarks).

The 2nd memcpy is the optimization for large  sequential  writes  you
quoted above.

Please keep in mind that these optimizations are probably not
directly useful for general purpose use of a normal file system on
top of the RAID array; they have other goals: to provide benchmarks
for the special case of large synchronous I/O operations (as used by
RAID controller manufacturers to show off against their competitors),
and to provide a base for the firmware of such controllers.

Nevertheless, they clearly show where optimizations are possible,
assuming you understand your usage scenario exactly.

In real life, your optimization may require completely different
strategies. For example, on our main file server we see the following
distribution of file sizes:

Out of a sample of 14.2e6 files,

 65%    are smaller than  4 kB
 80%    are smaller than  8 kB
 90%    are smaller than 16 kB
 96%    are smaller than 32 kB
 98.4%  are smaller than 64 kB

You don't want - for example - huge stripe sizes in such a system.
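
For anyone who wants to collect the same kind of numbers on their own
server, here is a minimal sketch in Python. It is not the tool used to
produce the figures above; the function names and the threshold list are
made up for illustration, and "smaller than" is taken as strictly smaller.

```python
import os
from bisect import bisect_left


def size_distribution(sizes, thresholds_kib=(4, 8, 16, 32, 64)):
    """Return {threshold_kib: percent of files strictly smaller than it}."""
    ordered = sorted(sizes)
    total = len(ordered)
    result = {}
    for kib in thresholds_kib:
        # bisect_left counts the entries strictly below the threshold.
        below = bisect_left(ordered, kib * 1024)
        result[kib] = 100.0 * below / total if total else 0.0
    return result


def file_sizes(root):
    """Yield sizes in bytes of regular files below root (symlinks skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path) and os.path.isfile(path):
                    yield os.path.getsize(path)
            except OSError:
                continue  # file vanished or is not readable
```

Something like size_distribution(file_sizes("/home")) then gives the
cumulative percentages per threshold. If most files fall below a few kB,
as in the sample above, that argues against very large stripe sizes.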

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Egotist: A person of low taste, more interested in  himself  than  in
me.  - Ambrose Bierce
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: recommendations for stripe/chunk size

2008-02-06 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:

  I actually think the kernel should operate with block sizes
  like this and not with 4 kiB blocks. It is the readahead and the elevator
  algorithms that save us from randomly reading 4 kB at a time.
 

 Exactly, and nothing saves a R-A-RW cycle if the write is a partial chunk.

Indeed kernel page size is an important factor in such optimizations.
But you have to keep in mind that this is mostly efficient for (very)
large strictly sequential I/O operations only -  actual  file  system
traffic may be *very* different.
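
To get a feeling for how much of a given write can be handled as full
stripes, the arithmetic can be sketched as below. This is only an
illustration: split_write and its parameters are hypothetical names, a
stripe is assumed to be chunk size times the number of data disks, and
what the MD driver actually does with the partial parts is not modelled.

```python
def split_write(offset, length, chunk_size, data_disks):
    """Split a write into (head_partial_bytes, full_stripes, tail_partial_bytes).

    Assumes stripe size = chunk_size * data_disks; all values in bytes.
    """
    stripe = chunk_size * data_disks
    end = offset + length
    first_full = -(-offset // stripe) * stripe  # offset rounded up to a stripe boundary
    last_full = (end // stripe) * stripe        # end rounded down to a stripe boundary
    if first_full >= last_full:
        # No complete stripe is covered, so the whole write is partial.
        return length, 0, 0
    return first_full - offset, (last_full - first_full) // stripe, end - last_full
```

With 64 kB chunks on 4 data disks, split_write(0, 4096, 65536, 4) shows
that a single 4 kB write never covers a full stripe, while a 512 kB write
that starts on a stripe boundary covers exactly two.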

We implemented the option to select kernel page sizes of 4, 16, 64
and 256 kB for some PowerPC systems (440SPe, to be precise). A nice
graph of the effect can be found here:

https://www.amcc.com/MyAMCC/retrieveDocument/PowerPC/440SPe/RAIDinLinux_PB_0529a.pdf

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
You got to learn three things. What's  real,  what's  not  real,  and
what's the difference.   - Terry Pratchett, _Witches Abroad_


Re: identifying failed disk/s in an array.

2008-01-23 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:

 And/or use smartctl to look up the make/model/serial number and look at the
 drive label. I always do this to make sure I'm pulling the right drive (also
 useful to RMA the drive)

Or, probably even faster, run "ls -l /dev/disk/by-id" (assuming you
are using udev).
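
If you want the same information in a script, a rough Python sketch
could look like this. The function name disks_by_id is made up here; it
only resolves the symlinks that udev is assumed to have created under
/dev/disk/by-id and groups them by the device node they point to.

```python
import os


def disks_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each device node to the sorted by-id names that point at it."""
    mapping = {}
    try:
        names = os.listdir(by_id_dir)
    except FileNotFoundError:
        return mapping  # directory missing, e.g. no udev on this system
    for name in sorted(names):
        link = os.path.join(by_id_dir, name)
        # realpath follows the symlink to the actual device node.
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(name)
    return mapping
```

The by-id names usually contain model and serial number, so looking up
the failed device node in the result tells you which label to look for
on the drive.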

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Command, n.:
Statement presented by a human and accepted by a computer
in such a manner as to make the human feel as if he is in control.


Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.

2007-12-30 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:
 what is nobarrier ?
...
   # mount -o logbsize=256k,nobarrier dev mtpt

See http://oss.sgi.com/projects/xfs/faq.html

Q: How can I address the problem with the write cache?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
One of the advantages of being a captain is being able to ask for
advice without necessarily having to take it.
-- Kirk, Dagger of the Mind, stardate 2715.2


Re: RAID5 lockup with AMCC440 and async-tx

2007-10-01 Thread Wolfgang Denk
Dear Dale,

in message [EMAIL PROTECTED] you wrote:

 Latest code from Dan or latest code from denx.de? I grabbed the latest

From linux-2.6-denx

 code from Dan, but I'm having trouble cloning denx.de:
 
 remote: error: object directory /home/git/linux-2.6/.git/objects does
 not exist; check .git/objects/info/alternates.

Argh.. Stupid me.

Please try again - this one is fixed now.

  We saw similar problems, in our case they showed up only with a large
  number of disks in combination with big kernel page sizes (64 kB).
 
 The problem occurs for me with both 4k and 64k pages.

Probably using more than one controller adds to the likelihood of
being hit by this race condition.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Immortality consists largely of boredom.
-- Zefrem Cochrane, Metamorphosis, stardate 3219.8


Re: Backups w/ rsync

2007-09-28 Thread Wolfgang Denk
Dear Bill,

in message [EMAIL PROTECTED] you wrote:

 Be aware that rsync is useful for making a *copy* of your files, which 
 isn't always the best backup. If the goal is to preserve data and be 
 able to recover in time of disaster, it's probably not optimal, while if 
 you need frequent access to old or deleted files it's fine.

If you want to do real backups you should use real tools, like bacula
etc.

 Now you can do an incremental (since last full or incremental) or 
 partial (since last full):
 
 touch bkup_incr_new
 timestamp=$(date +%Y%m%d-%T)
 find /home -cnewer bkup_incr | cpio -o -Hcrc |
     gzip -3 > /mnt/USBbkup/incr-$timestamp
 mv -f bkup_incr_new bkup_incr
 
 timestamp=$(date +%Y%m%d-%T)
 find /home -cnewer bkup_full | cpio -o -Hcrc |
     gzip -3 > /mnt/USBbkup/part-$timestamp

Now suppose Johnny Loser downloads some stuff, say:

$ wget -N ftp://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.12.tar.gz

Are you aware that this file will never be backed up by your script?

Also, what about permission / owner changes etc.?

A backup tool should never work based on timestamps alone.
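
To illustrate the idea (and only that; this is not how bacula or any
other real backup tool works), here is a minimal Python sketch that
decides what has changed by comparing two snapshots of content hash,
mode and owner instead of looking at timestamps. The names snapshot and
changed_since are invented for this example, and hashing every file is
of course expensive on a large tree.

```python
import hashlib
import os


def snapshot(root):
    """Return {relative_path: (sha256, mode, uid, gid)} for regular files under root."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # Read in 1 MiB blocks so large files do not need to fit in memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            rel = os.path.relpath(path, root)
            state[rel] = (digest.hexdigest(), st.st_mode, st.st_uid, st.st_gid)
    return state


def changed_since(old, new):
    """Return sorted paths that are new or whose content, mode or owner differ."""
    return sorted(p for p, meta in new.items() if old.get(p) != meta)
```

A file that arrives with an old modification time, or one where only
the permissions were changed, shows up in changed_since(old, new)
because the comparison never looks at the timestamps.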

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk  Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
All he had was nothing, but that was something, and now it  had  been
taken away. - Terry Pratchett, _Sourcery_


Re: Data corruption on software raid.

2007-03-18 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:
 
 But what amazes me is that no media errors can be detected by doing a
 write/read check on every sector of the disk with mke2fs, and no data
 corruption occurs when moving data to the set locally!
 
 Can anyone shed some light on what I can try next to isolate what is
 causing all this?  It's not the software raid code, the IDE set is

If it happens only with downloaded data, you may see data corruption
in the NIC hardware and/or driver. Try using another network card
(other vendor, other type).

Another possible culprit is memory - you may see memory errors under
certain usage patterns. Make sure to run a memory test, and/or try
changing RAM.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, HRB 165235 Munich, CEO: Wolfgang Denk
Office:  Kirchenstr. 5,   D-82194 Groebenzell,Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
Time is an illusion perpetrated by the manufacturers of space.


Re: [PATCH] [PPC32] ADMA support for PPC 440SPe processors.

2007-03-16 Thread Wolfgang Denk
In message [EMAIL PROTECTED] you wrote:

 They are in -mm (git-md-accel.patch).  I'll review this driver and
 integrate it into my next push to Andrew, along with some further
 cleanups.

Thanks.

We're doing some cleanup now based on the feedback we receive.

What is easier for you to handle - a complete new patch, or an
incremental one on top of what we submitted now? (I'd prefer
incremental, but will do whatever works better for you).

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, HRB 165235 Munich, CEO: Wolfgang Denk
Office:  Kirchenstr. 5,   D-82194 Groebenzell,Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: [EMAIL PROTECTED]
You may call me by my name, Wirth, or by my value, Worth.
- Niklaus Wirth