Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
On Saturday May 13, [EMAIL PROTECTED] wrote:
> Paul Clements [EMAIL PROTECTED] wrote:
> > Andrew Morton wrote:
> > > The loss of pagecache coherency seems sad. I assume there's never a
> > > requirement for userspace to read this file.
> >
> > Actually, there is. mdadm reads the bitmap file, so that would be broken.
> > Also, it's just useful for a user to be able to read the bitmap (od -x,
> > or similar) to figure out approximately how much more he's got to resync
> > to get an array in-sync. Other than reading the bitmap file, I don't
> > know of any way to determine that.
>
> Read it with O_DIRECT :(

Which is exactly what the next release of mdadm does. As the patch comment said:

: With this approach the pagecache may contain data which is inconsistent
: with what is on disk. To alleviate the problems this can cause, md
: invalidates the pagecache when releasing the file. If the file is to be
: examined while the array is active (a non-critical but occasionally useful
: function), O_DIRECT io must be used.

And a new version of mdadm will have support for this.

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
Neil Brown [EMAIL PROTECTED] wrote:
> On Saturday May 13, [EMAIL PROTECTED] wrote:
> > Paul Clements [EMAIL PROTECTED] wrote:
> > > Andrew Morton wrote:
> > > > The loss of pagecache coherency seems sad. I assume there's never a
> > > > requirement for userspace to read this file.
> > >
> > > Actually, there is. mdadm reads the bitmap file, so that would be
> > > broken. Also, it's just useful for a user to be able to read the
> > > bitmap (od -x, or similar) to figure out approximately how much more
> > > he's got to resync to get an array in-sync. Other than reading the
> > > bitmap file, I don't know of any way to determine that.
> >
> > Read it with O_DIRECT :(
>
> Which is exactly what the next release of mdadm does. As the patch
> comment said:
>
> : With this approach the pagecache may contain data which is inconsistent
> : with what is on disk. To alleviate the problems this can cause, md
> : invalidates the pagecache when releasing the file. If the file is to be
> : examined while the array is active (a non-critical but occasionally
> : useful function), O_DIRECT io must be used.
>
> And a new version of mdadm will have support for this.

Which doesn't help `od -x', and is going to cause older mdadm userspace to
mysteriously and subtly fail. Or does the user-kernel interface have
versioning which will prevent this?
Re: softraid and multiple distros
> What do I need to do when I want to install a different distro on the
> machine with a raid5 array? Which files do I need? /etc/mdadm.conf?
> /etc/raidtab? Both?

MD doesn't need any files to function, since it can auto-assemble arrays
based on their superblocks (for partition type 0xfd).
Re: softraid and multiple distros
On Sunday, 14 May 2006, you wrote:
> > What do I need to do when I want to install a different distro on the
> > machine with a raid5 array? Which files do I need? /etc/mdadm.conf?
> > /etc/raidtab? Both?
>
> MD doesn't need any files to function, since it can auto-assemble arrays
> based on their superblocks (for partition type 0xfd).

I see. Now an issue arises that someone else here mentioned: my first
attempt was to use the entire disks; then I was hinted that this approach
wasn't too hot, so I made partitions. Now the devices all have two
superblocks: the ones left over from the first try, which are now kind of
orphaned, and those now active. Can I trust mdadm to handle this properly
on its own?

Dex
Re: softraid and multiple distros
> Now the devices all have two superblocks: the ones left over from the
> first try, which are now kind of orphaned, and those now active. Can I
> trust mdadm to handle this properly on its own?

I'm not sure what "properly" means. You should not leave around 0xfd
partitions with bogus superblocks, since MD will certainly try to assemble
them. I don't know offhand how it decides which components to make into a
single array (UUID?), but why screw around? For orphan partitions, either
change the partition type, or zero the superblock, or both...
Re: Problem with large devices 2TB
Followup to: [EMAIL PROTECTED]
By author: Jim Klimov [EMAIL PROTECTED]
In newsgroup: linux.dev.raid
> Since the new parted worked ok (the older one didn't), we were happy until
> we tried a reboot. During device initialization and after it, the system
> only recognises the 6 or 7 partitions which start before the 2000 GB
> limit.

For a DOS partition table, there is no such thing as a partition starting
beyond 2 TB. You need to use a GPT or other more sophisticated partition
table.

	-hpa
Re: softraid and multiple distros
On Sunday May 14, [EMAIL PROTECTED] wrote:
> On Sunday, 14 May 2006, you wrote:
> > > What do I need to do when I want to install a different distro on the
> > > machine with a raid5 array? Which files do I need? /etc/mdadm.conf?
> > > /etc/raidtab? Both?
> >
> > MD doesn't need any files to function, since it can auto-assemble
> > arrays based on their superblocks (for partition type 0xfd).
>
> I see. Now an issue arises that someone else here mentioned: my first
> attempt was to use the entire disks; then I was hinted that this approach
> wasn't too hot, so I made partitions.

I always use entire disks if I want the entire disks raided (sounds
obvious, doesn't it...). I only use partitions when I want to vary the
raid layout for different parts of the disk (e.g. mirrored root, mirrored
swap, raid6 for the rest). But that certainly doesn't mean it is wrong to
use partitions for the whole disk.

> Now the devices all have two superblocks: the ones left over from the
> first try, which are now kind of orphaned, and those now active. Can I
> trust mdadm to handle this properly on its own?

You can tell mdadm where to look. If you want to be sure that it won't
look at entire drives, only partitions, then a line like

  DEVICES /dev/[hs]d*[0-1]

in /etc/mdadm.conf might be what you want. However, as you should be
listing the UUIDs in /etc/mdadm.conf, any superblock with an unknown UUID
will easily be ignored. If you are relying on 0xfd autodetect to assemble
your arrays, then obviously the entire-disk superblocks will be ignored
(because they won't be in the right place in any partition).

NeilBrown
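Fleshing that suggestion out as a hypothetical /etc/mdadm.conf (the UUID
below is made up for illustration, and the glob is widened to match any
partition number):

```
# Hypothetical /etc/mdadm.conf - scan only partitions, never whole disks,
# so a stale whole-disk superblock is never even considered.
DEVICE /dev/[hs]d*[0-9]

# Identify the array by UUID; superblocks with any other UUID are ignored.
ARRAY /dev/md0 UUID=6b8b4567:327b23c6:643c9869:66334873
```

The two mechanisms are complementary: the DEVICE line limits what mdadm
scans, and the ARRAY UUID limits what it will assemble from whatever it
finds.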
Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
(replying to bits of several emails)

On Friday May 12, [EMAIL PROTECTED] wrote:
> Neil Brown [EMAIL PROTECTED] wrote:
> > However some IO requests cannot complete until the filesystem I/O
> > completes, so we need to be sure that the filesystem I/O won't block
> > waiting for memory, or fail with -ENOMEM.
>
> That sounds like a complex deadlock. Suppose the bitmap writeout requires
> some writeback to happen before it can get enough memory to proceed.

Exactly. Bitmap writeout must not block on fs-writeback. It can block on
device writeout (e.g. queue congestion or mempool exhaustion), but it must
complete without waiting in the fs layer or above, and without the
possibility of any error other than -EIO. Otherwise we can get deadlocked
writing to the raid array. bh_submit (or bio_submit) is certain to be safe
in this respect. I'm not so confident about anything at the fs level.

> > > Read it with O_DIRECT :(
> >
> > Which is exactly what the next release of mdadm does. As the patch
> > comment said:
> >
> > : With this approach the pagecache may contain data which is
> > : inconsistent with what is on disk. To alleviate the problems this can
> > : cause, md invalidates the pagecache when releasing the file. If the
> > : file is to be examined while the array is active (a non-critical but
> > : occasionally useful function), O_DIRECT io must be used.
> >
> > And a new version of mdadm will have support for this.
>
> Which doesn't help `od -x', and is going to cause older mdadm userspace
> to mysteriously and subtly fail. Or does the user-kernel interface have
> versioning which will prevent this?

As I said: 'non-critical'. Nothing important breaks if reading the file
gets old data. Reading the file while the array is active is purely a
curiosity thing. There is information in /proc/mdstat which gives a fairly
coarse view of the same data. It could lead to some confusion, but if a
compliant mdadm comes out before this gets into a mainline kernel, I doubt
there will be any significant issue.
Reading/writing the bitmap needs to work reliably when the array is not
active, but suitable sync/invalidate calls in the kernel should make that
work perfectly. I know this is technically a regression in the user-space
interface, and you don't like such regressions with good reason. Maybe I
could call invalidate_inode_pages every few seconds, or whenever the atime
changes, just to be on the safe side :-) I have a patch which did that, but
decided that the possibility of kmalloc failure at awkward times would
make that not suitable.

> submit_bh() can and will allocate memory, although most decent device
> drivers should be OK.

submit_bh (like all decent device drivers) uses a mempool for memory
allocation, so we can be sure that the delay in getting memory is bounded
by the delay for a few IO requests to complete, and we can be sure the
allocation won't fail. This is perfectly fine.

> > I don't think a_ops really provides an interface that I can use, partly
> > because, as I said in a previous email, it isn't really a public
> > interface to a filesystem.
>
> It's publicer than bmap+submit_bh!

I don't know how you can say that. bmap is so public that it is exported
to userspace through an IOCTL, and is used by lilo (admittedly only for
reading, not writing). More significantly, it is used by swapfile, which
is a completely independent subsystem from the filesystem. Contrast this
with a_ops. The primary users of a_ops are routines like
generic_file_{read,write} and friends. These are tools - library routines -
that are used by filesystems to implement their 'file_operations', which
are the real public interface. As far as these uses go, it is not a public
interface. Where a filesystem doesn't use some library routine, it does
not need to implement the matching functionality in the a_op interface.
The other main user is the 'VM', which might try to flush out or
invalidate pages.
However, the VM isn't using this interface to interact with files, but
only to interact with pages, and it doesn't much care what is done with
the pages providing they get clean, or get released, or whatever.

The way I perceive Linux design/development, active usage is far more
significant than documented design. If some feature of an interface isn't
being actively used - by in-kernel code - then you cannot be sure that
feature will be uniformly implemented, or that it won't change subtly next
week. So when I went looking for the best way to get md/bitmap to write to
a file, I didn't just look at the interface specs (which are pretty poorly
documented anyway), I looked at existing code.

I can find 3 different parts of the kernel that write to a file. They are:
  swap-file
  loop
  nfsd

nfsd uses vfs_read/vfs_write, which have too many failure/delay modes for
me to safely use. loop uses prepare_write/commit_write (if available) or
f_op->write (not vfs_write - I wonder why), which is not much better than
what nfsd uses. And as far as I can tell