Re: [RFD] FAT robustness
Hiroyuki Machida <[EMAIL PROTECTED]> writes:

>>> - Utilize noop elevator to cancel unexpected operation reordering
>> Why don't you use the barrier?
>
> You mean that using requests with the barrier flag is enough and there
> is no reason to specify the I/O scheduler?
>
> It is better to preserve the order of data updates in some
> circumstances, such as appending data.
>
> Xvfat for 2.4 had its own elevator function, to preserve EraseBlock-unit
> ordering for memory card devices.
>
> To begin with for 2.6, I'd like to keep it simple, but later we need to
> address this issue. So my thought was to use "noop" at first, and later
> switch to a special elevator function that handles the device better.

Um.. The independent updates are merged in the special elevator, yes? If
so, the merging can also be done well in the fs layer, and then the
normal elevator can optimize seeks for HDD, no? If that's possible, I'd
rather not depend on a special elevator.

>>> - With O_SYNC, close() flushes all related data and
>>>   meta-data, then waits for completion of I/O
>> What does this mean? Why does O_SYNC only flush at close()?
>
> From the application's point of view, the application wants to believe
> that a close()ed file has been written correctly, without any
> corruption.
>
> At least close() needs to guarantee this. It is also OK for every
> write() to flush meta-data and data and wait for I/O completion.
>
> At least with fat on 2.4.20, the VFS syncs the inode on write() with
> O_SYNC, but it does not take care of the super block. The FAT side does
> not care about O_SYNC. That's the problem.

Ah, support for O_SYNC was added in 2.6.12. Yes, it should apply to
every write(). If an application needs it only at close(), it can use
fsync().

Hiroyuki Machida <[EMAIL PROTECTED]> writes:

> I need to explain the background in more detail. My descriptions tend
> to depend on some knowledge of the current xvfat for the 2.4 kernel.
>
> I'm not an author of xvfat for the 2.4 kernel, but I can explain a
> little more.
>
> The current xvfat for 2.4 is designed for a specific flash memory card
> controller that can guarantee atomicity of operations in ERASE-BLOCK
> size units. Xvfat for 2.4 tries to merge operations on the same
> ERASE-BLOCK under some ordering constraints.
>
> And xvfat for 2.4 uses its own version of transaction control, using
> in-core memory rather than a storage device such as HDD or flash RAM,
> to accomplish the above goal with minimal changes to the existing
> FAT implementation. This transaction control also keeps FAT operations
> coming from different threads from getting mixed up, which could
> otherwise cause operation ordering problems.
>
> We'll start with HDD, but later we'll cover memory devices.
> For memory devices we may prepare other elevator functions,
> depending on the properties of the devices or the lower layer. E.g.
> NAND/AND flash have different operation units for read/write and erase,
> and have some translation layer.

[...]

> As other messages said, some developers suggest using "SoftUpdate".
> I need to consider the situation where memory devices are used, not HDD.

SoftUpdates itself does not depend on the block size of the atomic
operation. It will provide the same robustness as xvfat, although
handling multiple atomic block sizes may be complex.

It controls dependencies at a very fine granularity, and it would be
used by default, because its performance is very good.

This is why I think SoftUpdates is the best solution, and I'd like to
hear the details.
--
OGAWA Hirofumi <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
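
The difference discussed above between O_SYNC on every write() and an
explicit fsync() before close() can be sketched from user space as
follows. This is only an illustration on a POSIX system, not anything
from the xvfat patch; the helper names are made up for the example, and
how much the file system really flushes (for FAT, whether O_SYNC is
honoured at all) depends on the kernel version, as the thread notes.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Call write() until every byte has been handed to the kernel."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_with_o_sync(path: str, chunks: list) -> None:
    """Open with O_SYNC: each write() asks for synchronous completion."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
    finally:
        os.close(fd)


def write_then_fsync(path: str, chunks: list) -> None:
    """Buffered writes, then one fsync() just before close()."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
        os.fsync(fd)  # flush this file's data and metadata once
    finally:
        os.close(fd)
```

An application that only cares about the state at close() would use the
second form; the first pays the synchronous cost on every write().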
Re: [RFD] FAT robustness
Hi,

OGAWA Hirofumi wrote:
> Hiroyuki Machida <[EMAIL PROTECTED]> writes:
>
>> We currently plan to add the following features to address FAT
>> corruption.
>>
>> - Utilize standard 2.6 features as much as possible
>> - Implement as options of fat, vfat and uvfat
>
> What is uvfat? A typo (xvfat)? Why is this an option (does it have
> a big demerit)?

uvfat is another variant of vfat, like umsdos.

Xvfat for 2.4 has the following directory and file organization:
most files are located in fs/xvfat, and most of them were copied from
fs/fat and fs/vfat and renamed with a prefix like 'xvfat_'.
For 2.6, I feel that this organization needs to change.

And xvfat for 2.4 had some performance degradation.
So I guess an 'option' is better.

>> - Utilize noop elevator to cancel unexpected operation reordering
>
> Why don't you use the barrier?

You mean that using requests with the barrier flag is enough and there
is no reason to specify the I/O scheduler?

It is better to preserve the order of data updates in some
circumstances, such as appending data.

Xvfat for 2.4 had its own elevator function, to preserve EraseBlock-unit
ordering for memory card devices.

To begin with for 2.6, I'd like to keep it simple, but later we need to
address this issue. So my thought was to use "noop" at first, and later
switch to a special elevator function that handles the device better.

>> - Coordinate the order of operations so that data is updated first
>>   and meta-data later, with transaction control
>
> Does this mean SoftUpdates? What does this guarantee? How does
> this handle rename() and cyclic dependencies of updates?

I mentioned this in <[EMAIL PROTECTED]>.

>> - With O_SYNC, close() flushes all related data and
>>   meta-data, then waits for completion of I/O
>
> What does this mean? Why does O_SYNC only flush at close()?

From the application's point of view, the application wants to believe
that a close()ed file has been written correctly, without any
corruption.

At least close() needs to guarantee this. It is also OK for every
write() to flush meta-data and data and wait for I/O completion.

At least with fat on 2.4.20, the VFS syncs the inode on write() with
O_SYNC, but it does not take care of the super block. The FAT side does
not care about O_SYNC. That's the problem.

> Almost everything in your email needs more detail. I think
> SoftUpdates is the best solution for now. Could you tell us the
> details of your solution?

I mentioned this in <[EMAIL PROTECTED]>.
Re: [RFD] FAT robustness
Hi,

I need to explain the background in more detail. My descriptions tend
to depend on some knowledge of the current xvfat for the 2.4 kernel.

I'm not an author of xvfat for the 2.4 kernel, but I can explain a
little more.

The current xvfat for 2.4 is designed for a specific flash memory card
controller that can guarantee atomicity of operations in ERASE-BLOCK
size units. Xvfat for 2.4 tries to merge operations on the same
ERASE-BLOCK under some ordering constraints.

And xvfat for 2.4 uses its own version of transaction control, using
in-core memory rather than a storage device such as HDD or flash RAM,
to accomplish the above goal with minimal changes to the existing
FAT implementation. This transaction control also keeps FAT operations
coming from different threads from getting mixed up, which could
otherwise cause operation ordering problems.

We'll start with HDD, but later we'll cover memory devices.
For memory devices we may prepare other elevator functions,
depending on the properties of the devices or the lower layer. E.g.
NAND/AND flash have different operation units for read/write and erase,
and have some translation layer.

Paulo Marques wrote:
> Hiroyuki Machida wrote:
> [...]
>> Q3 : I'm not sure JBD can be used for FAT improvements. Do you
>> have any comments ?
>
> I might not be the best person to answer this, but this just seems so
> obvious:

Any comments are welcome.

> If you plan to let a recently hot-unplugged device be used in another
> OS that doesn't understand your journaling extensions, your disk will
> be corrupted.
>
> If this is supposed to work only on OS's that understand your
> journaling extensions, then there are much better filesystems out
> there with journaling already.

I agree. Even with non-removable media, this situation can occur.
Suppose a device like an audio player that acts as a USB client and
presents itself as a USB Mass Storage class target. Its embedded storage
may be handled by the USB host side, such as a Windows PC or a Mac.

> You might be able to reduce the size of the time window where hot
> removing the media will cause problems, like writing all the data
> first and updating the metadata in as few operations as possible.
>
> But that just reduces the probability of data corruption. It doesn't
> eliminate it at all.

As other messages said, some developers suggest using "SoftUpdate".
I need to consider the situation where memory devices are used, not HDD.
Re: [RFD] FAT robustness
Hi!

> >[...]
> > Q3 : I'm not sure JBD can be used for FAT improvements. Do you
> >have any comments ?
>
> I might not be the best person to answer this, but this just seems so
> obvious:
>
> If you plan to let a recently hot-unplugged device be used in another
> OS that doesn't understand your journaling extensions, your disk will
> be corrupted.

It will only be corrupted if you unplug it without unmounting, and it
will only be corrupted as much as a non-journalling disk is.

Plus, you might intentionally damage the superblock signature on mount
(and fix it on clean umount) so that you force the user to plug it back
into a journalling system.

								Pavel
--
teflon -- maybe it is a trademark, but it should not be.
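
Marking the volume at mount time and restoring it on a clean umount is
similar in spirit to the "dirty flag" listed in the original proposal.
As a rough sketch, FAT16 and FAT32 already reserve a clean-shutdown bit
in the second FAT entry (FAT[1]); the masks below are the ones given in
the Microsoft FAT specification, while the function names are invented
for this example and FAT12 (which has no such bit) is not handled.

```python
# Clean-shutdown bit in FAT[1]: set means "cleanly unmounted",
# cleared means "dirty" (mounted, or removed without umount).
CLEAN_SHUTDOWN_MASK = {16: 0x8000, 32: 0x08000000}


def is_clean(fat1_entry: int, fat_bits: int) -> bool:
    """Return True if the volume was marked clean at last umount."""
    return bool(fat1_entry & CLEAN_SHUTDOWN_MASK[fat_bits])


def mark_dirty(fat1_entry: int, fat_bits: int) -> int:
    """Value to write to FAT[1] at mount time, before any other update."""
    return fat1_entry & ~CLEAN_SHUTDOWN_MASK[fat_bits]


def mark_clean(fat1_entry: int, fat_bits: int) -> int:
    """Value to write to FAT[1] once everything else has been flushed."""
    return fat1_entry | CLEAN_SHUTDOWN_MASK[fat_bits]
```

Unlike a damaged superblock signature, a cleared bit does not stop
another OS from using the volume; it only tells it that a check may be
needed.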
Re: [RFD] FAT robustness
On Tuesday 19 July 2005 19:58, Etienne Lorrain wrote:
> > I'd like to have a discussion about FAT robustness.
> > Please give your thoughts, comments and related issues.
>
>   What I would like is to treat completely differently writing to
>  FAT (writing to a removable drive), which needs a complete "mount",
>  and just quickly reading a file (a standard use of removable devices).
>
>   Basically, to read you would not need to mount the partition, just
>  read /readfs/fd1, which uses two or three functions accessing /dev/fd1
>  in raw mode to read the filesystem descriptor and the root directory.
>  Same for /readfs/cdrom and /readfs/sda4 (USB drive).
>  The only cache would be the one provided by /dev/fd1 - a kind of
>  read-only mount at each file opening.
>
>   This system would be disabled if the partition is already mounted
>  read/write somewhere - but as long as you do not try to write to
>  a removable disk you can extract it at any time.
>
>   The two or three functions I am talking about are located in Gujin's
>  "fs.c" file, which accesses read-only FAT12/16/32, EXT2/3 and ISOFS
>  ( http://gujin.org ). Just a few kilobytes - and some source
>  modifications for that use.

I think we will be better off with a more generic 'flush all dirty data
and mark superblock as clean asap' behaviour, aka 'weak O_SYNC', so that
we can remove e.g. a USB removable drive almost anytime
(can't safely remove it _only while it is being written to_).
--
vda
Re: [RFD] FAT robustness
Etienne Lorrain <[EMAIL PROTECTED]> wrote:
> > I'd like to have a discussion about FAT robustness.
> > Please give your thoughts, comments and related issues.

>   What I would like is to treat completely differently writing to
>  FAT (writing to a removable drive), which needs a complete "mount",
>  and just quickly reading a file (a standard use of removable devices).

Sounds like a job for mtools(1).
--
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513
[RFD] FAT robustness
> I'd like to have a discussion about FAT robustness.
> Please give your thoughts, comments and related issues.

  What I would like is to treat completely differently writing to
 FAT (writing to a removable drive), which needs a complete "mount",
 and just quickly reading a file (a standard use of removable devices).

  Basically, to read you would not need to mount the partition, just
 read /readfs/fd1, which uses two or three functions accessing /dev/fd1
 in raw mode to read the filesystem descriptor and the root directory.
 Same for /readfs/cdrom and /readfs/sda4 (USB drive).
 The only cache would be the one provided by /dev/fd1 - a kind of
 read-only mount at each file opening.

  This system would be disabled if the partition is already mounted
 read/write somewhere - but as long as you do not try to write to
 a removable disk you can extract it at any time.

  The two or three functions I am talking about are located in Gujin's
 "fs.c" file, which accesses read-only FAT12/16/32, EXT2/3 and ISOFS
 ( http://gujin.org ). Just a few kilobytes - and some source
 modifications for that use.

  Etienne.
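
To give an idea of how little is needed to "read the filesystem
descriptor" from a raw device without mounting it, here is a minimal
sketch that decodes the BIOS Parameter Block of a FAT boot sector. It is
not the Gujin fs.c code; the class and function names are made up for
the example, the field offsets are the standard ones from the FAT
on-disk format, and the root directory calculation only applies to
FAT12/16 (FAT32 keeps its root directory in a cluster chain).

```python
import struct
from typing import NamedTuple


class Bpb(NamedTuple):
    bytes_per_sector: int
    sectors_per_cluster: int
    reserved_sectors: int
    num_fats: int
    root_entries: int
    total_sectors: int
    fat_size_sectors: int


def parse_boot_sector(sector: bytes) -> Bpb:
    """Decode the common BPB fields found at offset 11 of sector 0."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot sector signature")
    (bps, spc, rsvd, nfats, root_ents, tot16, _media, fatsz16,
     _spt, _heads, _hidden, tot32) = struct.unpack_from(
        "<HBHBHHBHHHII", sector, 11)
    fatsz = fatsz16
    if fatsz == 0:
        # FAT32 stores the FAT size in a 32-bit field at offset 36.
        fatsz = struct.unpack_from("<I", sector, 36)[0]
    return Bpb(bps, spc, rsvd, nfats, root_ents, tot16 or tot32, fatsz)


def root_dir_first_sector(bpb: Bpb) -> int:
    """First sector of the fixed root directory (FAT12/16 only)."""
    return bpb.reserved_sectors + bpb.num_fats * bpb.fat_size_sectors
```

On a real system the 512 bytes would come from something like
open("/dev/fd1", "rb").read(512); the device name is taken from the
message above and is only an example.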
Re: [RFD] FAT robustness
Hiroyuki Machida <[EMAIL PROTECTED]> writes:

> We currently plan to add the following features to address FAT
> corruption.
>
>   - Utilize standard 2.6 features as much as possible
>   - Implement as options of fat, vfat and uvfat

What is uvfat? A typo (xvfat)? Why is this an option (does it have
a big demerit)?

>   - Utilize noop elevator to cancel unexpected operation reordering

Why don't you use the barrier?

>   - Coordinate the order of operations so that data is updated first
>     and meta-data later, with transaction control

Does this mean SoftUpdates? What does this guarantee? How does
this handle rename() and cyclic dependencies of updates?

>   - With O_SYNC, close() flushes all related data and
>     meta-data, then waits for completion of I/O

What does this mean? Why does O_SYNC only flush at close()?


Almost everything in your email needs more detail. I think
SoftUpdates is the best solution for now. Could you tell us the
details of your solution?
--
OGAWA Hirofumi <[EMAIL PROTECTED]>
Re: [RFD] FAT robustness
Hiroyuki Machida wrote:
> [...]
> Q3 : I'm not sure JBD can be used for FAT improvements. Do you
> have any comments ?

I might not be the best person to answer this, but this just seems so
obvious:

If you plan to let a recently hot-unplugged device be used in another
OS that doesn't understand your journaling extensions, your disk will be
corrupted.

If this is supposed to work only on OS's that understand your journaling
extensions, then there are much better filesystems out there with
journaling already.

You might be able to reduce the size of the time window where hot
removing the media will cause problems, like writing all the data first
and updating the metadata in as few operations as possible.

But that just reduces the probability of data corruption. It doesn't
eliminate it at all.

--
Paulo Marques - www.grupopie.com

It is a mistake to think you can solve any major problems
just with potatoes.
Douglas Adams
[RFD] FAT robustness
Folks,

I'd like to have a discussion about FAT robustness.
Please give your thoughts, comments and related issues.

A few years ago we added some features to FAT, called xvfat, so that
the system and FAT are robust against unexpected media hot unplug and
applications can be made correctly aware of the event.
Just for your reference, I put a patch against the 2.4.20 kernel at
  http://www.celinuxforum.org/CelfPubWiki/XvFatDiscussion?action=AttachFile&do=get&target=20050715-xvfat-2.4.20.patch

This includes the following features:

    - Handle media removed during "mount"
        - Notification of media removal to application
        - Cancellation of I/O
            - Elevator for Block device
            - Block system calls until a completion of writing
        - Control order of meta-data updates, using transaction
          control implemented in fs/xvfat/fwrq.c
        - File syscalls return "error", except umount
    - Japanese file name support
        - possible 1-N mapping issues SJIS <-> UNICODE
    - Dirty Flag support
    - TIME ZONE support

In moving to 2.6, we are considering and categorizing the issues again.
We are planning an open source project to add these features to the
2.6 kernel. I'd like to open a discussion about these features and how
to implement them on the 2.6 kernel.


1. Issues to be addressed

  - Issues around FAT with CE devices
    - Hot unplug issues
      - File system corruption on unplug of media/storage device
          Almost the same as power down without umount
      - Notification of the event
          Applications need to know the event precisely
          Needs more investigation
      - System stability after unplug
          Almost the same as the I/O error recovery issues discussed
          on LKML
          http://developer.osdl.jp/projects/doubt/fs-consistency-and-coherency/index.html
          http://groups.google.co.jp/group/linux.kernel/browse_thread/thread/b9c11bccd59e0513/4a4dd84b411c6d32?q=[RFD]+FS+behavior+(I%2FO+failure)+in+kernel+summit++lkml&rnum=1&hl=ja#4a4dd84b411c6d32
    - Other issues
      - Time stamp issues
          always uses local time
          time resolution is in 2-second units
      - Issues around mapping between UNICODE and local char code
          1-N mapping SJIS <-> UNICODE
          Potential directory cache problem due to 1-N mapping
          Possible inconsistency problems with the application side
      - Support file sizes over 2GB
      - Support dirty flag

  Q1 : The first issue for discussion is "Do you have any other issues
       about this?" and "Do you have any other idea for categorizing
       the issues?"


2. FAT corruption on unplug of media/storage device

  In starting the open source project, we will focus on the following
  issue first.

      - File system corruption on unplug of media/storage device
          Almost the same as power down without umount

  And we are planning to focus on HDD devices and treat system power
  down instead of media unplug, because

  A. The damage and its countermeasures may depend on properties of
     the lower layer
     E.g.
       - Memory Card
           Some controllers can guarantee atomicity of certain
           operations
       - Flash Memory (NAND, NOR)
           I/O operations may be constrained by Block Size
           (e.g. 128KB) or Page Size (e.g. 2KB)
       - HDD
           - Cache memory may reside inside the drive
           - A sector that is being written at power down may be
             corrupted (can't be read anymore)

  B. It may make the problem easier
       - Sector size is 512 bytes
       - Many developers can check with a PC

  Q2 : Do you know of any other storage devices and their properties
       to be addressed later?


3. Features to be developed for FAT corruption

  We currently plan to add the following features to address FAT
  corruption.

  - Utilize standard 2.6 features as much as possible
  - Implement as options of fat, vfat and uvfat
  - Utilize the existing journal block device (JBD) for transaction
    control
  - Utilize noop elevator to cancel unexpected operation reordering
  - Coordinate the order of operations so that data is updated first
    and meta-data later, with transaction control
  - With O_SYNC, close() flushes all related data and
    meta-data, then waits for completion of I/O

  Q3 : I'm not sure JBD can be used for FAT improvements. Do you
       have any comments ?

Thanks,
Hiroyuki Machida
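
The "data first, meta-data later" item in section 3 can be illustrated
with a toy model. The sketch below is not xvfat's fwrq.c and not
SoftUpdates; it is a hypothetical in-memory volume (all names invented)
that appends one cluster to a file as four separate writes and
simulates power loss after the first k of them. It assumes the writes
reach the device in the order issued, which is what the noop elevator
or barriers are meant to ensure.

```python
FREE, EOC = 0x0000, 0xFFFF  # free FAT entry, end-of-chain marker


class Volume:
    def __init__(self, nclusters: int = 8):
        self.data = [None] * nclusters  # None = never written (garbage)
        self.fat = [FREE] * nclusters
        self.first = 0                  # dir entry: start cluster, 0 = none
        self.size = 0                   # dir entry: file length in clusters


def read_file(vol: Volume) -> list:
    """Follow the cluster chain for `size` clusters, as a reader would."""
    out, cluster = [], vol.first
    for _ in range(vol.size):
        out.append(vol.data[cluster])
        cluster = vol.fat[cluster]
    return out


def append_steps(vol: Volume, cluster: int, payload: bytes,
                 data_first: bool) -> list:
    """Return the individual device writes for appending one cluster."""
    def write_data():
        vol.data[cluster] = payload

    def mark_eoc():
        vol.fat[cluster] = EOC

    def link():
        if vol.first == 0:
            vol.first = cluster
        else:
            tail = vol.first
            while vol.fat[tail] != EOC:
                tail = vol.fat[tail]
            vol.fat[tail] = cluster

    def bump_size():
        vol.size += 1

    if data_first:
        return [write_data, mark_eoc, link, bump_size]
    return [mark_eoc, link, bump_size, write_data]


def crash_after(k: int, data_first: bool) -> Volume:
    """Append to a one-cluster file, losing power after k writes."""
    vol = Volume()
    for step in append_steps(vol, 2, b"old", True):
        step()
    for step in append_steps(vol, 3, b"new", data_first)[:k]:
        step()
    return vol


def is_consistent(vol: Volume) -> bool:
    """The file is consistent if every cluster it claims holds real data."""
    return None not in read_file(vol)
```

With data_first=True the file stays readable at every crash point (at
worst a cluster is allocated but not yet counted in the size); with
data_first=False a crash after the third write leaves the directory
entry pointing at a cluster whose data never reached the device.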