Re: [RFC] Parallelize IO for e2fsck
On Jan 17, 2008 5:15 PM, David Chinner <[EMAIL PROTECTED]> wrote:
> On Wed, Jan 16, 2008 at 01:30:43PM -0800, Valerie Henson wrote:
> > Hi y'all,
> >
> > This is a request for comments on the rewrite of the e2fsck IO
> > parallelization patches I sent out a few months ago.  The mechanism is
> > totally different.  Previously IO was parallelized by issuing IOs from
> > multiple threads; now a single thread issues fadvise(WILLNEED) and
> > then uses read() to complete the IO.
>
> Interesting.
>
> We ultimately rejected a similar patch to xfs_repair (pre-population
> the kernel block device cache) mainly because of low memory
> performance issues and it doesn't really enable you to do anything
> particularly smart with optimising I/O patterns for larger, high
> performance RAID arrays.
>
> The low memory problems were particularly bad; the readahead
> thrashing cause a slowdown of 2-3x compared to the baseline and
> often it was due to the repair process requiring all of memory
> to cache stuff it would need later.  IIRC, multi-terabyte ext3
> filesystems have similar memory usage problems to XFS, so there's
> a good chance that this patch will see the same sorts of issues.

That was one of my first concerns - how to avoid overflowing memory? Whenever I screw it up on e2fsck, it does go, oh, 2 times slower due to the minor detail of every single block being read from disk twice. :)

I have a partial solution that sort of blindly manages the buffer cache. First, the user passes e2fsck a parameter saying how much memory is available as buffer cache. The readahead thread reads things in and immediately throws them away so they are only in buffer cache (no double-caching). Then readahead and e2fsck work together so that readahead only reads in new blocks when the main thread is done with earlier blocks. The already-used blocks get kicked out of buffer cache to make room for the new ones.
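The scheme described above — one thread hinting blocks in ahead of the checker, then evicting them once consumed — can be sketched roughly like this. This is a toy model, not the actual patch: the function names and the bounded-window bookkeeping are mine, and using fadvise(DONTNEED) for the eviction step is my assumption about how "kicked out of buffer cache" is implemented.

```python
import os

CHUNK = 64 * 1024  # toy unit of readahead; the real patch works on fs blocks


def readahead(fd, offset, length):
    """Hint the kernel to prefetch [offset, offset+length) into the page cache."""
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)


def release(fd, offset, length):
    """The checker is done with this range; ask the kernel to drop it,
    making room for new blocks (assumed DONTNEED-based eviction)."""
    os.posix_fadvise(fd, offset, length, os.POSIX_FADV_DONTNEED)


def checked_scan(fd, size, window=4):
    """Scan a device with a bounded readahead window.

    `window` plays the role of the user-supplied buffer-cache budget:
    at most `window` chunks are prefetched beyond the chunk currently
    being checked.  Returns bytes read (stand-in for the fsck work).
    """
    total = 0
    nchunks = (size + CHUNK - 1) // CHUNK
    for i in range(nchunks):
        # keep the prefetcher at most `window` chunks ahead of the reader
        ahead = min(i + window, nchunks - 1)
        readahead(fd, ahead * CHUNK, CHUNK)
        total += len(os.pread(fd, CHUNK, i * CHUNK))
        release(fd, i * CHUNK, CHUNK)
    return total
```

With this structure the prefetch thread can never outrun the budget, which is exactly the property that prevents the 2x every-block-read-twice thrashing mentioned above.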
What would be nice is to take into account the current total memory usage of the whole fsck process and factor that in. I don't think it would be hard to add to the existing cache management framework. Thoughts?

> Promising results, though

Thanks! It's solving a rather simpler problem than XFS check/repair. :)

-VAL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)
On Jan 16, 2008 3:49 AM, Pavel Machek <[EMAIL PROTECTED]> wrote:
>
> ext3's "lets fsck on every 20 mounts" is good idea, but it can be
> annoying when developing. Having option to fsck while filesystem is
> online takes that annoyance away.

I'm sure everyone on cc: knows this, but for the record you can change ext3's fsck on N mounts or every N days to something that makes sense for your use case. Usually I just turn it off entirely and run fsck by hand when I'm worried:

# tune2fs -c 0 -i 0 /dev/whatever

-VAL
Re: [RFD] Incremental fsck
On Jan 8, 2008 8:40 PM, Al Boldi <[EMAIL PROTECTED]> wrote:
> Rik van Riel wrote:
> > Al Boldi <[EMAIL PROTECTED]> wrote:
> > > Has there been some thought about an incremental fsck?
> > >
> > > You know, somehow fencing a sub-dir to do an online fsck?
> >
> > Search for "chunkfs"
>
> Sure, and there is TileFS too.
>
> But why wouldn't it be possible to do this on the current fs infrastructure,
> using just a smart fsck, working incrementally on some sub-dir?

Several data structures are file system wide and require finding every allocated file and block to check that they are correct. In particular, block and inode bitmaps can't be checked per subdirectory.

http://infohost.nmt.edu/~val/review/chunkfs.pdf

-VAL
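The bitmap argument above can be seen in a toy model (illustrative data structures, not e2fsck's): the expected block bitmap is only knowable after walking every inode, so a walk restricted to one subdirectory's inodes can never validate it.

```python
def expected_block_bitmap(inodes, nblocks):
    """Recompute the block bitmap from scratch.  Every allocated block of
    every inode must be visited before the bitmap can be validated."""
    bitmap = [False] * nblocks
    for blocks in inodes.values():   # inode number -> list of block numbers
        for b in blocks:
            if bitmap[b]:
                raise ValueError("block %d doubly allocated" % b)
            bitmap[b] = True
    return bitmap


def check_bitmap(on_disk, inodes, nblocks):
    """True iff the on-disk bitmap matches what the inodes actually use.

    Checking only a subset of inodes (say, one subdirectory) is
    inconclusive: a bit set by an inode outside the subset is
    indistinguishable from a corrupted stray bit."""
    return on_disk == expected_block_bitmap(inodes, nblocks)
```

This is why chunkfs divides the *on-disk layout* into independently checkable chunks rather than trying to fence off a subtree of the namespace.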
Re: [ANNOUNCE] ebizzy 0.2 released
On Sun, Sep 30, 2007 at 05:27:03PM -0700, David Miller wrote:
> From: Valerie Henson <[EMAIL PROTECTED]>
> Date: Wed, 22 Aug 2007 19:06:26 -0600
>
> > ebizzy is designed to generate a workload resembling common web
> > application server workloads.
>
> I downloaded this only to be basically disappointed.
>
> Any program which claims to generate workloads "resembling common web
> application server workloads", and yet does zero network activity and
> absolutely nothing with sockets is so far disconnected from reality
> that I truly question how useful it really is even in the context it
> was designed for.
>
> Please describe this program differently, "a threaded cpu eater", "a
> threaded memory scanner", "a threaded hash lookup", or something
> suitably matching what it really does.
>
> I'm sure there are at least 10 or even more programs in LTP that one
> could run under "time" and get the same exact functionality.

You're right, that part of the description is misleading. (I've even had people ask me if it's a file systems benchmark!) Ebizzy is based on a real web application server and does do things that are fairly common in such applications (multithreaded memory allocation and memory access), but it ignores networking for two reasons: the network stack was not the bottleneck for this workload, the VM was, and really good network benchmarks already exist. :)

ebizzy is not useful to networking (or file systems) developers, but it has been used to improve malloc() behavior in glibc and to test VMA handling optimizations. In general, I try to make the source of a benchmark clear because it's so tempting to optimize for completely artificial benchmarks. The trick is to do this without misleading the reader (or breaking my NDA).

ebizzy
------

ebizzy is a workload that stresses memory allocation and the virtual memory subsystem. It was initially written to model the local computation portion of a web application server running a large internet commerce site.
ebizzy is highly threaded, has a large in-memory working set with poor locality, and allocates and deallocates memory frequently. When running most efficiently, ebizzy will max out the CPU. When running inefficiently, it will be blocked much of the time.

-VAL
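The workload shape the description above names — a pool of allocations touched with poor locality and replaced frequently — can be miniaturized as follows. This is a single-threaded illustration of the access pattern, not ebizzy's actual source; all names and parameters are mine.

```python
import random


def ebizzy_like(chunks=64, chunk_size=4096, transactions=1000, seed=1):
    """Miniature of the pattern: keep a pool of allocations, touch them
    at random (poor locality), and replace one per transaction
    (frequent allocate/free).  Returns transactions completed."""
    rng = random.Random(seed)
    pool = [bytearray(chunk_size) for _ in range(chunks)]
    done = 0
    for _ in range(transactions):
        victim = rng.randrange(chunks)
        pool[victim] = bytearray(chunk_size)   # free the old, allocate anew
        offset = rng.randrange(chunk_size)     # touch a random byte
        pool[victim][offset] = (pool[victim][offset] + 1) & 0xFF
        done += 1
    return done
```

Scaling the pool past cache and TLB reach, and running one such loop per thread, is what makes this pattern stress the allocator and VM rather than the CPU's arithmetic units.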
[ANNOUNCE] ebizzy 0.2 released
ebizzy is designed to generate a workload resembling common web application server workloads. It is especially useful for testing changes to memory management, and whenever a highly threaded application with a large working set and many vmas is needed.

This is release 0.2 of ebizzy. It reports a rate of transactions per second, compiles on Solaris, and scales better. Thanks especially to Rodrigo Rubira Branco, Brian Twichell, and Yong Cai for their work on this release.

Available for download at the fancy new Sourceforge site:

http://sourceforge.net/projects/ebizzy/

ChangeLog below.

-VAL

2008-08-15  Valerie Henson <[EMAIL PROTECTED]>

	* Release 0.2.

	* Started reporting a rate of transactions per second rather
	than just measuring the time.

	* Solaris compatibility, thanks to Rodrigo Rubira Branco
	<[EMAIL PROTECTED]> for frequent patches and testing.

	* rand() was limiting scalability, use cheap dumb inline
	"random" function to avoid that.  Thanks to Brian Twichell
	<[EMAIL PROTECTED]> for finding it and Yong Cai
	<[EMAIL PROTECTED]> for testing.
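On the last ChangeLog item: glibc's rand() serializes threads on shared state, so a "cheap dumb inline" generator usually means a lock-free per-thread linear congruential generator. The sketch below is my guess at the shape of such a replacement, not ebizzy's actual code; the LCG constants are the classic Numerical Recipes pair, chosen here as an assumption.

```python
def make_lcg(seed):
    """Cheap per-thread pseudo-random generator in the spirit of the
    ChangeLog entry: no locks, no state shared between threads, unlike
    rand().  Quality is poor, but benchmarks only need cheap variety."""
    state = seed & 0xFFFFFFFF

    def nxt():
        nonlocal state
        # 32-bit LCG step; constants are an illustrative assumption
        state = (1664525 * state + 1013904223) & 0xFFFFFFFF
        return state

    return nxt
```

Each worker thread gets its own generator (its own closure here, a per-thread struct field in C), so the hot loop never touches a shared cache line.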
Re: [PATCH 00/23] per device dirty throttling -v8
On Wed, Aug 08, 2007 at 05:54:57PM -0700, Martin Bligh wrote:
> Andrew Morton wrote:
> > On Wed, 08 Aug 2007 14:10:15 -0700
> > "Martin J. Bligh" <[EMAIL PROTECTED]> wrote:
> >
> > > Why isn't this easily fixable by just adding an additional dirty
> > > flag that says atime has changed? Then we only cause a write
> > > when we remove the inode from the inode cache, if only atime
> > > is updated.
> >
> > I think that could be made to work, and it would fix the performance
> > issue.
> >
> > It is a behaviour change.  At present ext3 (for example) commits everything
> > every five seconds.  After a change like this, a crash+recovery could cause
> > a file's atime to go backwards by an arbitrarily large time interval - it
> > could easily be months.
>
> A second pdflush / workqueue at a slower rate would alleviate that.

This becomes delayed atime writes. I'm not sure that it's better to batch up the writes and do them all in one big seeky go, or to trickle them out as they are done. Best of all is not to do them at all.

Note when talking about saving up atime updates to write out that the final write is going to be sloow. Inodes are typically 128 bytes, and you may have to do a seek between every one. Current disks can do on the order of 100 seeks a second. So do a find on 1000 files and you've just created 10 seconds of I/O hanging out in memory.

-VAL
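The 1000-files-in-10-seconds figure above follows from a seek-bound model, which is worth writing down because it also shows the best case batching could reach (128-byte inodes pack 32 to a 4 KB block). The function below just restates that arithmetic.

```python
def atime_flush_seconds(ninodes, seeks_per_sec=100.0, inode_size=128,
                        block_size=4096):
    """Time to write back dirty atimes under a pure seek-cost model.

    Worst case: inodes are fully scattered, one seek per inode (this is
    the 1000 files -> 10 s figure in the text).  Best case: dirty
    inodes are perfectly packed into inode-table blocks, one seek per
    block.  Returns (best, worst) in seconds."""
    worst = ninodes / seeks_per_sec
    inodes_per_block = block_size // inode_size   # 32 with these sizes
    best = (ninodes / inodes_per_block) / seeks_per_sec
    return best, worst
```

The ~32x gap between the two cases is the argument for sorting and batching the delayed atime writeback rather than trickling inodes out in LRU order.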
Re: [TULIP] Need new maintainer
On Mon, Jul 30, 2007 at 03:31:58PM -0400, Kyle McMartin wrote:
> On Mon, Jul 30, 2007 at 01:04:13PM -0600, Valerie Henson wrote:
> > The Tulip network driver needs a new maintainer!  I no longer have
> > time to maintain the Tulip network driver and I'm stepping down.  Jeff
> > Garzik would be happy to get volunteers.
>
> Since I already take care of a major consumer of these devices (parisc,
> which pretty much all have tulip) I'm willing to take care of this.
> Alternately, Grant is probably willing.

And I coulda handed you a suitcase full of cards and I missed my chance! It's fine by me, although Jeff is the final arbiter. Thanks!

-VAL
[PATCH] tulip: Remove tulip maintainer
Remove Val Henson as tulip maintainer and let her roam free, FREE!

Signed-off-by: Val Henson <[EMAIL PROTECTED]>

--- linux-2.6.orig/MAINTAINERS
+++ linux-2.6/MAINTAINERS
@@ -3569,11 +3569,9 @@
 W:	http://www.auk.cx/tms380tr/
 S:	Maintained
 
 TULIP NETWORK DRIVER
-P:	Valerie Henson
-M:	[EMAIL PROTECTED]
 L:	[EMAIL PROTECTED]
 W:	http://sourceforge.net/projects/tulip/
-S:	Maintained
+S:	Orphan
 
 TUN/TAP driver
 P:	Maxim Krasnyansky
[TULIP] Need new maintainer
The Tulip network driver needs a new maintainer! I no longer have time to maintain the Tulip network driver and I'm stepping down. Jeff Garzik would be happy to get volunteers.

The only current major outstanding patch I know of is Grant's shutdown race patch, which was incorrectly dropped as obsoleted from -mm (my fault, I was moving at the time):

http://www.mail-archive.com/[EMAIL PROTECTED]/msg12161.html

I have a very much non-working patch to do it with the preferred order, ask me for it and I'll see if I can dig it up. It's unpleasant partly because it pointed out a lot of latent bugs (e.g., del_timer_sync() in interrupt context).

Also, someone is working on support for an emulated Tulip card (yes, Tulip will _never_ die), so expect possible patches for that.

-VAL
Cross-chunk reference checking time estimates
Hey all,

I altered Karuna's cref tool to print the number of seconds it would take to check the cross-references for a chunk. The results look good for chunkfs: on my laptop /home file system and a 1 GB chunk size, the per-chunk cross-reference check time would be an average of 5 seconds and a max of 160 seconds in 2013. This is calculated assuming average seek time and rotational latency delay for every cross-reference checked; some simple batching of I/Os could significantly improve that.

The tool is a little dodgy on error handling and other edge cases ATM, but for now, here's the results and the code (attached):

[EMAIL PROTECTED]:~/chunkfs/cref_new$ sudo ./cref.sh /dev/hda3 dump /home 1024
Total size = 19535040 KB
Total data stored = 13998240 KB
Number of files = 445406
Number of directories = 31836
Number of special files = 12156
Size of block groups = 1048576 KB
Inodes per block group = 130304
Intra-file cross references = 63167
Directory-subdirectory references = 429
Directory-file references = 2381
Total directory cross references = 2810
Total cross references = 65977
Average cross references per group = 439
Maximum cross references in a group = 13997
Max group is 4 (0:3, 1:46, 2:282, 3:4996, 5:8445, 6:2, 7:1, 8:27, 9:1, 10:2, 12:1, 13:51, 14:32, 15:99, 16:2, 17:5, 18:2, )
Average additional time to check cross references = 6.77 s
Max additional time to check cross references = 215.55 s
2013 average additional time to check cross references = 4.93 s
2013 max additional time to check cross references = 156.77 s

Questions? Come talk on #linuxfs at irc.oftc.net.

-VAL

[Attachment: cref_new.tar.gz, GNU Zip compressed data]
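The time estimate above is essentially (cross-references) x (average seek + average rotational latency), one random I/O per reference. The exact constants the modified cref uses aren't shown in this message, so the disk parameters below are illustrative assumptions; with a ~10 ms average seek and a 7200 RPM half-rotation they land in the same ballpark as the reported 6.77 s average for 439 references per group.

```python
def crossref_check_seconds(nrefs, avg_seek_ms=10.0, rpm=7200):
    """Estimated extra per-chunk fsck time: one unbatched random I/O per
    cross-reference, each paying an average seek plus half a rotation.
    Disk parameters are assumptions, not cref's actual constants."""
    half_rotation_ms = (60000.0 / rpm) / 2.0   # ~4.17 ms at 7200 RPM
    return nrefs * (avg_seek_ms + half_rotation_ms) / 1000.0
```

Because the model charges a full random I/O per reference, any batching or sorting of the reference checks comes straight off the total, which is why the message calls the numbers pessimistic.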
[PATCH] Update tulip maintainer email address
I've quit Intel and gone into business as a Linux consultant. Update my email address in MAINTAINERS.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

--- laptop-2.6.orig/MAINTAINERS
+++ laptop-2.6/MAINTAINERS
@@ -3497,7 +3497,7 @@
 S:	Maintained
 
 TULIP NETWORK DRIVER
 P:	Valerie Henson
-M:	[EMAIL PROTECTED]
+M:	[EMAIL PROTECTED]
 L:	[EMAIL PROTECTED]
 W:	http://sourceforge.net/projects/tulip/
 S:	Maintained
Re: ChunkFS - measuring cross-chunk references
On Mon, Apr 23, 2007 at 02:05:47AM +0530, Karuna sagar K wrote:
> Hi,
>
> The attached code contains program to estimate the cross-chunk
> references for ChunkFS file system (idea from Valh). Below are the
> results:

Nice work! Thank you very much for doing this!

-VAL
Re: ChunkFS - measuring cross-chunk references
On Mon, Apr 23, 2007 at 02:53:33PM -0600, Andreas Dilger wrote:
>
> Also, is it considered a cross-chunk reference if a directory entry is
> referencing an inode in another group?  Should there be a continuation
> inode in the local group, or is the directory entry itself enough?

(Sorry for the delay; just moved to Portland these last couple of weeks.)

It is a cross-chunk reference - we can't calculate the correct link count for the target file unless we have a quick way to get all the directory entries pointing to an inode. My current scheme is to create a continuation inode for the directory in the chunk containing the inode (if the chunk containing the inode is full, create new continuation inodes for both in a new chunk).

-VAL
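The scheme described above can be modeled in miniature: when a directory entry crosses a chunk boundary, a continuation record is placed in the target inode's chunk, so that chunk's fsck can total link counts without consulting any other chunk. Data structures here are hypothetical stand-ins, not the chunkfs on-disk format.

```python
def place_dirent(dirents, chunk_of, directory, target):
    """Record a directory entry.  If it crosses a chunk boundary, also
    record a continuation entry in the target inode's chunk (the
    continuation-inode idea from the message above, simplified)."""
    dchunk, tchunk = chunk_of[directory], chunk_of[target]
    dirents.setdefault(dchunk, []).append((directory, target))
    if dchunk != tchunk:
        dirents.setdefault(tchunk, []).append((directory, target))


def local_link_counts(dirents, chunk_of, chunk):
    """Per-chunk link counting: only entries recorded in this chunk are
    consulted.  Thanks to continuation entries, that is still every
    entry pointing at this chunk's inodes."""
    counts = {}
    for _, target in dirents.get(chunk, []):
        if chunk_of[target] == chunk:
            counts[target] = counts.get(target, 0) + 1
    return counts
```

Without the continuation entry, computing the link count of an inode would require scanning every chunk's directories, which is exactly the whole-filesystem walk chunkfs is trying to avoid.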
Re: ChunkFS - measuring cross-chunk references
On Mon, Apr 23, 2007 at 02:05:47AM +0530, Karuna sagar K wrote:
> Hi,
>
> The attached code contains program to estimate the cross-chunk
> references for ChunkFS file system (idea from Valh).  Below are the
> results:

Nice work!  Thank you very much for doing this!

-VAL
Re: Ext3 vs NTFS performance
On Fri, May 04, 2007 at 08:23:08AM -0400, Theodore Tso wrote:
> On Thu, May 03, 2007 at 02:14:52PM -0700, Valerie Henson wrote:
> >
> > I'd really like to see a generic VFS-level detection of
> > read()/write()/creat()/mkdir()/etc. patterns which could detect things
> > like "Oh, this file is likely to be deleted immediately, wait and see
> > if it goes away and don't bother sending it on to the FS immediately"
> > or "Looks like this file will grow pretty big, let's go pre-allocate
> > some space for it."  This is probably best done as a set of helper
> > functions in the usual way.
>
> What patterns do you think mean things like "this file is likely to
> be deleted immediately", or "this file will grow pretty big"?  I don't
> think there are any that would be generally valid.

I wouldn't have guessed that either, but it turns out there are:

http://www.eecs.harvard.edu/~ellard/pubs/able-usenix04.pdf

  We present evidence that attributes that are known to the file
  system when a file is created, such as its name, permission mode,
  and owner, are often strongly related to future properties of the
  file such as its ultimate size, lifespan, and access pattern.  More
  importantly, we show that we can exploit these relationships to
  automatically generate predictive models for these properties, and
  that these predictions are sufficiently accurate to enable
  optimizations.

For example, lock files have predictable names and permissions, and
live for a fraction of a second in most cases.  Files which are
appended a few hundred bytes at a time are probably log files and will
continue to grow in this manner.  Some of their predictions were 98%
accurate!

In any case, any predictive algorithms we already do at the file
system level can be done at the VFS level, and shared between file
systems, instead of being reimplemented over and over again.

Just food for thought.
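A toy illustration of the idea (my own sketch, not from the ABLE paper - the suffix rules are invented examples): classify a file at create() time from attributes alone, so a VFS-level helper could pick a policy (delay allocation, preallocate) before any data is written.

```python
def predict_lifespan(name, mode=0o644):
    """Guess a file's fate from creation-time attributes.
    Returns 'short-lived', 'append-mostly', or 'unknown'."""
    # Lock and scratch files have predictable names and die quickly.
    if name.endswith(('.lock', '.tmp', '.swp')) or name.startswith('.#'):
        return 'short-lived'      # likely deleted within a second
    # Log-like names suggest slow, append-only growth: preallocate.
    if name.endswith(('.log', '_log')):
        return 'append-mostly'
    return 'unknown'
```

A real predictor would be trained from traces rather than hand-written rules, but the interface - attributes in, policy hint out - is the point.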
-VAL
Re: Ext3 vs NTFS performance
On Thu, May 03, 2007 at 01:44:14AM +1000, David Chinner wrote:
> On Tue, May 01, 2007 at 01:43:18PM -0700, Cabot, Mason B wrote:
> > Hello all,
> >
> > I've been testing the NAS performance of ext3/Openfiler 2.2 against
> > NTFS/WinXP and have found that NTFS significantly outperforms ext3 for
> > video workloads.  The Windows CIFS client will attempt a poor-man's
> > pre-allocation of the file on the server by sending 1-byte writes at
> > 128K-byte strides, breaking block allocation on ext3 and leading to
> > fragmentation and poor performance.  This will happen for many
> > applications (including iTunes) as the CIFS client issues these
> > pre-allocates under the application layer.
> >
> > I've posted a brief paper on Intel's OSS website
> > (http://softwarecommunity.intel.com/articles/eng/1259.htm).  Please give
> > it a read and let me know what you think.  In particular, I'd like to
> > arrive at the right place to fix this problem: is it in the filesystem,
> > VFS, or Samba?
>
> As I commented on IRC to Val Henson - the XFS performance indicates
> that it is not a VFS or Samba problem.

In terms of what piece of code we can swap out and get good
performance, the problem is indeed in ext3 - it's clear that the cause
of the bad performance is the 1-byte writes resulting in ext3
fragmenting the on-disk layout of the file, and replacing it with XFS
results in nice, clean, unfragmented files.  But in terms of what we
should do to fix it, there is the possibility of some debate.

In general, I think there is a lot of code stuck down in individual
file systems - especially in XFS - that could be usefully hoisted up
to a higher level as generic helper functions.  For example, we've got
at least two implementations of reservations, one in XFS and one in
ext3/4.  At least some of the code could be generic - both file
systems want to reserve long contiguous extents - with the actual
mechanics of looking up and reserving free blocks implemented in
per-fs code.
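A hypothetical sketch (my own, with invented thresholds) of the kind of heuristic such a generic helper could apply: spot the CIFS client's "poor man's preallocation" - a run of 1-byte writes at a constant, large stride - and turn it into a single preallocation hint instead of letting each write fragment the file.

```python
def detect_stride_prealloc(writes, min_run=3):
    """writes: list of (offset, length) pairs in arrival order.
    Returns the implied file size to preallocate, or None."""
    one_byte = [off for off, length in writes if length == 1]
    if len(one_byte) < min_run:
        return None
    # A constant stride between the 1-byte writes marks the pattern.
    strides = {b - a for a, b in zip(one_byte, one_byte[1:])}
    if len(strides) == 1 and strides.pop() > 4096:
        # The client intends the file to reach the last touched byte.
        return one_byte[-1] + 1
    return None

# Three 1-byte writes at 128K strides, as the Windows client issues:
hint = detect_stride_prealloc([(131071, 1), (262143, 1), (393215, 1)])
```

A real implementation would hand the hint to the filesystem's reservation code rather than write zeroes.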
I'd really like to see a generic VFS-level detection of
read()/write()/creat()/mkdir()/etc. patterns which could detect things
like "Oh, this file is likely to be deleted immediately, wait and see
if it goes away and don't bother sending it on to the FS immediately"
or "Looks like this file will grow pretty big, let's go pre-allocate
some space for it."  This is probably best done as a set of helper
functions in the usual way.

For this particular case, Ted is probably right and the only place
we'll ever see this insane poor man's pre-allocate pattern is from the
Windows CIFS client, in which case fixing this in Samba makes sense -
although I'm a bit horrified by the idea of writing 128K of zeroes to
pre-allocate... oh well, it's temporary, and what we care about here
is the read performance, more than the write performance.

-VAL
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Fri, Apr 27, 2007 at 11:06:47AM -0400, Jeff Dike wrote:
> On Thu, Apr 26, 2007 at 09:58:25PM -0700, Valerie Henson wrote:
> > Here's an example, spelled out:
> >
> > Allocate file 1 in chunk A.
> > Grow file 1.
> > Chunk A fills up.
> > Allocate continuation inode for file 1 in chunk B.
> > Chunk A gets some free space.
> > Chunk B fills up.
> > Pick chunk A for allocating next block of file 1.
> > Try to look up a continuation inode for file 1 in chunk A.
> > Continuation inode for file 1 found in chunk A!
> > Attach newly allocated block to existing inode for file 1 in chunk A.
>
> So far, so good (and the slides are helpful, tx!).  What happens when
> file 1 keeps growing and chunk A fills up (and chunk B is still full)?
> Can the same continuation inode also point at chunk C, where the file
> is going to grow to?

You allocate a new continuation inode in chunk C.  The rule is that
only inodes inside a chunk can point to blocks inside the chunk, so
you need an inode in C if you want to allocate blocks from C.

-VAL
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Fri, Apr 27, 2007 at 12:53:34PM +0200, Jörn Engel wrote:
>
> All this would get easier if continuation inodes were known to be rare.
> You can ditch the doubly-linked list in favor of a pointer to the main
> inode then - traversing the list again is cheap, after all.  And you can
> just try to read the same block once for every continuation inode.
>
> If those lists can get long and you need a mapping from offset to
> continuation inode on the medium, you are basically fscked.  Storing the
> mapping requires space.  You need the mapping only when space (in some
> chunk) gets tight and you allocate continuation inodes.  So either you
> don't need the mapping or you don't have a good place to put it.

Any mapping structure will have to be pre-allocated.

> Having a mapping in memory is also questionable.  Either you scan the
> whole file on first access and spend a long time for large files.  Or
> you create the mapping on the fly.  In that case the page cache will
> already give you a 90% solution for free.

So in my secret heart of hearts, I do indeed hope that cnodes are rare
enough that we don't actually have to do anything smart to make them
go fast.  Either having no fast lookup structure or creating it in
memory as needed would be the nicest solution.  However, since I can't
guarantee this will be the case, it's nice to have some idea of what
we'll do if this does become important.

> You should spend a lot of effort trying to minimize cnodes. ;)

Yep.  It's much better to optimize away most cnodes instead of trying
to make them go fast.

-VAL
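A sketch of the create-it-on-the-fly option (hypothetical structures, not the chunkfs code): walk the cnode chain the first time an offset outside the cached ranges is touched, and memoize what the walk learns, so a file with no continuations never pays anything.

```python
class CnodeMap:
    def __init__(self, chain):
        # chain: list of (start_offset, end_offset, cnode_id) tuples,
        # standing in for the on-disk linked list of cnodes.
        self._chain = chain
        self._cache = {}   # (start, end) -> cnode, filled lazily
        self.walks = 0     # how often we had to touch the chain

    def lookup(self, offset):
        # Fast path: serve from whatever earlier walks cached.
        for (start, end), cnode in self._cache.items():
            if start <= offset < end:
                return cnode
        # Slow path: one walk of the chain, caching every entry seen.
        self.walks += 1
        for start, end, cnode in self._chain:
            self._cache[(start, end)] = cnode
            if start <= offset < end:
                return cnode
        return None

m = CnodeMap([(0, 100, 'c0'), (100, 200, 'c1')])
first = m.lookup(150)    # walks the chain
second = m.lookup(150)   # served from cache, no second walk
```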
Re: ZFS with Linux: An Open Plea
On Wed, Apr 18, 2007 at 01:25:19PM -0400, Lennart Sorensen wrote:
>
> Does it matter that google's recent report on disk failures indicated
> that SMART never predicted anything useful as far as they could tell?
> Certainly none of my drive failures ever had SMART make any kind of
> indication that anything was wrong.

I saw that talk, and that's not what I got out of it.  They found that
SMART error reports _did_ correlate with drive failure.  See page 8 of:

http://www.usenix.org/events/fast07/tech/full_papers/pinheiro/pinheiro.pdf

(If you're not a USENIX member, you may be able to find a free
download copy elsewhere.)

However, they found that the correlation was not strong enough to make
it economically feasible to replace disks reporting SMART failures,
since something like 70% of disks were still working a year after the
first failure report.  Also, they found that some disks failed without
any SMART error reports.

Now, Google keeps multiple copies (3 in GoogleFS, last I heard) of
data, so for them, "economically feasible" means something different
than for my personal laptop hard drive.  I have twice had my laptop
hard drive start spitting SMART errors and then die within a week.  It
is economically quite sensible for me to replace my laptop drive once
it has an error, since I don't carry around 3 laptops everywhere I go.

-VAL
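The economics can be sketched as a back-of-the-envelope expected-cost calculation (the drive price and failure costs below are invented for illustration; the ~30%-fail-within-a-year figure is the one quoted above).  With replicated data, a failure costs little beyond the drive, so waiting wins; for a single unreplicated laptop drive, a failure is expensive enough that replacing on the first SMART error wins.

```python
def expected_cost(p_fail, drive_cost, failure_cost, replace_now):
    """Expected cost over the next year after a SMART error."""
    if replace_now:
        return drive_cost                        # buy a new drive, lose nothing
    return p_fail * (failure_cost + drive_cost)  # gamble on the old one

P_FAIL = 0.30   # ~70% of SMART-flagged disks survived a year

# Laptop: assume losing the only copy of your data costs ~$2000.
laptop_wait    = expected_cost(P_FAIL, 100, 2000, replace_now=False)
laptop_replace = expected_cost(P_FAIL, 100, 2000, replace_now=True)

# Replicated datacenter disk: a failure costs ~$20 of re-replication.
dc_wait = expected_cost(P_FAIL, 100, 20, replace_now=False)
```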
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, Apr 26, 2007 at 10:47:38AM +0200, Jan Kara wrote:
> Do I get it right that you just have in each cnode a pointer to the
> previous & next cnode?  But then if two consecutive cnodes get corrupted,
> you have no way to connect the chain, do you?  If each cnode contained
> some unique identifier of the file and a number identifying position of
> cnode, then there would be at least some way (though expensive) to
> link them together correctly...

You're right, it's easy to add a little more redundancy that would
make it possible to recover from two consecutive cnodes being
corrupted.  Keeping a parent inode id in each continuation inode is
definitely a smart thing to do.

Some minor side notes: Continuation inodes aren't really in any
defined order - if you look at Jeff's ping-pong chunk allocation
example, you'll see that the data in each continuation inode won't be
in linearly increasing order.  Also, while the current implementation
is a simple doubly-linked list, this may not be the best solution
long-term.  What's important is that each continuation inode have a
back pointer to the parent and that there is some structure for
quickly looking up the continuation inode for a given file offset.
Suggestions for data structures that work well in this situation are
welcome. :)

-VAL
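The redundancy Jan suggests can be sketched like this (field names hypothetical, not on-disk format): if every cnode records (parent id, sequence number), the chain for a file can be rebuilt by scavenging survivors and sorting, even when the next/prev pointers of two consecutive cnodes are gone.

```python
def relink(cnodes, parent_id):
    """cnodes: iterable of dicts with 'parent', 'seq', 'id' keys,
    e.g. scavenged from a damaged chunk.  Returns the surviving
    chain for parent_id in sequence order, pointers ignored."""
    mine = [c for c in cnodes if c['parent'] == parent_id]
    return [c['id'] for c in sorted(mine, key=lambda c: c['seq'])]

scavenged = [
    {'parent': 7, 'seq': 2, 'id': 'c2'},
    {'parent': 9, 'seq': 0, 'id': 'x0'},  # some other file's cnode
    {'parent': 7, 'seq': 0, 'id': 'c0'},
    # seq 1 was corrupted and zeroed -- the rest still links up
    {'parent': 7, 'seq': 3, 'id': 'c3'},
]
chain = relink(scavenged, parent_id=7)
```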
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Thu, Apr 26, 2007 at 12:05:04PM -0400, Jeff Dike wrote:
>
> No, I'm referring to a different file.  The scenario is that you have
> a growing file in a nearly full disk with files being deleted (and
> thus space being freed) such that allocations for the growing file
> bounce back and forth between chunks.

This is an excellent question.  I call this the ping-pong problem.
The solution is as Amit describes: You have a maximum of one
continuation inode per file per chunk, and you require sparse files.
Here's an example, spelled out:

Allocate file 1 in chunk A.
Grow file 1.
Chunk A fills up.
Allocate continuation inode for file 1 in chunk B.
Chunk A gets some free space.
Chunk B fills up.
Pick chunk A for allocating next block of file 1.
Try to look up a continuation inode for file 1 in chunk A.
Continuation inode for file 1 found in chunk A!
Attach newly allocated block to existing inode for file 1 in chunk A.

This is why the file format inside each chunk needs to support sparse
files.

I have a presentation that has a series of slides on problems and
potential resolutions that might help:

http://infohost.nmt.edu/~val/review/chunkfs_presentation.pdf

-VAL
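The example steps above can be sketched as a toy allocator (invented structures, not the chunkfs implementation): at most one continuation inode per file per chunk, so bouncing back to a chunk reuses the inode already there rather than minting a new one.

```python
class ToyChunkFS:
    def __init__(self):
        self.cnodes = {}   # (file_id, chunk) -> list of attached blocks

    def alloc_block(self, file_id, chunk, block):
        """Attach a block of file_id allocated from `chunk`."""
        key = (file_id, chunk)
        if key not in self.cnodes:
            # First use of this chunk by this file: new (c)inode.
            self.cnodes[key] = []
        self.cnodes[key].append(block)
        return key

fs = ToyChunkFS()
fs.alloc_block(1, 'A', 0)   # file grows in chunk A
fs.alloc_block(1, 'B', 1)   # A fills up: continuation inode in chunk B
fs.alloc_block(1, 'A', 2)   # back to A: existing inode found and reused
n_cnodes = len(fs.cnodes)   # still only two, despite the ping-pong
```

Note that the blocks held by the inode in chunk A are now non-contiguous in the file's logical address space - which is exactly why the per-chunk file format must support sparse files.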
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 05:38:34AM -0600, Andreas Dilger wrote:
>
> The case where only a fsck of the corrupt chunk is done would not find the
> cnode references.  Maybe there needs to be per-chunk info which contains
> a list/bitmap of other chunks that have cnodes shared with each chunk?

Yes, exactly.  One might almost think you had solved this problem
before. :):):)

-VAL
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> >
> > The structure looks like this:
> >
> >    ---------        ---------
> >   | cnode 0 |----->| cnode 0 |----> to another cnode or NULL
> >    ---------        ---------
> >   | cnode 1 |-     | cnode 1 |-
> >    ---------  |     ---------  |
> >   | cnode 2 |--|   | cnode 2 |--|
> >    ---------  ||    ---------  ||
> >   | cnode 3 | ||   | cnode 3 | ||
> >    ---------  ||    ---------  ||
> >               \/               \/
> >             inodes          inodes or NULL
>
> How do you recover if fsfuzzer takes out a cnode in the chain?  The
> chunk is marked clean, but clearly corrupted and needs fixing and
> you don't know what it was pointing at.  Hence you have a pointer to
> a trashed cnode *somewhere* that you need to find and fix, and a
> bunch of orphaned cnodes that nobody points to *somewhere else* in
> the filesystem that you have to find.  That's a full scan fsck case,
> isn't it?

Excellent question.  This is one of the trickier aspects of chunkfs -
the orphan inode problem (tricky, but solvable).  The problem is what
if you smash/lose/corrupt an inode in one chunk that has a
continuation inode in another chunk?  A back pointer does you no good
if the back pointer is corrupted.

What you do is keep tabs on whether you see damage that looks like
this has occurred - e.g., inode use/free counts wrong, you had to zero
a corrupted inode - and when this happens, you do a scan of all
continuation inodes in chunks that have links to the corrupted chunk.
What you need to make this go fast is (1) a pre-made list of which
chunks have links with which other chunks, (2) a fast way to read all
of the continuation inodes in a chunk (ignoring chunk-local inodes).
This stage is O(fs size) approximately, but it should be quite swift.

> It seems that any sort of damage to the underlying storage (e.g.
> media error, I/O error or user brain explosion) results in the need
> to do a full fsck and hence chunkfs gives you no benefit in this
> case.
I worry about this but so far haven't found something which couldn't
be cut down significantly with just a little extra work.

It might be helpful to look at an extreme case.  Let's say we're
incredibly paranoid.  We could be justified in running a full fsck on
the entire file system in between every single I/O.  After all,
something *might* have been silently corrupted.  But this would be
ridiculously slow.  We could instead never check the file system.
But then we would end up panicking and corrupting the file system a
lot.  So what's a good compromise?

In the chunkfs case, here's my rules of thumb so far:

1. Detection: All metadata has magic numbers and checksums.

2. Scrubbing: Random check of chunks when possible.

3. Repair: When we detect corruption, either by checksum error, file
   system code assertion failure, or hardware tells us we have a bug,
   check the chunk containing the error and any outside-chunk
   information that could be affected by it.

-VAL
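Rule 1 (detection) can be sketched in miniature like this (my own illustration; the magic value is invented and zlib.crc32 merely stands in for whatever checksum a real filesystem would pick): every metadata block carries a magic number plus a checksum of its payload, so corruption is caught at read time and can trigger the targeted chunk check described above.

```python
import zlib

MAGIC = b'CHNK'   # hypothetical 4-byte metadata magic

def seal(payload):
    """Wrap a metadata payload with magic + crc32."""
    crc = zlib.crc32(payload).to_bytes(4, 'big')
    return MAGIC + crc + payload

def check(block):
    """Return the payload, or None if magic/checksum say 'corrupt'."""
    if block[:4] != MAGIC:
        return None
    crc, payload = block[4:8], block[8:]
    if zlib.crc32(payload).to_bytes(4, 'big') != crc:
        return None
    return payload

good = check(seal(b'inode table v1'))
damaged = bytearray(seal(b'inode table v1'))
damaged[10] ^= 0xFF            # simulate a media error in the payload
bad = check(bytes(damaged))    # caught: checksum no longer matches
```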
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 03:34:03PM +0400, Nikita Danilov wrote:
>
> What is more important, design puts (as far as I can see) no upper limit
> on the number of continuation inodes, and hence, even if _average_ fsck
> time is greatly reduced, occasionally it can take more time than ext2 of
> the same size.  This is clearly unacceptable in many situations (HA,
> etc.).

Actually, there is an upper limit on the number of continuation
inodes.  Each file can have a maximum of one continuation inode per
chunk.  (This is why we need to support sparse files.)

-VAL
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Tue, Apr 24, 2007 at 11:34:48PM +0400, Nikita Danilov wrote:
>
> Maybe I failed to describe the problem precisely.
>
> Suppose that all chunks have been checked.  After that, for every inode
> I0 having continuations I1, I2, ... In, one has to check that every
> logical block is presented in at most one of these inodes.  For this one
> has to read I0, with all its indirect (double-indirect, triple-indirect)
> blocks, then read I1 with all its indirect blocks, etc.  And to repeat
> this for every inode with continuations.
>
> In the worst case (every inode has a continuation in every chunk) this
> obviously is as bad as un-chunked fsck.  But even in the average case,
> total amount of io necessary for this operation is proportional to the
> _total_ file system size, rather than to the chunk size.

Fsck in chunkfs is still going to have an element that is proportional
to the file system size for certain cases.  However, that element will
be a great deal smaller than in a regular file system, except in the
most pathological cases.  If those pathological cases happen often,
then it's back to the drawing board.  My hunch is that they won't be
common.

-VAL
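The cross-inode check Nikita describes, in miniature (structures invented for illustration): after the per-chunk passes, verify that each logical block of a file is mapped by at most one of its continuation inodes.  The work is proportional to the number of blocks the continuations actually map - small when cnodes are rare, and degenerating toward a full-fs pass only in the pathological every-chunk case.

```python
def find_doubly_mapped(inode_maps):
    """inode_maps: list of sets of logical block numbers, one set per
    inode in the chain I0..In.  Returns blocks claimed by more than
    one inode (must be empty for a consistent sparse file)."""
    seen, dups = set(), set()
    for blocks in inode_maps:
        dups |= seen & blocks   # already claimed by an earlier inode
        seen |= blocks
    return dups

ok    = find_doubly_mapped([{0, 1, 2}, {3, 4}])   # disjoint: consistent
clash = find_doubly_mapped([{0, 1, 2}, {2, 3}])   # block 2 claimed twice
```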
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 08:54:34PM +1000, David Chinner wrote:
> On Tue, Apr 24, 2007 at 04:53:11PM -0500, Amit Gud wrote:
> > The structure looks like this:
> >
> >  ----------      ----------
> > | cnode 0 |---->| cnode 0 |----> to another cnode or NULL
> >  ----------      ----------
> > | cnode 1 |     | cnode 1 |
> >  ----------      ----------
> > | cnode 2 |     | cnode 2 |
> >  ----------      ----------
> > | cnode 3 |     | cnode 3 |
> >  ----------      ----------
> >      |               |
> >    inodes        inodes or NULL
>
> How do you recover if fsfuzzer takes out a cnode in the chain? The chunk
> is marked clean, but clearly corrupted and needs fixing, and you don't
> know what it was pointing at. Hence you have a pointer to a trashed cnode
> *somewhere* that you need to find and fix, and a bunch of orphaned cnodes
> that nobody points to *somewhere else* in the filesystem that you have to
> find. That's a full scan fsck case, isn't it?

Excellent question. This is one of the trickier aspects of chunkfs - the orphan inode problem (tricky, but solvable). The problem is: what if you smash/lose/corrupt an inode in one chunk that has a continuation inode in another chunk? A back pointer does you no good if the back pointer is corrupted.

What you do is keep tabs on whether you see damage that looks like this has occurred - e.g., inode use/free counts are wrong, or you had to zero a corrupted inode - and when this happens, you do a scan of all continuation inodes in chunks that have links to the corrupted chunk. What you need to make this go fast is (1) a pre-made list of which chunks have links with which other chunks, and (2) a fast way to read all of the continuation inodes in a chunk (ignoring chunk-local inodes). This stage is approximately O(fs size), but it should be quite swift.

> It seems that any sort of damage to the underlying storage (e.g. media
> error, I/O error or user brain explosion) results in the need to do a
> full fsck and hence chunkfs gives you no benefit in this case.

I worry about this, but so far I haven't found anything that couldn't be cut down significantly with just a little extra work.
It might be helpful to look at an extreme case. Let's say we're incredibly paranoid. We could be justified in running a full fsck on the entire file system between every single I/O - after all, something *might* have been silently corrupted. But this would be ridiculously slow. We could instead never check the file system, but then we would end up panicking and corrupting the file system a lot. So what's a good compromise?

In the chunkfs case, here are my rules of thumb so far:

1. Detection: All metadata has magic numbers and checksums.
2. Scrubbing: Random check of chunks when possible.
3. Repair: When we detect corruption - whether by checksum error, file system code assertion failure, or the hardware telling us about an error - check the chunk containing the error and any outside-chunk information that could be affected by it.

-VAL
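Rule 1 (detection) boils down to a magic number plus a checksum over each metadata block. A minimal user-space sketch of the idea - the block layout, magic value, and Adler-style checksum here are all invented for illustration; real fs code would typically use crc32c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_MAGIC 0x43686e6bu  /* "Chnk" - made-up magic for this sketch */

struct meta_block {
    uint32_t magic;             /* identifies the block type */
    uint32_t csum;              /* checksum, computed with this field zeroed */
    unsigned char payload[56];
};

/* Tiny stand-in checksum (Adler-32 style) over the whole block. */
static uint32_t meta_csum(const struct meta_block *b)
{
    struct meta_block tmp = *b;
    tmp.csum = 0;
    const unsigned char *p = (const unsigned char *)&tmp;
    uint32_t a = 1, c = 0;
    for (size_t i = 0; i < sizeof tmp; i++) {
        a = (a + p[i]) % 65521;
        c = (c + a) % 65521;
    }
    return (c << 16) | a;
}

/* Detection rule: wrong magic or wrong checksum means the block is
 * corrupt, which triggers the chunk-level repair pass (rule 3). */
static int meta_block_ok(const struct meta_block *b)
{
    return b->magic == META_MAGIC && b->csum == meta_csum(b);
}

/* Stamp magic and checksum before writing the block out. */
static void meta_block_seal(struct meta_block *b)
{
    b->magic = META_MAGIC;
    b->csum = meta_csum(b);
}
```

The value of the magic number is that it catches misdirected writes (a data block landing where metadata should be), which a checksum alone would happily validate.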
Re: [RFC][PATCH] ChunkFS: fs fission for faster fsck
On Wed, Apr 25, 2007 at 05:38:34AM -0600, Andreas Dilger wrote:
> The case where only a fsck of the corrupt chunk is done would not find
> the cnode references. Maybe there needs to be per-chunk info which
> contains a list/bitmap of other chunks that have cnodes shared with
> each chunk?

Yes, exactly. One might almost think you had solved this problem before. :):):)

-VAL
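The per-chunk bitmap Andreas suggests is cheap to sketch - one bit per peer chunk, set whenever a continuation inode crosses between the two. The chunk count and naming below are invented for illustration:

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>

#define MAX_CHUNKS 1024

/* Per-chunk record of which other chunks hold continuation inodes
 * linked with this one: one bit per peer chunk. */
struct chunk_links {
    uint32_t bits[MAX_CHUNKS / 32];
};

static void link_chunks(struct chunk_links *l, unsigned peer)
{
    l->bits[peer / 32] |= 1u << (peer % 32);
}

static int chunks_linked(const struct chunk_links *l, unsigned peer)
{
    return (l->bits[peer / 32] >> (peer % 32)) & 1;
}

/* After corruption is detected in a chunk, only the chunks whose bit is
 * set need their continuation inodes scanned - not the whole fs. */
static unsigned chunks_to_scan(const struct chunk_links *l,
                               unsigned out[], unsigned max)
{
    unsigned n = 0;
    for (unsigned c = 0; c < MAX_CHUNKS && n < max; c++)
        if (chunks_linked(l, c))
            out[n++] = c;
    return n;
}
```

In the common case where a chunk only links to a handful of peers, the post-corruption scan touches a handful of chunks instead of all of them.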
Repair-driven file system design (was Re: ZFS with Linux: An Open Plea)
On Mon, Apr 16, 2007 at 01:07:05PM +1000, David Chinner wrote:
> On Sun, Apr 15, 2007 at 08:50:25PM -0400, Rik van Riel wrote:
> > IMHO chunkfs could provide a much more promising approach.
>
> Agreed, that's one method of compartmentalising the problem.

Agreed, the chunkfs design is only one way to implement repair-driven file system design - designing your file system to make file system check and repair fast and easy. I've written a paper on this idea, which includes some interesting projections estimating that fsck will take 10 times as long on the 2013 equivalent of a 2006 file system, due entirely to changes in disk hardware. So if your server currently takes 2 hours to fsck, an equivalent server in 2013 will take about 20 hours. Eek! Paper here:

http://infohost.nmt.edu/~val/review/repair.pdf

While I'm working on chunkfs, I also think that all file systems should strive for repair-driven design. XFS has already made big strides in this area (multi-threading fsck for multi-disk file systems, for example) and I'm excited to see what comes next.

-VAL
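The mechanism behind that projection is that disk capacity (and thus metadata to read) grows much faster than bandwidth, while seek time barely improves. A toy model of fsck time makes the shape of the curve visible - every growth rate and constant here is invented for illustration and is not taken from the paper:

```c
#include <assert.h>

/* fsck time ~ (seeks * seek_time) + (bytes_read / bandwidth).
 * All constants below are made up: ~1% metadata, 10k seeks per GB
 * of metadata. The point is the trend, not the absolute numbers. */
static double fsck_hours(double capacity_tb, double seek_ms,
                         double bandwidth_mbs)
{
    double meta_gb = capacity_tb * 10.0;   /* assume ~1% of fs is metadata */
    double seeks   = meta_gb * 1e4;        /* assume 10k seeks per GB */
    double seek_s  = seeks * seek_ms / 1000.0;
    double read_s  = meta_gb * 1024.0 / bandwidth_mbs;
    return (seek_s + read_s) / 3600.0;
}
```

With a hypothetical 2006 disk (1 TB, 8 ms, 80 MB/s) versus a 2013 disk (16x capacity, 2x bandwidth, nearly unchanged seek time), the model gives roughly an order-of-magnitude slowdown - the same ballpark as the paper's 10x figure.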
[patch 1/4] [TULIP] fix for Lite-On 82c168 PNIC
From: Guido Classen <[EMAIL PROTECTED]>

This small patch fixes two issues with the Lite-On 82c168 PNIC adapters. I've tested it with two cards in different machines, both chip rev 17.

The first is the wrong register address CSR6 for writing the MII register, which should instead be 0xB8 (this may get a symbol too?) (see similar existing code at line 437 in tulip_core.c). [Double-checked by Val Henson; yes, 0xB8 is the correct register for autonegotiation on this card.]

Second, at least on my cards, bit 31 of the MII register seems to be somewhat unstable. This results in reading wrong values from the PHY registers and prevents the card from initializing correctly. I've added a little delay and a second test of the bit; if the bit is still cleared, the read/write process has definitely finished. [Original patch slightly massaged by Val Henson]

Signed-off-by: Val Henson <[EMAIL PROTECTED]>
Cc: Guido Classen <[EMAIL PROTECTED]>
Signed-off-by: Grant Grundler <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 drivers/net/tulip/media.c      |   31 ++++++++++++++++++++++++++-----
 drivers/net/tulip/tulip_core.c |    4 ++--
 2 files changed, 29 insertions(+), 6 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
@@ -1701,8 +1701,8 @@ static int __devinit tulip_init_one (str
 			tp->nwayset = 0;
 			iowrite32(csr6_ttm | csr6_ca, ioaddr + CSR6);
 			iowrite32(0x30, ioaddr + CSR12);
-			iowrite32(0x0001F078, ioaddr + CSR6);
-			iowrite32(0x0201F078, ioaddr + CSR6); /* Turn on autonegotiation. */
+			iowrite32(0x0001F078, ioaddr + 0xB8);
+			iowrite32(0x0201F078, ioaddr + 0xB8); /* Turn on autonegotiation. */
 		}
 		break;
 	case MX98713:
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/media.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/media.c
@@ -76,8 +76,20 @@ int tulip_mdio_read(struct net_device *d
 	ioread32(ioaddr + 0xA0);
 	while (--i > 0) {
 		barrier();
-		if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000))
-			break;
+		if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000)) {
+			/*
+			 * Possible bug in 82c168 rev 17 -
+			 * sometimes bit 31 is unstable and
+			 * clears before actually finished.
+			 * Delay and check if bit 31 is still
+			 * cleared before believing it.
+			 */
+			udelay(10);
+			if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000))
+				break;
+		}
 	}
 	spin_unlock_irqrestore(&tp->mii_lock, flags);
 	return retval & 0xffff;
@@ -136,8 +148,19 @@ void tulip_mdio_write(struct net_device
 	iowrite32(cmd, ioaddr + 0xA0);
 	do {
 		barrier();
-		if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000))
-			break;
+		if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000)) {
+			/*
+			 * Possible bug in 82c168 rev 17 -
+			 * sometimes bit 31 is unstable and
+			 * clears before actually finished.
+			 * Delay and check if bit 31 is still
+			 * cleared before believing it.
+			 */
+			udelay(10);
+			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000))
+				break;
+		}
 	} while (--i > 0);
 	spin_unlock_irqrestore(&tp->mii_lock, flags);
 	return;
--
[patch 4/4] [TULIP] Rev tulip version
Rev tulip version... things have changed since 2002!

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 drivers/net/tulip/tulip_core.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip_core.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip_core.c
@@ -17,11 +17,11 @@
 #define DRV_NAME	"tulip"
 #ifdef CONFIG_TULIP_NAPI
-#define DRV_VERSION	"1.1.14-NAPI" /* Keep at least for test */
+#define DRV_VERSION	"1.1.15-NAPI" /* Keep at least for test */
 #else
-#define DRV_VERSION	"1.1.14"
+#define DRV_VERSION	"1.1.15"
 #endif
-#define DRV_RELDATE	"May 11, 2002"
+#define DRV_RELDATE	"Feb 27, 2007"
 
 #include <linux/module.h>
--
[patch 0/4] [TULIP] Tulip updates
This patch set includes a fix for Lite-On from Guido Classen, some minor debugging/typo fixes, and a long-needed rev to the version (the last time this was done was 2002!).

-VAL
[patch 2/4] [TULIP] Quiet down tulip_stop_rxtx
Only print out debugging info for tulip_stop_rxtx if debug is on. Many cards (including at least two of my own) fail to stop properly during initialization according to this test, with no apparent ill effects. Worse, it tends to spam the logs when the driver doesn't work.

Signed-off-by: Val Henson <[EMAIL PROTECTED]>
Signed-off-by: Grant Grundler <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 drivers/net/tulip/tulip.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip.h
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip.h
@@ -481,7 +481,7 @@ static inline void tulip_stop_rxtx(struc
 		while (--i && (ioread32(ioaddr + CSR5) & (CSR5_TS|CSR5_RS)))
 			udelay(10);
 
-		if (!i)
+		if (!i && (tulip_debug > 1))
 			printk(KERN_DEBUG "%s: tulip_stop_rxtx() failed"
 					" (CSR5 0x%x CSR6 0x%x)\n",
 					pci_name(tp->pdev),
--
[patch 3/4] [TULIP] Fix SytemError typo
Fix an annoying typo - SytemError -> SystemError

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>
Cc: Jeff Garzik <[EMAIL PROTECTED]>
---
 drivers/net/tulip/interrupt.c   |    4 ++--
 drivers/net/tulip/tulip.h       |    2 +-
 drivers/net/tulip/winbond-840.c |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

--- tulip-2.6-mm-linux.orig/drivers/net/tulip/interrupt.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/interrupt.c
@@ -675,7 +675,7 @@ irqreturn_t tulip_interrupt(int irq, voi
 			if (tp->link_change)
 				(tp->link_change)(dev, csr5);
 		}
-		if (csr5 & SytemError) {
+		if (csr5 & SystemError) {
 			int error = (csr5 >> 23) & 7;
 			/* oops, we hit a PCI error.  The code produced corresponds
 			 * to the reason:
@@ -745,7 +745,7 @@ irqreturn_t tulip_interrupt(int irq, voi
 				  TxFIFOUnderflow |
 				  TxJabber |
 				  TPLnkFail |
-				  SytemError )) != 0);
+				  SystemError )) != 0);
 #else
 	} while ((csr5 & (NormalIntr|AbnormalIntr)) != 0);
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/tulip.h
+++ tulip-2.6-mm-linux/drivers/net/tulip/tulip.h
@@ -132,7 +132,7 @@ enum pci_cfg_driver_reg {
 /* The bits in the CSR5 status registers, mostly interrupt sources. */
 enum status_bits {
 	TimerInt = 0x800,
-	SytemError = 0x2000,
+	SystemError = 0x2000,
 	TPLnkFail = 0x1000,
 	TPLnkPass = 0x10,
 	NormalIntr = 0x1,
--- tulip-2.6-mm-linux.orig/drivers/net/tulip/winbond-840.c
+++ tulip-2.6-mm-linux/drivers/net/tulip/winbond-840.c
@@ -1148,7 +1148,7 @@ static irqreturn_t intr_handler(int irq,
 	}
 
 	/* Abnormal error summary/uncommon events handlers. */
-	if (intr_status & (AbnormalIntr | TxFIFOUnderflow | SytemError |
+	if (intr_status & (AbnormalIntr | TxFIFOUnderflow | SystemError |
 			   TimerInt | TxDied))
 		netdev_error(dev, intr_status);
--
Re: Documenting MS_RELATIME
On Mon, Feb 12, 2007 at 09:53:18PM +0200, Petri Kaukasoina wrote:
> On Mon, Feb 12, 2007 at 06:49:39PM +0100, Jan Engelhardt wrote:
> > > The one problem with noatime is that mutt's 'new mail arrived' breaks
> >
> > Just why doesn't it use mtime then to check for New Mail Arrived, like
> > I have always used:
> >
> >   --enable-buffy-size    Use file size attribute instead of access time
>
> Support was there at least in 1998, maybe before.

Good point. However, this works for mutt because new mail is an append-only operation. Other apps don't have the guarantee that the file modifications they care about will change the file size.

-VAL
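The atime-based check mutt relies on is a single timestamp comparison: the mailbox has new mail if it was modified more recently than it was last read. A sketch of that test (the function names are invented; mutt's real code differs) - this is exactly the ordering relatime preserves and plain noatime breaks:

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* New mail iff the mailbox was modified after it was last read. */
static int mailbox_has_new_mail_times(time_t atime, time_t mtime)
{
    return mtime > atime;
}

/* stat-based wrapper; returns 1 for new mail, 0 for none, -1 on error. */
static int has_new_mail(const char *mbox)
{
    struct stat st;
    if (stat(mbox, &st) != 0)
        return -1;
    return mailbox_has_new_mail_times(st.st_atime, st.st_mtime);
}
```

The `--enable-buffy-size` alternative replaces the timestamp comparison with a size comparison, which is why it only works for append-only mailboxes.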
Re: Documenting MS_RELATIME
On Mon, Feb 12, 2007 at 10:40:10AM -0500, Dave Jones wrote:
> The one problem with noatime is that mutt's 'new mail arrived' breaks,
> as you mentioned in the relatime changelog, so I'm surprised that
> they turned it on by default. With relatime fixing that however,
> I'm also unaware of anything that breaks. I'd be curious to
> do a Fedora test release with relatime, but I know the answer I'll
> get when I recommend we add it to our generated fstabs..
>
> "If it's good enough, why isn't it the kernel default?"
>
> Hence my current line of questioning ;-)

Okay, I have to admit I've used the normal atime semantics exactly once. Someone hacked my laptop about 4 years ago (back when I didn't have a firewall and a remotely exploitable samba server was on by default in some Red Hat install). I pulled the plug on the network (no wireless either) and figured out which files the attacker read, which gave me some peace of mind. :) Personally, I'd trade that for the performance/battery life/etc. of relatime.

-VAL
Re: Documenting MS_RELATIME
On Sat, Feb 10, 2007 at 07:54:00PM -0500, Dave Jones wrote:
> Whilst on the subject of RELATIME, is there any good reason
> not to make this a default mount option?

Ubuntu has been shipping with noatime as the default for some time now, with no obvious problems (I'm running Ubuntu). I see relatime as an improvement on noatime.

-VAL
Re: Documenting MS_RELATIME
On Sat, Feb 10, 2007 at 09:56:07AM -0800, Michael Kerrisk wrote:
> Val,
>
> I'm just updating the mount(2) man page for MS_RELATIME, and this is the
> text I've come up with:
>
>     MS_RELATIME (Since Linux 2.6.20)
>         When a file on this file system is accessed, only
>         update the file's last accessed time (atime) if
>         the current value of atime is less than or equal
>         to the file's last modified (mtime) or last sta-
>         tus change time (ctime).  This option is useful
>         for programs, such as mutt(1), that need to know
>         when a file has been read since it was last modi-
>         fied.
>
> This text is based on your comments accompanying the various patches, but
> it differs in one respect. Your comments said that the atime would only be
> updated if the atime is older than mtime/ctime. However, what the code
> actually does is update atime if it is <= mtime/ctime -- i.e., atime is
> older than *or equal to* mtime/ctime.
>
> I'm sure that the code implements your intention, but before incorporating
> the above text I thought I'd better check, since the code differs from
> your comment. Can you just confirm that the proposed man page text is okay?

That's correct, yes. Thanks!

-VAL
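The semantics being confirmed above reduce to a single predicate. A user-space sketch (not the kernel's actual code, which lives in the atime-update path; the function name is invented):

```c
#include <assert.h>
#include <time.h>

/* relatime: perform the atime update only if the current atime is less
 * than or equal to mtime or ctime - i.e. only if the file has been
 * modified or changed since it was last read.  The "or equal to" part
 * is exactly the detail the man page text above pins down. */
static int relatime_need_update(time_t atime, time_t mtime, time_t ctim)
{
    return atime <= mtime || atime <= ctim;
}
```

Once the update fires, atime becomes newer than mtime/ctime, so subsequent reads skip the write until the file is modified again - that is where the I/O saving comes from.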
Re: Relative atime (was Re: What's in ocfs2.git)
On Tue, Dec 05, 2006 at 08:58:02PM -0800, Andrew Morton wrote:
> That's the easy part.  How are we going to get mount(8) patched?

Karel, interested in taking a look at the following patch? The kernel bits are in -mm currently.

-VAL

Add the "relatime" (relative atime) option support to mount. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified.

Cc: Adrian Bunk <[EMAIL PROTECTED]>
Cc: Al Viro <[EMAIL PROTECTED]>
Cc: Karel Zak <[EMAIL PROTECTED]>
Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 mount/mount.8           |    7 +++++++
 mount/mount.c           |    6 ++++++
 mount/mount_constants.h |    4 ++++
 3 files changed, 17 insertions(+)

--- util-linux-2.13-pre7.orig/mount/mount.8
+++ util-linux-2.13-pre7/mount/mount.8
@@ -586,6 +586,13 @@ access on the news spool to speed up new
 .B nodiratime
 Do not update directory inode access times on this filesystem.
 .TP
+.B relatime
+Update inode access times relative to modify or change time.  Access
+time is only updated if the previous access time was earlier than the
+current modify or change time.  (Similar to noatime, but doesn't break
+mutt or other applications that need to know if a file has been read
+since the last time it was modified.)
+.TP
 .B noauto
 Can only be mounted explicitly (i.e., the
 .B \-a
--- util-linux-2.13-pre7.orig/mount/mount.c
+++ util-linux-2.13-pre7/mount/mount.c
@@ -164,6 +164,12 @@ static const struct opt_map opt_map[] =
   { "diratime",   0, 1, MS_NODIRATIME },  /* Update dir access times */
   { "nodiratime", 0, 0, MS_NODIRATIME },  /* Do not update dir access times */
 #endif
+#ifdef MS_RELATIME
+  { "relatime",   0, 0, MS_RELATIME },    /* Update access times relative to
+                                             mtime/ctime */
+  { "norelatime", 0, 1, MS_RELATIME },    /* Update access time without regard
+                                             to mtime/ctime */
+#endif
   { NULL,         0, 0, 0 }
 };
--- util-linux-2.13-pre7.orig/mount/mount_constants.h
+++ util-linux-2.13-pre7/mount/mount_constants.h
@@ -57,6 +57,10 @@ if we have a stack or plain mount - moun
 #ifndef MS_VERBOSE
 #define MS_VERBOSE 0x8000 /* 32768 */
 #endif
+#ifndef MS_RELATIME
+#define MS_RELATIME 0x200000 /* 200000: Update access times relative
+                                to mtime/ctime */
+#endif
 /*
  * Magic mount flag number. Had to be or-ed to the flag values.
  */
Re: Relative atime (was Re: What's in ocfs2.git)
On Tue, Dec 05, 2006 at 08:58:02PM -0800, Andrew Morton wrote: That's the easy part. How are we going to get mount(8) patched? Karel, interested in taking a look at the following patch? The kernel bits are in -mm currently. -VAL Add the relatime (relative atime) option support to mount. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified. Cc: Adrian Bunk [EMAIL PROTECTED] Cc: Al Viro [EMAIL PROTECTED] Cc: Karel Zak [EMAIL PROTECTED] Signed-off-by: Valerie Henson [EMAIL PROTECTED] --- mount/mount.8 |7 +++ mount/mount.c |6 ++ mount/mount_constants.h |4 3 files changed, 17 insertions(+) --- util-linux-2.13-pre7.orig/mount/mount.8 +++ util-linux-2.13-pre7/mount/mount.8 @@ -586,6 +586,13 @@ access on the news spool to speed up new .B nodiratime Do not update directory inode access times on this filesystem. .TP +.B relatime +Update inode access times relative to modify or change time. Access +time is only updated if the previous access time was earlier than the +current modify or change time. (Similar to noatime, but doesn't break +mutt or other applications that need to know if a file has been read +since the last time it was modified.) 
+.TP .B noauto Can only be mounted explicitly (i.e., the .B \-a --- util-linux-2.13-pre7.orig/mount/mount.c +++ util-linux-2.13-pre7/mount/mount.c @@ -164,6 +164,12 @@ static const struct opt_map opt_map[] = { diratime,0, 1, MS_NODIRATIME }, /* Update dir access times */ { nodiratime, 0, 0, MS_NODIRATIME },/* Do not update dir access times */ #endif +#ifdef MS_RELATIME + { relatime, 0, 0, MS_RELATIME }, /* Update access times relative to + mtime/ctime */ + { norelatime, 0, 1, MS_RELATIME }, /* Update access time without regard + to mtime/ctime */ +#endif { NULL, 0, 0, 0 } }; --- util-linux-2.13-pre7.orig/mount/mount_constants.h +++ util-linux-2.13-pre7/mount/mount_constants.h @@ -57,6 +57,10 @@ if we have a stack or plain mount - moun #ifndef MS_VERBOSE #define MS_VERBOSE 0x8000 /* 32768 */ #endif +#ifndef MS_RELATIME +#define MS_RELATIME 0x20 /* 20: Update access times relative + to mtime/ctime */ +#endif /* * Magic mount flag number. Had to be or-ed to the flag values. */ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] drivers/net/tulip/: fix for Lite-On 82c168 PNIC (2.6.11)
Hi there, Guido, Jeff resurrected this patch from the misty depths of the past. I double-checked the docs and the first bug fix is definitely correct. The second part isn't in the docs, but seems reasonable. Is this still the patch you are using? Any comments you want to add?

-VAL

> From: Guido Classen <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED], [EMAIL PROTECTED],
>     linux-net@vger.kernel.org, linux-kernel@vger.kernel.org
> Date: Fri, 01 Apr 2005 22:21:44 +0200
> Subject: [PATCH] drivers/net/tulip/: fix for Lite-On 82c168 PNIC (2.6.11)
>
> Hi,
>
> this small patch fixes two issues with the Lite-On 82c168 PNIC adapters.
> I've tested it with two cards in different machines, both chip rev 17.
>
> The first is the wrong register address CSR6 for writing the MII register,
> which instead is 0xB8 (this may get a symbol too?) (see similar existing
> code at line 437 in tulip_core.c).
>
> At least on my cards, bit 31 of the MII register seems to be somewhat
> unstable. This results in reading wrong values from the PHY registers
> and prevents the card from initializing correctly. I've added a little
> delay and a second test of the bit. If the bit is still cleared, the
> read/write process has definitely finished.
>
> Cheers
> Guido
>
> Signed-off-by: Guido Classen <[EMAIL PROTECTED]>
>
> diff -ru linux-2.6.11-org/drivers/net/tulip/tulip_core.c linux-2.6.11.2-pentium/drivers/net/tulip/tulip_core.c
> --- linux-2.6.11-org/drivers/net/tulip/tulip_core.c	2005-04-01 22:10:03.000000000 +0200
> +++ linux-2.6.11.2-pentium/drivers/net/tulip/tulip_core.c	2005-03-31 23:14:11.000000000 +0200
> @@ -1701,8 +1701,8 @@
>  			tp->nwayset = 0;
>  			iowrite32(csr6_ttm | csr6_ca, ioaddr + CSR6);
>  			iowrite32(0x30, ioaddr + CSR12);
> -			iowrite32(0x0001F078, ioaddr + CSR6);
> -			iowrite32(0x0201F078, ioaddr + CSR6);	/* Turn on autonegotiation. */
> +			iowrite32(0x0001F078, ioaddr + 0xB8);
> +			iowrite32(0x0201F078, ioaddr + 0xB8);	/* Turn on autonegotiation. */
>  		}
>  		break;
>  	case MX98713:
> diff -ru linux-2.6.11-org/drivers/net/tulip/media.c linux-2.6.11.2-pentium/drivers/net/tulip/media.c
> --- linux-2.6.11-org/drivers/net/tulip/media.c	2005-04-01 22:10:03.000000000 +0200
> +++ linux-2.6.11.2-pentium/drivers/net/tulip/media.c	2005-04-01 22:05:31.000000000 +0200
> @@ -74,8 +74,17 @@
>  		ioread32(ioaddr + 0xA0);
>  		while (--i > 0) {
>  			barrier();
> -			if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000))
> -				break;
> +			if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000)) {
> +				/* bug in 82c168 rev 17?
> +				 * wait a little while and check if
> +				 * bit 31 is still cleared */
> +				udelay(10);
> +				if ( ! ((retval = ioread32(ioaddr + 0xA0)) & 0x80000000)) {
> +					break;
> +				}
> +			}
>  		}
>  		spin_unlock_irqrestore(&tp->mii_lock, flags);
>  		return retval & 0xffff;
> @@ -153,8 +162,16 @@
>  		iowrite32(cmd, ioaddr + 0xA0);
>  		do {
>  			barrier();
> -			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000))
> -				break;
> +			if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000)) {
> +				/* bug in 82c168 rev 17?
> +				 * wait a little while and check if
> +				 * bit 31 is still cleared */
> +				udelay(10);
> +				if ( ! (ioread32(ioaddr + 0xA0) & 0x80000000)) {
> +					break;
> +				}
> +			}
>  		} while (--i > 0);
>  		spin_unlock_irqrestore(&tp->mii_lock, flags);
>  		return;
Re: Relative atime (was Re: What's in ocfs2.git)
On Tue, Dec 05, 2006 at 08:58:02PM -0800, Andrew Morton wrote:
> On Mon, 4 Dec 2006 16:36:20 -0800 Valerie Henson <[EMAIL PROTECTED]> wrote:
> > Add "relatime" (relative atime) support.  Relative atime only updates
> > the atime if the previous atime is older than the mtime or ctime.
> > Like noatime, but useful for applications like mutt that need to know
> > when a file has been read since it was last modified.
>
> That seems like a good idea.
>
> I found touch_atime() to be rather putrid, so I hacked it around a bit.
> The end result:

I like that rather better - add my:

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

> That's the easy part.  How are we going to get mount(8) patched?

Well, the nodiratime documentation got in. (I was going to add that as part of this patch, but lo and behold.)

-VAL
Re: Relative atime (was Re: What's in ocfs2.git)
On Mon, Dec 04, 2006 at 04:36:20PM -0800, Valerie Henson wrote:
> On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> > Hi Steve,
> >
> > On Mon, Dec 04, 2006 at 10:54:53AM +0000, Steven Whitehouse wrote:
> > > > In the future, I'd like to see a "relative atime" mode, which functions
> > > > in the manner described by Valerie Henson at:
> > > >
> > > > http://lkml.org/lkml/2006/8/25/380
> > > >
> > > I'd like to second that. [adding Val Henson to the "to"] What (if
> > > anything) remains to be done before the relative atime patch is ready to
> > > go upstream? I'm happy to help out here if required,
> >
> > Last time I looked at them, things seemed to be in pretty good shape - it
> > wasn't a very large patch series.

And the userland part.

-VAL

Add the "relatime" (relative atime) option support to mount. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 mount/mount.8           |    7 +++++++
 mount/mount.c           |    6 ++++++
 mount/mount_constants.h |    4 ++++
 3 files changed, 17 insertions(+)

--- util-linux-2.13-pre7.orig/mount/mount.8
+++ util-linux-2.13-pre7/mount/mount.8
@@ -586,6 +586,13 @@ access on the news spool to speed up new
 .B nodiratime
 Do not update directory inode access times on this filesystem.
 .TP
+.B relatime
+Update inode access times relative to modify or change time.  Access
+time is only updated if the previous access time was earlier than the
+current modify or change time.  (Similar to noatime, but doesn't break
+mutt or other applications that need to know if a file has been read
+since the last time it was modified.)
+.TP
 .B noauto
 Can only be mounted explicitly (i.e., the
 .B \-a
--- util-linux-2.13-pre7.orig/mount/mount.c
+++ util-linux-2.13-pre7/mount/mount.c
@@ -164,6 +164,12 @@ static const struct opt_map opt_map[] =
   { "diratime",   0, 1, MS_NODIRATIME },  /* Update dir access times */
   { "nodiratime", 0, 0, MS_NODIRATIME },  /* Do not update dir access times */
 #endif
+#ifdef MS_RELATIME
+  { "relatime",   0, 0, MS_RELATIME },    /* Update access times relative to
+                                             mtime/ctime */
+  { "norelatime", 0, 1, MS_RELATIME },    /* Update access time without regard
+                                             to mtime/ctime */
+#endif
   { NULL,         0, 0, 0 }
 };
--- util-linux-2.13-pre7.orig/mount/mount_constants.h
+++ util-linux-2.13-pre7/mount/mount_constants.h
@@ -57,6 +57,10 @@ if we have a stack or plain mount - moun
 #ifndef MS_VERBOSE
 #define MS_VERBOSE 0x8000 /* 32768 */
 #endif
+#ifndef MS_RELATIME
+#define MS_RELATIME 0x200000 /* 200000: Update access times relative
+                                to mtime/ctime */
+#endif
 /*
  * Magic mount flag number. Had to be or-ed to the flag values.
  */
Relative atime (was Re: What's in ocfs2.git)
On Mon, Dec 04, 2006 at 04:10:07PM -0800, Mark Fasheh wrote:
> Hi Steve,
>
> On Mon, Dec 04, 2006 at 10:54:53AM +0000, Steven Whitehouse wrote:
> > > In the future, I'd like to see a "relative atime" mode, which functions
> > > in the manner described by Valerie Henson at:
> > >
> > > http://lkml.org/lkml/2006/8/25/380
> > >
> > I'd like to second that. [adding Val Henson to the "to"] What (if
> > anything) remains to be done before the relative atime patch is ready to
> > go upstream? I'm happy to help out here if required,
>
> Last time I looked at them, things seemed to be in pretty good shape - it
> wasn't a very large patch series.

Yep, the relative atime patch is tiny and pretty much done - just needs some soak time in -mm and a little more review (cc'd Viro and fsdevel). Kernel patch against 2.6.18-rc4 appended, patch to mount following. (Note that my web server suffered a RAID failure and my patches page is unavailable till the restore finishes.)

-VAL

Add "relatime" (relative atime) support. Relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified.

Signed-off-by: Valerie Henson <[EMAIL PROTECTED]>

---
 fs/inode.c            |   11 ++++++++++-
 fs/namespace.c        |    5 ++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |    1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

--- linux-2.6.18-rc4-relatime.orig/fs/inode.c
+++ linux-2.6.18-rc4-relatime/fs/inode.c
@@ -1200,7 +1200,16 @@ void touch_atime(struct vfsmount *mnt, s
 		return;
 
 	now = current_fs_time(inode->i_sb);
-	if (!timespec_equal(&inode->i_atime, &now)) {
+	if (timespec_equal(&inode->i_atime, &now))
+		return;
+	/*
+	 * With relative atime, only update atime if the previous
+	 * atime is earlier than either the ctime or mtime.
+	 */
+	if (!mnt ||
+	    !(mnt->mnt_flags & MNT_RELATIME) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_mtime) < 0) ||
+	    (timespec_compare(&inode->i_atime, &inode->i_ctime) < 0)) {
 		inode->i_atime = now;
 		mark_inode_dirty_sync(inode);
 	}
--- linux-2.6.18-rc4-relatime.orig/fs/namespace.c
+++ linux-2.6.18-rc4-relatime/fs/namespace.c
@@ -376,6 +376,7 @@ static int show_vfsmnt(struct seq_file *
 		{ MNT_NOEXEC, ",noexec" },
 		{ MNT_NOATIME, ",noatime" },
 		{ MNT_NODIRATIME, ",nodiratime" },
+		{ MNT_RELATIME, ",relatime" },
 		{ 0, NULL }
 	};
 	struct proc_fs_info *fs_infop;
@@ -1413,9 +1414,11 @@ long do_mount(char *dev_name, char *dir_
 		mnt_flags |= MNT_NOATIME;
 	if (flags & MS_NODIRATIME)
 		mnt_flags |= MNT_NODIRATIME;
+	if (flags & MS_RELATIME)
+		mnt_flags |= MNT_RELATIME;
 
 	flags &= ~(MS_NOSUID | MS_NOEXEC | MS_NODEV | MS_ACTIVE |
-		   MS_NOATIME | MS_NODIRATIME);
+		   MS_NOATIME | MS_NODIRATIME | MS_RELATIME);
 
 	/* ... and get the mountpoint */
 	retval = path_lookup(dir_name, LOOKUP_FOLLOW, &nd);
--- linux-2.6.18-rc4-relatime.orig/include/linux/fs.h
+++ linux-2.6.18-rc4-relatime/include/linux/fs.h
@@ -119,6 +119,7 @@ extern int dir_notify_enable;
 #define MS_PRIVATE	(1<<18)	/* change to private */
 #define MS_SLAVE	(1<<19)	/* change to slave */
 #define MS_SHARED	(1<<20)	/* change to shared */
+#define MS_RELATIME	(1<<21)	/* Update atime relative to mtime/ctime. */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
--- linux-2.6.18-rc4-relatime.orig/include/linux/mount.h
+++ linux-2.6.18-rc4-relatime/include/linux/mount.h
@@ -27,6 +27,7 @@ struct namespace;
 #define MNT_NOEXEC	0x04
 #define MNT_NOATIME	0x08
 #define MNT_NODIRATIME	0x10
+#define MNT_RELATIME	0x20
 #define MNT_SHRINKABLE	0x100