Re: [PATCH] sched: avoid large irq-latencies in smp-balancing
On Wed, 2007-11-07 at 17:10 -0500, Steven Rostedt wrote:

> > It would be nice if sched_nr_migrate didn't exist, really. It's hard to
> > imagine anyone wanting to tweak it, apart from developers.
>
> I'm not so sure about that. It is a tunable for RT. That is, we can tweak
> this value to be smaller if we don't like the latencies it gives us.
>
> This is one of those things that sacrifices performance for latency.
> The higher the number, the better it can spread tasks around, but it
> also causes large latencies.
>
> I've just included this patch into 2.6.23.1-rt11 and it brought down an
> unbounded latency to just 42us (previously we got into the
> milliseconds!).
>
> Perhaps when this feature matures, we can come to a good defined value
> that would be good for all. But until then, I recommend keeping this a
> tunable.

Why not use the latency-expectation infrastructure? Iterate under lock
until (or before...) the system global latency is respected.

- Eric

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
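For readers following along, the tunable under discussion ends up exposed
as a sysctl. A hedged sketch of inspecting and lowering it (the /proc
path and the default of 32 assume the patch as posted; older -rt trees
may differ, and writing requires root):

```shell
# Inspect the current per-iteration migration cap (32 in the patch):
cat /proc/sys/kernel/sched_nr_migrate

# Lower it to trade SMP-balancing throughput for smaller latencies:
echo 16 > /proc/sys/kernel/sched_nr_migrate
```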
Re: [PATCH 1/4] stringbuf: A string buffer implementation
On Tue, 2007-10-23 at 20:35 -0600, Matthew Wilcox wrote:
[...]
> > Multiple string objects can share the same data, by increasing the nrefs
> > count; a new data is allocated if the string is modified and nrefs > 1.
>
> If we were trying to get rid of char * throughout the kernel, that might
> make some sense; stringbufs have a more limited target though.
[...]

No contest, my suggestions only make sense for a general-use string
library. I suspect most in-kernel string manipulations are limited to
preparing buffers to be copied to (and read from) user-space.

- Eric
Re: [PATCH 1/4] stringbuf: A string buffer implementation
On Tue, 2007-10-23 at 17:12 -0400, Matthew Wilcox wrote:

> Consecutive calls to printk are non-atomic, which leads to various
> implementations for accumulating strings which can be printed in one call.
> This is a generic string buffer which can also be used for non-printk
> purposes. There is no sb_scanf implementation yet as I haven't identified
> a user for it.
>
> +struct stringbuf {
> +	char *s;
> +	int alloc;
> +	int len;
> +};

I don't know if copy-on-write semantics are really useful for current
in-kernel uses, but I've coded and used a C++ string class like this in
the past:

struct string_data {
	int nrefs;
	unsigned len;
	unsigned capacity;
	/* char data[capacity]; allocated along with string_data */
};

struct string {			/* or a typedef in C... */
	struct string_data *data;
};

[ struct string_data is a hidden implementation detail; only struct
string is exposed. ]

Multiple string objects can share the same data by increasing the nrefs
count; new data is allocated if the string is modified and nrefs > 1.

Not having to iterate over the string to calculate its length, allocating
a larger buffer to eliminate re-allocations, and copy-on-write semantics
make a string like this a vast performance improvement over a normal C
string, for a minimal (about 3 ints per data buffer) memory cost. Used
correctly, it can prevent buffer overflows.

You still always null-terminate the string stored in data, so it can be
used directly as a normal C string. You also statically allocate an
empty string which is shared by all "uninitialized" or empty strings.

Even without copy-on-write semantics and reference counting, I think
this approach is better because it uses one less "object" and
allocation:

struct string       - "handle" (pointer, really) to string data
struct string_data  - string data

versus:

struct stringbuf *sb          - pointer to string object
struct stringbuf              - string object
char *s (member of stringbuf) - string data

Best regards,

- Eric
Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..
On Tue, 2007-10-02 at 11:17 +0200, Thomas Gleixner wrote:
[...]
> I have uploaded an update of the arch/x86 tree based on -rc9 to
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86.git x86
>
[...]
> If there is anything we can help with the transition, please do not
> hesitate to ask.
>
> Thanks,
>
> 	Thomas, Ingo

Hi Thomas,

This latest x86 branch builds and boots without problems with my usual
x86_64 config.

If you remember our conversation one month ago, I was unable to build
your tree. I've upgraded my Ubuntu distribution from 7.04 to 7.10 beta
this week; maybe this fixed it.

But I still had to do some manual fixes to get the packaging steps
working:

mkdir arch/x86_64/boot/
ln -s ../../../arch/x86/boot/bzImage arch/x86_64/boot/bzImage

Best regards,

- Eric
Re: yield API
On Tue, 2007-10-02 at 08:46 +0200, Ingo Molnar wrote:
[...]
> APIs that are not in any real, meaningful use, despite a decade of
> presence are not really interesting to me personally. (especially in
> this case where we know exactly _why_ the API is used so rarely.) Sure
> we'll continue to support it in the best possible way, with the usual
> kernel maintainance policy: without hurting other, more commonly used
> APIs. That was the principle we followed in previous schedulers too. And
> if anyone has a patch to make sched_yield() better than it is today, i'm
> of course interested in it.

Do you still have intentions to add a directed yield API? I remember
seeing it in the earlier CFS patches.

- Eric
Re: [RFC/PATCH] Add sysfs control to modify a user's cpu share
On Mon, 2007-10-01 at 16:44 +0200, Ingo Molnar wrote:

> > Adds tunables in sysfs to modify a user's cpu share.
> >
> > A directory is created in sysfs for each new user in the system.
> >
> > /sys/kernel/uids/<uid>/cpu_share
> >
> > Reading this file returns the cpu shares granted for the user.
> > Writing into this file modifies the cpu share for the user. Only an
> > administrator is allowed to modify a user's cpu share.
> >
> > Ex:
> > # cd /sys/kernel/uids/
> > # cat 512/cpu_share
> > 1024
> > # echo 2048 > 512/cpu_share
> > # cat 512/cpu_share
> > 2048
> > #
>
> looks good to me! I think this API is pretty straightforward. I've put
> this into my tree and have updated the sched-devel git tree:

While a sysfs interface is OK and somewhat orthogonal to the interface
proposed by the containers patches, I think maybe a new syscall should
be considered.

Since we now have a fair-share cpu scheduler, maybe an interface to
specify the cpu share directly (as an alternative to priority) makes
sense. For processes, it may become more intuitive (and precise) to set
the processing share directly than to set a priority which is converted
to a share.

Maybe something similar to the ioprio_set() and ioprio_get() syscalls:

- per user cpu share
- per user group cpu share
- per process cpu share
- per process group cpu share

Best regards,

- Eric
Re: 2.6.23-rc1-mm2 (vm-dont-run-touch_buffer-during-buffercache-lookups.patch)
On Wed, 2007-01-08 at 00:46 -0700, Andrew Morton wrote:

> Or you could do something more real-worldly like start up OO, firefox and
> friends, then run /etc/cron.daily/everything and see what the
> before-and-after effects are. The aggregate info we're looking for is
> captured in /proc/meminfo: swapped, Mapped, Cached, Buffers.

IMO it will be harder to come up with reproducible numbers; everyone's
desktop is different, as are their filesystem contents.

Anyway, I will cook up something and post it. It might be useful for
others to understand the updatedb problem.

I intend to try only this specific patch, not the full -mm. Is there
any other patch I need to apply too?

- Eric
Re: 2.6.23-rc1-mm2 (vm-dont-run-touch_buffer-during-buffercache-lookups.patch)
On Tue, 2007-31-07 at 23:09 -0700, Andrew Morton wrote:

> +vm-dont-run-touch_buffer-during-buffercache-lookups.patch
>
> A little VM experiment. See changelog for details.
>
> We don't have any tests to determine the effects of this, and nobody will
> bother setting one up, so ho hum, this remains in -mm for ever.
>
> I don't think there's any point in doing this until we have some decent
> testcases.

Hi Andrew,

For which problem was this patch coded? Is it a potential fix for the
updatedb problem? Is the patch effective without the
filesystem-dependent change you talk about? (I use reiserfs.)

I've been thinking about a test case for the updatedb problem:

1. Script or program that creates a large number of directories and
   zero-sized files. Same setup for everyone, to have reproducible
   results.
2. Run updatedb on those.
3. Observe the effects (with vmstat, slabinfo and meminfo) before,
   during and after the updatedb run.
4. Do something to trigger some reclaim, like copying a large file.
5. See the effects.

What do you think? What would be the ideal test case for the problem in
your opinion?

Best regards,

- Eric
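The numbered steps above could be sketched as a script. The directory
counts and paths here are illustrative assumptions, and the
updatedb/observation steps are left as comments since they need root
and an installed locate:

```shell
#!/bin/sh
# Step 1: reproducible tree of many directories and zero-sized files.
dir=$(mktemp -d)
for d in $(seq 1 100); do
    mkdir "$dir/d$d"
    for f in $(seq 1 100); do
        : > "$dir/d$d/f$f"          # zero-sized file
    done
done
nfiles=$(find "$dir" -type f | wc -l)
echo "created $nfiles files under $dir"

# Step 2: run updatedb over the tree (requires locate installed):
# updatedb -U "$dir" -o /tmp/updatedb-test.db

# Step 3: observe before/during/after:
# vmstat 1 & cat /proc/slabinfo /proc/meminfo

# Step 4: trigger some reclaim, e.g. by copying a large file:
# dd if=/dev/zero of=/tmp/large_file bs=1M count=2048

rm -rf "$dir"
```

The interesting part is comparing the dentry/inode slab counts and the
Cached figure across steps 3 and 4.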
[BUG] fadvise POSIX_FADV_NOREUSE does nothing
Related to my other bug report today, calling posix_fadvise (which uses
fadvise64) with the POSIX_FADV_NOREUSE flag does nothing. The pages are
not dropped behind. I also tried calling fadvise with
POSIX_FADV_SEQUENTIAL first.

This is expected, as POSIX_FADV_NOREUSE is a no-op in recent kernels.
Also, POSIX_FADV_SEQUENTIAL only doubles the readahead window. It
doesn't hint the VM in any way to possibly drop-behind the pages.

(See the previous bug report for more details of the test case.)

Relevant numbers:

Copying (using fadvise_cp) a large file test:
1st run: 0m9.018s
2nd run: 0m3.444s
Copying large file...
3rd run: 0m14.024s  <<< page cache trashed
4th run: 0m3.449s

Test programs and batch files are attached.

- Eric

fadvise_cp.c:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int in;
	int out;
	int pagesize;
	void *buf;
	off_t pos;

	if (argc != 3) {
		printf("Usage: %s <src> <dest>\n", argv[0]);
		return EXIT_FAILURE;
	}

	in = open(argv[1], O_RDONLY, 0);
	out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0666);

	posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
	posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);

	pagesize = getpagesize();
	buf = malloc(pagesize);
	pos = 0;

	for (;;) {
		ssize_t count;

		count = read(in, buf, pagesize);
		if (!count || count == -1)
			break;
		write(out, buf, count);

		/* right usage pattern? */
		posix_fadvise(in, pos, count, POSIX_FADV_NOREUSE);
		posix_fadvise(out, pos, count, POSIX_FADV_NOREUSE);
		pos += count;
	}

	free(buf);
	close(in);
	close(out);
	return EXIT_SUCCESS;
}

Makefile:

all:
	gcc fadvise_cp.c -o fadvise_cp
	gcc working_set_simul.c -o working_set_simul

[Attachment: use-once-test.sh (application/shellscript)]

working_set_simul.c:

#include <fcntl.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;
	off_t size;
	char *mapping;
	unsigned r;
	unsigned i;

	if (argc != 2) {
		printf("Usage: %s <file>\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd = open(argv[1], O_RDONLY, 0);
	size = lseek(fd, 0, SEEK_END);
	mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

	/* access (read) the file a couple of times */
	for (r = 0; r < 4; r++) {
		for (i = 0; i < size; i++) {
			char t = mapping[i];
		}
	}

	munmap(mapping, size);
	close(fd);
	return EXIT_SUCCESS;
}
[BUG] Linux VM use-once mechanisms don't work (test case with numbers included)
Linux VM use-once mechanisms don't seem to work. A simple scenario like
streaming a file much greater than physical RAM size should be
identified to avoid trashing the page cache with useless data.

I know the VM cannot predict the future or assume anything about the
user's intent. But this workload is simple and common; it should be
detected and better handled.

Test case:

Linux 2.6.20-16-lowlatency SMP PREEMPT x86_64 (also tried on 2.6.23-rc1)

- A file of 1/3 the RAM size is created, mapped and frequently accessed
  (4 times).
- The test is run multiple times (4 total) to time its execution.
- After the first run, other runs take much less time, because the file
  is cached.
- A previously created file, 4 times the size of the RAM, is read or
  copied.
- The test is re-run (2 times) to time its execution.

To test:

$ make
# ./use-once-test.sh

Some big files will be created in your /tmp. They don't get erased
after the test to speed up multiple runs.

Results:

- The test execution time greatly increases after reading or copying
  the large file.
- Frequently used data got kicked out of the page cache and replaced
  with useless read-once data.
- Both the read-only and copy (read + write) cases don't work.

I believe this clearly illustrates the slowdowns I experience after I
copy large files around my system. All applications on my desktop are
jerky for some moments after that. Watching a DVD is another example.

Base test:
1st run: 0m8.958s
2nd run: 0m3.442s
3rd run: 0m3.452s
4th run: 0m3.443s

Reading a large file test:
1st run: 0m8.997s
2nd run: 0m3.522s
`/tmp/large_file' -> `/dev/null'
3rd run: 0m8.999s  <<< page cache trashed
4th run: 0m3.440s

Copying (using cp) a large file test:
1st run: 0m8.979s
2nd run: 0m3.442s
`/tmp/large_file' -> `/tmp/large_file.copy'
3rd run: 0m13.814s  <<< page cache trashed
4th run: 0m3.455s

Copying (using fadvise_cp) a large file test:
1st run: 0m9.018s
2nd run: 0m3.444s
Copying large file...
3rd run: 0m14.024s  <<< page cache trashed
4th run: 0m3.449s

Copying (using splice-cp) a large file test:
1st run: 0m8.977s
2nd run: 0m3.442s
Copying large file...
3rd run: 0m14.118s  <<< page cache trashed
4th run: 0m3.456s

Possible solutions:

Various patches to fix the use-once mechanisms were discussed in the
past, some more than 6 years ago and some more recently.

http://lwn.net/2001/0726/a/2q.php3
http://lkml.org/lkml/2005/5/3/6
http://lkml.org/lkml/2006/7/17/192
http://lkml.org/lkml/2007/7/9/340
http://lkml.org/lkml/2007/7/21/219 (*1)

(*1) I have tested Peter's patch with some success. It fixes the read
case, but not the copy case. Results:
http://lkml.org/lkml/2007/7/24/527

Test programs and batch files are attached.

- Eric

fadvise_cp.c:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int in;
	int out;
	int pagesize;
	void *buf;
	off_t pos;

	if (argc != 3) {
		printf("Usage: %s <src> <dest>\n", argv[0]);
		return EXIT_FAILURE;
	}

	in = open(argv[1], O_RDONLY, 0);
	out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0666);

	posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
	posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);

	pagesize = getpagesize();
	buf = malloc(pagesize);
	pos = 0;

	for (;;) {
		ssize_t count;

		count = read(in, buf, pagesize);
		if (!count || count == -1)
			break;
		write(out, buf, count);

		/* right usage pattern? */
		posix_fadvise(in, pos, count, POSIX_FADV_NOREUSE);
		posix_fadvise(out, pos, count, POSIX_FADV_NOREUSE);
		pos += count;
	}

	free(buf);
	close(in);
	close(out);
	return EXIT_SUCCESS;
}

Makefile:

all:
	gcc fadvise_cp.c -o fadvise_cp
	gcc working_set_simul.c -o working_set_simul

[Attachment: use-once-test.sh (application/shellscript)]

working_set_simul.c:

#include <fcntl.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;
	off_t size;
	char *mapping;
	unsigned r;
	unsigned i;

	if (argc != 2) {
		printf("Usage: %s <file>\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd = open(argv[1], O_RDONLY, 0);
	size = lseek(fd, 0, SEEK_END);
	mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

	/* access (read) the file a couple of times */
	for (r = 0; r < 4; r++) {
		for (i = 0; i < size; i++) {
			char t = mapping[i];
		}
	}

	munmap(mapping, size);
	close(fd);
	return EXIT_SUCCESS;
}
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Wed, 2007-25-07 at 17:09 +1000, Nick Piggin wrote:

> Eric St-Laurent wrote:
> > I test this on my main system, so patches with basic testing and
> > reasonable stability are preferred. I just want to avoid data
> > corruption bugs. FYI, I used to run the -rt tree most of the time.
>
> OK here is one which just changes the rate that the active and inactive
> lists get scanned. Data corruption bugs should be minimal ;)

Nick,

I have tried your patch with my test case; unfortunately it doesn't
help. Numbers did vary a little bit more, and it seemed drop_caches was
not working as well as usual (used between the runs). Also, overall the
runs took about 0.1s more to complete.

Linux 2.6.23-rc1-nick PREEMPT x86_64

Base test:
1st run: 0m9.123s
2nd run: 0m3.565s
3rd run: 0m3.553s
4th run: 0m3.565s

Reading a large file test:
1st run: 0m9.146s
2nd run: 0m3.560s
`/tmp/large_file' -> `/dev/null'
3rd run: 0m19.759s
4th run: 0m3.515s

Copying (using cp) a large file test:
1st run: 0m9.085s
2nd run: 0m3.522s
`/tmp/large_file' -> `/tmp/large_file.copy'
3rd run: 0m9.977s
4th run: 0m3.518s

Anyway, what is the theory behind the patch?

- Eric
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Wed, 2007-25-07 at 17:09 +1000, Nick Piggin wrote: Eric St-Laurent wrote: I test this on my main system, so patches with basic testing and reasonable stability are preferred. I just want to avoid data corruption bugs. FYI, I used to run the -rt tree most of the time. OK here is one which just changes the rate that the active and inactive lists get scanned. Data corruption bugs should be minimal ;) Nick, I have tried your patch with my test case, unfortunately it doesn't help. Numbers did vary a little bit more, and it seemed drop_caches was not working as well as usual (used between the runs). Also, overall the runs took about .1s more to complete. Linux 2.6.23-rc1-nick PREEMPT x86_64 Base test: 1st run: 0m9.123s 2nd run: 0m3.565s 3rd run: 0m3.553s 4th run: 0m3.565s Reading a large file test: 1st run: 0m9.146s 2nd run: 0m3.560s `/tmp/large_file' - `/dev/null' 3rd run: 0m19.759s 4th run: 0m3.515s Copying (using cp) a large file test: 1st run: 0m9.085s 2nd run: 0m3.522s `/tmp/large_file' - `/tmp/large_file.copy' 3rd run: 0m9.977s 4th run: 0m3.518s Anyway, what is the theory behind the patch? - Eric - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[BUG] Linux VM use-once mechanisms don't work (test case with numbers included)
Linux VM use-once mechanisms don't seem to work. A simple scenario like streaming a file much greater than physical RAM size should be identified, to avoid trashing the page cache with useless data. I know the VM cannot predict the future or assume anything about the user's intent. But this workload is simple and common; it should be detected and handled better.

Test case: Linux 2.6.20-16-lowlatency SMP PREEMPT x86_64 (also tried on 2.6.23-rc1)

- A file of 1/3 the RAM size is created, mapped and frequently accessed (4 times).
- The test is run multiple times (4 total) to time its execution.
- After the first run, the other runs take much less time, because the file is cached.
- A previously created file, 4 times the size of the RAM, is read or copied.
- The test is re-run (2 times) to time its execution.

To test:

$ make
# ./use-once-test.sh

Some big files will be created in your /tmp. They don't get erased after the test, to speed up multiple runs.

Results:

- The test execution time greatly increases after reading or copying the large file.
- Frequently used data got kicked out of the page cache and replaced with useless read-once data.
- Both the read-only and copy (read + write) cases don't work.

I believe this clearly illustrates the slowdowns I experience after I copy large files around my system. All applications on my desktop are jerky for some moments after that. Watching a DVD is another example.

Base test:
1st run: 0m8.958s
2nd run: 0m3.442s
3rd run: 0m3.452s
4th run: 0m3.443s

Reading a large file test:
1st run: 0m8.997s
2nd run: 0m3.522s
`/tmp/large_file' - `/dev/null'
3rd run: 0m8.999s    page cache trashed
4th run: 0m3.440s

Copying (using cp) a large file test:
1st run: 0m8.979s
2nd run: 0m3.442s
`/tmp/large_file' - `/tmp/large_file.copy'
3rd run: 0m13.814s   page cache trashed
4th run: 0m3.455s

Copying (using fadvise_cp) a large file test:
1st run: 0m9.018s
2nd run: 0m3.444s
Copying large file...
3rd run: 0m14.024s   page cache trashed
4th run: 0m3.449s

Copying (using splice-cp) a large file test:
1st run: 0m8.977s
2nd run: 0m3.442s
Copying large file...
3rd run: 0m14.118s   page cache trashed
4th run: 0m3.456s

Possible solutions:

Various patches to fix the use-once mechanisms were discussed in the past, some more than 6 years ago and some more recently:

http://lwn.net/2001/0726/a/2q.php3
http://lkml.org/lkml/2005/5/3/6
http://lkml.org/lkml/2006/7/17/192
http://lkml.org/lkml/2007/7/9/340
http://lkml.org/lkml/2007/7/21/219 (*1)

(*1) I have tested Peter's patch with some success. It fixes the read case, but not the copy case. Results: http://lkml.org/lkml/2007/7/24/527

Test programs and batch files are attached.

- Eric

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int in;
	int out;
	int pagesize;
	void *buf;
	off_t pos;

	if (argc != 3) {
		printf("Usage: %s src dest\n", argv[0]);
		return EXIT_FAILURE;
	}

	in = open(argv[1], O_RDONLY, 0);
	out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0666);

	posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
	posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);

	pagesize = getpagesize();
	buf = malloc(pagesize);
	pos = 0;

	for (;;) {
		ssize_t count;

		count = read(in, buf, pagesize);
		if (!count || count == -1)
			break;

		write(out, buf, count);

		/* right usage pattern? */
		posix_fadvise(in, pos, count, POSIX_FADV_NOREUSE);
		posix_fadvise(out, pos, count, POSIX_FADV_NOREUSE);

		pos += count;
	}

	free(buf);
	close(in);
	close(out);

	return EXIT_SUCCESS;
}

all:
	gcc fadvise_cp.c -o fadvise_cp
	gcc working_set_simul.c -o working_set_simul

use-once-test.sh
Description: application/shellscript

#include <fcntl.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;
	off_t size;
	char *mapping;
	unsigned r;
	unsigned i;

	if (argc != 2) {
		printf("Usage: %s file\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd = open(argv[1], O_RDONLY, 0);
	size = lseek(fd, 0, SEEK_END);
	mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

	/* access (read) the file a couple of times */
	for (r = 0; r < 4; r++) {
		for (i = 0; i < size; i++) {
			char t = mapping[i];
		}
	}

	munmap(mapping, size);
	close(fd);

	return EXIT_SUCCESS;
}
[BUG] fadvise POSIX_FADV_NOREUSE does nothing
Related to my other bug report today: calling posix_fadvise() (which uses fadvise64) with the POSIX_FADV_NOREUSE flag does nothing. The pages are not dropped behind. I also tried calling fadvise with POSIX_FADV_SEQUENTIAL first.

This is expected, as POSIX_FADV_NOREUSE is a no-op in recent kernels. Also, POSIX_FADV_SEQUENTIAL only adjusts the readahead window. It doesn't hint the VM in any way to possibly drop-behind the pages.

(See the previous bug report for more details of the test case)

Relevant numbers:

Copying (using fadvise_cp) a large file test:
1st run: 0m9.018s
2nd run: 0m3.444s
Copying large file...
3rd run: 0m14.024s   page cache trashed
4th run: 0m3.449s

Test programs and batch files are attached.

- Eric

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int in;
	int out;
	int pagesize;
	void *buf;
	off_t pos;

	if (argc != 3) {
		printf("Usage: %s src dest\n", argv[0]);
		return EXIT_FAILURE;
	}

	in = open(argv[1], O_RDONLY, 0);
	out = open(argv[2], O_CREAT | O_WRONLY | O_TRUNC, 0666);

	posix_fadvise(in, 0, 0, POSIX_FADV_SEQUENTIAL);
	posix_fadvise(out, 0, 0, POSIX_FADV_SEQUENTIAL);

	pagesize = getpagesize();
	buf = malloc(pagesize);
	pos = 0;

	for (;;) {
		ssize_t count;

		count = read(in, buf, pagesize);
		if (!count || count == -1)
			break;

		write(out, buf, count);

		/* right usage pattern? */
		posix_fadvise(in, pos, count, POSIX_FADV_NOREUSE);
		posix_fadvise(out, pos, count, POSIX_FADV_NOREUSE);

		pos += count;
	}

	free(buf);
	close(in);
	close(out);

	return EXIT_SUCCESS;
}

all:
	gcc fadvise_cp.c -o fadvise_cp
	gcc working_set_simul.c -o working_set_simul

use-once-test.sh
Description: application/shellscript

#include <fcntl.h>
#include <memory.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
	int fd;
	off_t size;
	char *mapping;
	unsigned r;
	unsigned i;

	if (argc != 2) {
		printf("Usage: %s file\n", argv[0]);
		return EXIT_FAILURE;
	}

	fd = open(argv[1], O_RDONLY, 0);
	size = lseek(fd, 0, SEEK_END);
	mapping = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);

	/* access (read) the file a couple of times */
	for (r = 0; r < 4; r++) {
		for (i = 0; i < size; i++) {
			char t = mapping[i];
		}
	}

	munmap(mapping, size);
	close(fd);

	return EXIT_SUCCESS;
}
Re: [patch] sched: make cpu_clock() not use the rq clock
On Thu, 2007-26-07 at 11:00 +0200, Ingo Molnar wrote:

> Subject: sched: make cpu_clock() not use the rq clock
> From: Ingo Molnar <[EMAIL PROTECTED]>
>
> it is enough to disable interrupts to get the precise rq-clock
> of the local CPU.

Hi Ingo,

Those new fast nanosecond-resolution clock APIs are nice, but it seems to me that their naming and _where_ they are implemented in the tree are a little odd, IMO.

We have:

1. sched_clock() is in kernel/sched.c (weak implementation)
2. sched_clock() is in arch/i386/kernel/tsc.c (architecture override)
3. rq_clock() is in kernel/sched.c
4. cpu_clock() is in kernel/sched.c

I would suggest:

1. rename sched_clock() (remove sched_ as it's not sched-specific anymore) and place it in kernel/time/...
2. rename the architecture-specific version of it too

This first function is the basic fast ns clock.

3. base your rq_clock() on cpu_clock() (#4) or use the latter directly. This is local to sched.c
4. move cpu_clock() to kernel/time/...

This is the per-cpu monotonic version.

See my point? Base the scheduler clock on a general kernel API, not the other way around.

Just a suggestion.

Best regards,

- Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Wed, 2007-25-07 at 17:09 +1000, Nick Piggin wrote:

> A new list could be a possibility. One problem with adding lists is just
> trying to work out how to balance scanning rates between them, another
> problem is CPU overhead of moving pages from one to another...

Disk sizes seem to increase more rapidly than the ability to read them quickly. Fortunately, processing power increases greatly too. It may be a good idea to spend more CPU cycles to better decide how the VM should juggle this data. We've got to keep those multi-core CPUs busy.

> but don't
> let me stop you if you want to jump in and try something :)

Well, I might try a few things along the way. But I prefer the thorough approach versus tinkering...

- Read all research, check the competition
- Build test virtual machines, with benchmarks and typical workloads
- Add (or use) some instrumentation in the pagecache
- Code a simulator
- Try all algorithms, tune them

This is way overkill for a part-time hobby. If we don't see much work in this area, it's surely because it's really not a problem anymore for most workloads. Databases have their own cache management and disk scheduling, file servers just add more RAM or processors, etc.

> OK here is one which just changes the rate that the active and inactive
> lists get scanned. Data corruption bugs should be minimal ;)

Will test.

- Eric
Re: [ck] Re: -mm merge plans for 2.6.23
On Wed, 2007-25-07 at 08:47 +0200, Mike Galbraith wrote:

> Heh. Here we have a VM developer expressing his interest in the problem
> space, and you offer him a steaming jug of STFU because he doesn't say
> what you want to hear. I wonder how many killfiles you just entered.

Agreed.

(a bit OT)

People should understand that it's not (I think) about a desktop workloads vs. enterprise workloads war. I see it mostly as a progression versus regression trade-off. And adding potentially useless or unmaintained code is a regression from the maintainers' POV.

The best way to justify a patch and have it integrated is to have a scientific testing method with repeatable numbers. Con has done so for his patch; his benchmark demonstrated good improvements. But I feel some of his supporters have indirectly harmed his cause by their comments. Also, the fact that Con recently stopped maintaining his work out of frustration doesn't help having his patch merged.

Again, I'm not personally pushing this patch; I don't need it. Con has worked for many years on two areas that still cause problems for desktop users: scheduler interactivity and pagecache trashing. Now that the scheduler has been fixed, let's have the VM fixed too.

Sorry for the slightly OT post, and please don't start a flame war...

- Eric
Re: -mm merge plans for 2.6.23
On Wed, 2007-25-07 at 15:37 +1000, Nick Piggin wrote:

> OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
> the updatedb problem very well, because if updatedb has caused swapout
> then it has filled memory, and swap prefetch doesn't run unless there
> is free memory (not to mention that updatedb would have paged out other
> files as well).
>
> And drop behind doesn't fix your usual problem where you are downloading
> from a server, because that is use-once write(2) data which is the
> problem. And this readahead-based drop behind also doesn't help if data
> you were reading happened to be a sequence of small files, or otherwise
> not in good readahead order.
>
> Not to say that neither fix some problems, but for such conceptually
> big changes, it should take a little more effort than a constructed test
> case and no consideration of the alternatives to get it merged.

Sorry for the confusion. For swap prefetch, I should have said "some people claim that it fixes their problem". I didn't want to hurt anybody's feelings; some people are tired of hearing others speak hypothetically about this patch, as it work-for-them (TM). I don't experience the problem. Can't help.

For drop behind, it fixes half the problem. The read case is handled perfectly by Peter's patch, and the copy (read+write) case is unchanged. My test case demonstrates it very easily, just look at the numbers.

So, I agree with you that drop behind doesn't fix the write() case. Peter has said so himself when I offered to test his patch. As I do experience this problem, I have written a small test program and batch file to help push the patch for acceptance. I'm very willing to help improve the test cases, test patches and write code, time permitting.

About this very subject, earlier this year Andrew suggested I come up with a test case to demonstrate my problem; well, finally I've done so:

http://lkml.org/lkml/2007/3/3/164
http://lkml.org/lkml/2007/3/3/166

Lastly, I would go as far as to say that the use-once read-then-copy fix must also work with copies over NFS. I don't know if NFS changes the workload on the client station versus the local case, and I don't know if it's still possible to consider data copied this way as use-once.

- Eric
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Wed, 2007-25-07 at 15:19 +1000, Nick Piggin wrote:

> What *I* think is supposed to happen is that newly read in pages get
> put on the inactive list, and unless they get accessed again before
> being reclaimed, they are allowed to fall off the end of the list
> without disturbing active data too much.
>
> I think there is a missing piece here, that we used to ease the reclaim
> pressure off the active list when the inactive list grows relatively
> much larger than it (which could indicate a lot of use-once pages in
> the system).

Maybe a new list should be added to put newly read pages in. If they are not used, or used once, after a certain period, they can be moved to the inactive list (or whatever).

Newly read pages...
- ... not used after this period are excessive readahead; we discard immediately.
- ... used only once after this period, we discard soon.
- ... used many times/frequently are moved to the active list.

Surely the scan rate (do I make sense?) should be different for this newly-read list and the inactive list. I also remember your split mapped/unmapped active list patches from a while ago.

Can someone point me to up-to-date documentation about the Linux VM? The books and documents I've seen are outdated.

> I think I've been banned from touching vmscan.c, but if you're keen to
> try a patch, I might be convinced to come out of retirement :)

I'm more than willing! Now that CFS is merged, redirect your energies from nicksched to nick-vm ;)

Patches against any tree (stable, linus, mm, rt) are good. But I prefer the last stable release, because it narrows down the possible problems that a moving target like the development tree may have. I test this on my main system, so patches with basic testing and reasonable stability are preferred. I just want to avoid data corruption bugs. FYI, I used to run the -rt tree most of the time.

> One man's trash is another's treasure: some people will want the
> files to remain in cache because they'll use them again (copy it
> somewhere else, or start editing it after being copied or whatever).
>
> But yeah, we can probably do better at the sequential read/write
> case.

Sure, but there are many hints to detect this: *large* (> most of the RAM), *streaming*, *used once*.

But if a program mmap()s a 3/4-of-the-RAM area and "plays" in it, it's a good sign that the streaming code shouldn't be active.

- Eric
Re: -mm merge plans for 2.6.23
On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:

> It certainly doesn't run for me ever. Always kind of a "that's not the
> point" comment but I just keep wondering whenever I see anyone complain
> about updatedb why the _hell_ they are running it in the first place. If
> anyone who never uses "locate" for anything simply disable updatedb, the
> problem will for a large part be solved.
>
> This not just meant as a cheap comment; while I can think of a few similar
> loads even on the desktop (scanning a browser cache, a media player indexing
> a large amount of media files, ...) I've never heard of problems _other_
> than updatedb. So just junk that crap and be happy.

From my POV there are two different problems discussed recently:

- updatedb types of workloads that add tons of inodes and dentries in the slab caches, which of course use the pagecache.
- streaming large files (reading or copying) that fill the pagecache with useless used-once data.

Swap prefetch fixes the first case, drop-behind fixes the second case. Both have the same symptoms but the cause is different.

Personally, updatedb doesn't really hurt me. But I don't have that many files on my desktop. I've tried the swap prefetch patch in the past and it was not so noticeable for me. (I don't doubt it's helpful for others.)

But every time I read or copy a large file around (usually from a server), the slowdown is noticeable for some moments.

I just wanted to point this out, in case it wasn't clear enough for everyone. I hope both problems get fixed.

Best regards,

- Eric
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:

> I don't like this kind of conditional information going from something
> like readahead into page reclaim. Unless it is for readahead _specific_
> data such as "I got these all wrong, so you can reclaim them" (which
> this isn't).
>
> But I don't like it as a use-once thing. The VM should be able to get
> that right.

Question: how does the use-once code work in the current kernel? Is there any? It doesn't quite work for me...

See my previous email today; I've done a small test case to demonstrate the problem and the effectiveness of Peter's patch. The only piece missing is the copy case (read once + write once). Regardless of how it's implemented, I think a similar mechanism must be added. This is a long-standing issue.

In the end, I think it's a pagecache resource allocation problem: the VM lacks fair-share limits between processes. The kernel doesn't have enough information to make the right decisions. You can refine or use more advanced page reclaim, but some fair-share splitting (like the CPU scheduler does) between the processes must be present. Of course, some processes should have large or unlimited VM limits, like databases.

Maybe the "containers" patchset and memory controller can help, with some specific configuration and/or a userspace daemon to adjust the limits on the fly.

Independently, the basic large-file streaming read (or copy) once cases should not trash the pagecache. Can we agree on that? I say, let's add some code to fix the problem. If we hear about any regression in some workloads, we can add a tunable to limit or disable its effects, _if_ a better compromise solution cannot be found. Surely it's possible to have an acceptable solution.

Best regards,

- Eric
Re: [PATCH 1/3] readahead: drop behind
On Sat, 2007-21-07 at 23:00 +0200, Peter Zijlstra wrote:

> Use the read-ahead code to provide hints to page reclaim.
>
> This patch has the potential to solve the streaming-IO trashes my
> desktop problem.
>
> It tries to aggressively reclaim pages that were loaded in a strong
> sequential pattern and have been consumed. Thereby limiting the damage
> to the current resident set.
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>

(sorry for the delay)

Ok, I've done some tests with your patches. I came up with a test program that should approximate my use case. It simply mmap()s and scans (reads) a 375M file, which represents the usual used memory on my desktop system. This data is frequently used and should stay cached as much as possible, in preference over the "used once" data read into the page cache when copying large files.

I don't claim that the test program is perfect or even correct; I'm open to suggestions.

Test system:

- Linux x86_64 2.6.23-rc1
- 1G of RAM
- I use the basic drop behind and sysctl patches. The readahead size patch is _not_ included.

Setting up:

dd if=/dev/zero of=/tmp/375M_file bs=1M count=375
dd if=/dev/zero of=/tmp/5G_file bs=1M count=5120

Tests with stock kernel (drop behind disabled):

echo 0 >/proc/sys/vm/drop_behind

Base test:

sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file

1st execution: 0m7.146s
2nd execution: 0m1.119s
3rd execution: 0m1.109s
4th execution: 0m1.105s

Reading a large file test:

sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
cp /tmp/5G_file /dev/null
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file

1st execution: 0m7.224s
2nd execution: 0m1.114s
3rd execution: 0m7.178s   <<< Much slower
4th execution: 0m1.115s

Copying (read+write) a large file test:

sync; echo 1 >/proc/sys/vm/drop_caches
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
cp /tmp/5G_file /tmp/copy_of_5G_file
time ./large_app_load_simul /tmp/375M_file
time ./large_app_load_simul /tmp/375M_file
rm /tmp/copy_of_5G_file

1st execution: 0m7.203s
2nd execution: 0m1.147s
3rd execution: 0m7.238s   <<< Much slower
4th execution: 0m1.129s

Tests with drop behind enabled:

echo 1 >/proc/sys/vm/drop_behind

Base test: [same tests as above]

1st execution: 0m7.206s
2nd execution: 0m1.110s
3rd execution: 0m1.102s
4th execution: 0m1.106s

Reading a large file test: [same tests as above]

1st execution: 0m7.197s
2nd execution: 0m1.116s
3rd execution: 0m1.114s   <<< Great!!!
4th execution: 0m1.111s

Copying (read+write) a large file test: [same tests as above]

1st execution: 0m7.186s
2nd execution: 0m1.111s
3rd execution: 0m7.339s   <<< Not fixed
4th execution: 0m1.121s

Conclusion:

- The drop-behind patch works and really prevents the page cache content from being filled with useless read-once data.
- It doesn't help the copy (read+write) case. This should also be fixed, as it's a common workload.

Tested-By: Eric St-Laurent ([EMAIL PROTECTED])

Best regards,

- Eric

(*) Test program and batch file are attached.

diff -urN linux-2.6/include/linux/swap.h linux-2.6-drop-behind/include/linux/swap.h
--- linux-2.6/include/linux/swap.h	2007-07-21 18:26:00.000000000 -0400
+++ linux-2.6-drop-behind/include/linux/swap.h	2007-07-22 16:22:48.000000000 -0400
@@ -180,6 +180,7 @@
 /* linux/mm/swap.c */
 extern void FASTCALL(lru_cache_add(struct page *));
 extern void FASTCALL(lru_cache_add_active(struct page *));
+extern void FASTCALL(lru_demote(struct page *));
 extern void FASTCALL(activate_page(struct page *));
 extern void FASTCALL(mark_page_accessed(struct page *));
 extern void lru_add_drain(void);
diff -urN linux-2.6/kernel/sysctl.c linux-2.6-drop-behind/kernel/sysctl.c
--- linux-2.6/kernel/sysctl.c	2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/kernel/sysctl.c	2007-07-22 16:20:27.000000000 -0400
@@ -163,6 +163,7 @@
 extern int prove_locking;
 extern int lock_stat;
+extern int sysctl_dropbehind;
 
 /* The default sysctl tables: */
 
@@ -1048,6 +1049,14 @@
 		.extra1		= &zero,
 	},
 #endif
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "drop_behind",
+		.data		= &sysctl_dropbehind,
+		.maxlen		= sizeof(sysctl_dropbehind),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	/*
 	 * NOTE: do not add new entries to this table unless you have read
 	 * Documentation/sysctl/ctl_unnumbered.txt
diff -urN linux-2.6/mm/readahead.c linux-2.6-drop-behind/mm/readahead.c
--- linux-2.6/mm/readahead.c	2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/mm/readahead.c	2007-07-22 16:41:47.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/backing-dev.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
+#include <linux/swap.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct
Re: [PATCH 1/3] readahead: drop behind
On Sat, 2007-21-07 at 23:00 +0200, Peter Zijlstra wrote: Use the read-ahead code to provide hints to page reclaim. This patch has the potential to solve the streaming-IO trashes my desktop problem. It tries to aggressively reclaim pages that were loaded in a strong sequential pattern and have been consumed. Thereby limiting the damage to the current resident set. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] (sorry for the delay) Ok, I've done some tests with your patches, I came up with a test program that should approximate my use case. It simply mmap() and scan (read) a 375M file which represent the usual used memory on my desktop system. This data is frequently used, and should stay cached as much as possible in preference over the used once data read in the page cache when copying large files. I don't claim that the test program is perfect or even correct, I'm open for suggestions. Test system: - Linux x86_64 2.6.23-rc1 - 1G of RAM - I use the basic drop behind and sysctl patches. The readahead size patch is _not_ included. 
Setting up:

  dd if=/dev/zero of=/tmp/375M_file bs=1M count=375
  dd if=/dev/zero of=/tmp/5G_file bs=1M count=5120

Tests with stock kernel (drop behind disabled):

  echo 0 > /proc/sys/vm/drop_behind

Base test:

  sync; echo 1 > /proc/sys/vm/drop_caches
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file

  1st execution: 0m7.146s
  2nd execution: 0m1.119s
  3rd execution: 0m1.109s
  4th execution: 0m1.105s

Reading a large file test:

  sync; echo 1 > /proc/sys/vm/drop_caches
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file
  cp /tmp/5G_file /dev/null
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file

  1st execution: 0m7.224s
  2nd execution: 0m1.114s
  3rd execution: 0m7.178s    <- Much slower
  4th execution: 0m1.115s

Copying (read+write) a large file test:

  sync; echo 1 > /proc/sys/vm/drop_caches
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file
  cp /tmp/5G_file /tmp/copy_of_5G_file
  time ./large_app_load_simul /tmp/375M_file
  time ./large_app_load_simul /tmp/375M_file
  rm /tmp/copy_of_5G_file

  1st execution: 0m7.203s
  2nd execution: 0m1.147s
  3rd execution: 0m7.238s    <- Much slower
  4th execution: 0m1.129s

Tests with drop behind enabled:

  echo 1 > /proc/sys/vm/drop_behind

Base test:

  [same tests as above]

  1st execution: 0m7.206s
  2nd execution: 0m1.110s
  3rd execution: 0m1.102s
  4th execution: 0m1.106s

Reading a large file test:

  [same tests as above]

  1st execution: 0m7.197s
  2nd execution: 0m1.116s
  3rd execution: 0m1.114s    <- Great!!!
  4th execution: 0m1.111s

Copying (read+write) a large file test:

  [same tests as above]

  1st execution: 0m7.186s
  2nd execution: 0m1.111s
  3rd execution: 0m7.339s    <- Not fixed
  4th execution: 0m1.121s

Conclusion:

- The drop-behind patch works and really prevents the page cache content from being filled with useless read-once data.
- It doesn't help the copy (read+write) case.
This should also be fixed, as it's a common workload.

Tested-By: Eric St-Laurent <[EMAIL PROTECTED]>

Best regards,

- Eric

(*) Test program and batch file are attached.

diff -urN linux-2.6/include/linux/swap.h linux-2.6-drop-behind/include/linux/swap.h
--- linux-2.6/include/linux/swap.h	2007-07-21 18:26:00.000000000 -0400
+++ linux-2.6-drop-behind/include/linux/swap.h	2007-07-22 16:22:48.000000000 -0400
@@ -180,6 +180,7 @@
 /* linux/mm/swap.c */
 extern void FASTCALL(lru_cache_add(struct page *));
 extern void FASTCALL(lru_cache_add_active(struct page *));
+extern void FASTCALL(lru_demote(struct page *));
 extern void FASTCALL(activate_page(struct page *));
 extern void FASTCALL(mark_page_accessed(struct page *));
 extern void lru_add_drain(void);
diff -urN linux-2.6/kernel/sysctl.c linux-2.6-drop-behind/kernel/sysctl.c
--- linux-2.6/kernel/sysctl.c	2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/kernel/sysctl.c	2007-07-22 16:20:27.000000000 -0400
@@ -163,6 +163,7 @@
 extern int prove_locking;
 extern int lock_stat;
+extern int sysctl_dropbehind;
 
 /* The default sysctl tables: */
 
@@ -1048,6 +1049,14 @@
 		.extra1		= &zero,
 	},
 #endif
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "drop_behind",
+		.data		= &sysctl_dropbehind,
+		.maxlen		= sizeof(sysctl_dropbehind),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
 	/*
 	 * NOTE: do not add new entries to this table unless you have read
 	 * Documentation/sysctl/ctl_unnumbered.txt
diff -urN linux-2.6/mm/readahead.c linux-2.6-drop-behind/mm/readahead.c
--- linux-2.6/mm/readahead.c	2007-07-21 18:26:01.000000000 -0400
+++ linux-2.6-drop-behind/mm/readahead.c	2007-07-22 16:41:47.000000000 -0400
@@ -15,6 +15,7 @@
 #include <linux/backing-dev.h>
 #include <linux/task_io_accounting_ops.h>
 #include <linux/pagevec.h>
+#include <linux/swap.h>
 
 void default_unplug_io_fn(struct backing_dev_info *bdi, struct
Re: [PATCH 0/3] readahead drop behind and size adjustment
On Mon, 2007-23-07 at 19:00 +1000, Nick Piggin wrote:

> I don't like this kind of conditional information going from something
> like readahead into page reclaim. Unless it is for readahead _specific_
> data such as "I got these all wrong, so you can reclaim them" (which
> this isn't). But I don't like it as a use-once thing. The VM should be
> able to get that right.

Question: how does the use-once code work in the current kernel? Is there any? It doesn't quite work for me...

See my previous email today; I've done a small test case to demonstrate the problem and the effectiveness of Peter's patch. The only piece missing is the copy case (read once + write once).

Regardless of how it's implemented, I think a similar mechanism must be added. This is a long-standing issue.

In the end, I think it's a pagecache resource allocation problem: the VM lacks fair-share limits between processes. The kernel doesn't have enough information to make the right decisions. You can refine or use more advanced page reclaim, but some fair-share splitting (like the CPU scheduler does) between the processes must be present. Of course some processes should have large or unlimited VM limits, like databases.

Maybe the containers patchset and memory controller can help, with some specific configuration and/or a userspace daemon to adjust the limits on the fly.

Independently, the basic large-file streaming read (or copy) once cases should not trash the pagecache. Can we agree on that?

I say, let's add some code to fix the problem. If we hear about any regression in some workloads, we can add a tunable to limit or disable its effects, _if_ a better compromise solution cannot be found. Surely it's possible to have an acceptable solution.

Best regards,

- Eric
Re: -mm merge plans for 2.6.23
On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:

> It certainly doesn't run for me ever. Always kind of a "that's not the
> point" comment but I just keep wondering whenever I see anyone complain
> about updatedb why the _hell_ they are running it in the first place.
> If anyone who never uses locate for anything simply disable updatedb,
> the problem will for a large part be solved.
>
> This not just meant as a cheap comment; while I can think of a few
> similar loads even on the desktop (scanning a browser cache, a media
> player indexing a large amount of media files, ...) I've never heard of
> problems _other_ than updatedb. So just junk that crap and be happy.

From my POV there are two different problems discussed recently:

- updatedb-type workloads that add tons of inodes and dentries in the slab caches, which of course use the pagecache.
- streaming large files (reading or copying) that fill the pagecache with useless used-once data.

Swap prefetch fixes the first case; drop-behind fixes the second case. Both have the same symptoms, but the cause is different.

Personally, updatedb doesn't really hurt me, but I don't have that many files on my desktop. I've tried the swap prefetch patch in the past and it was not so noticeable for me. (I don't doubt it's helpful for others.)

But every time I read or copy a large file around (usually from a server), the slowdown is noticeable for some moments.

I just wanted to point this out, in case it wasn't clear enough for everyone. I hope both problems get fixed.

Best regards,

- Eric
Re: [PATCH 1/3] readahead: drop behind
On Sat, 2007-21-07 at 23:00 +0200, Peter Zijlstra wrote:

> plain text document attachment (readahead-useonce.patch)
> Use the read-ahead code to provide hints to page reclaim.
>
> This patch has the potential to solve the streaming-IO trashes my
> desktop problem.
>
> It tries to aggressively reclaim pages that were loaded in a strong
> sequential pattern and have been consumed. Thereby limiting the damage
> to the current resident set.
>
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>

With the fadvise change, it looks like the right solution to me.

Which kernel are the patches for? They don't apply cleanly to 2.6.22.1.

It would be useful to have a temporary /proc tunable to enable/disable the heuristic to help test the effects.

- Eric
Re: [PATCH 1/3] readahead: drop behind
> They are against git of a few hours ago and the latest readahead patches
> from Wu (which don't apply cleanly either, but the rejects are trivial).
>
> > It would be useful to have a temporary /proc tunable to enable/disable
> > the heuristic to help test the effects.
>
> Right, I had such a patch somewhere,.. won't apply cleanly but should be
> obvious..

Thanks, I will merge these and report back with some results.

After copying large files, I find my system sluggish. I hope your changes will help.

- Eric
RE: v2.6.21.4-rt11
On Tue, 2007-12-06 at 06:00 -0700, Pallipadi, Venkatesh wrote:

> >-----Original Message-----
> Yes. Force_hpet part is should have worked..
> Eric: Can you send me the output of 'lspci -n' on your system.
> We need to double check we are covering all ICH7 ids.

Here it is:

00:00.0 0600: 8086:2770 (rev 02)
00:02.0 0300: 8086:2772 (rev 02)
00:1b.0 0403: 8086:27d8 (rev 01)
00:1c.0 0604: 8086:27d0 (rev 01)
00:1c.1 0604: 8086:27d2 (rev 01)
00:1d.0 0c03: 8086:27c8 (rev 01)
00:1d.1 0c03: 8086:27c9 (rev 01)
00:1d.2 0c03: 8086:27ca (rev 01)
00:1d.3 0c03: 8086:27cb (rev 01)
00:1d.7 0c03: 8086:27cc (rev 01)
00:1e.0 0604: 8086:244e (rev e1)
00:1f.0 0601: 8086:27b8 (rev 01)
00:1f.1 0101: 8086:27df (rev 01)
00:1f.2 0101: 8086:27c0 (rev 01)
00:1f.3 0c05: 8086:27da (rev 01)
01:0a.0 0604: 3388:0021 (rev 11)
02:0c.0 0c03: 1033:0035 (rev 41)
02:0c.1 0c03: 1033:0035 (rev 41)
02:0c.2 0c03: 1033:00e0 (rev 02)
02:0d.0 0c00: 1106:3044 (rev 46)
03:00.0 0200: 8086:109a

Adding the id for PCI_DEVICE_ID_INTEL_ICH7_0 (27b8) should do the trick.

I've patched my kernel and was ready to test it, but in the meantime I did a BIOS upgrade (bad idea...) and with the new version the HPET timer is detected via ACPI. Unfortunately it seems that downgrading the BIOS is a lot more trouble than upgrading it, so I cannot easily test the force-enable anymore. Anyway, it works now.
Here is my patch if it's any use to you:

diff -uprN linux-2.6.21.4.orig/arch/i386/kernel/quirks.c linux-2.6.21.4/arch/i386/kernel/quirks.c
--- linux-2.6.21.4.orig/arch/i386/kernel/quirks.c	Tue Jun 12 10:03:18 2007
+++ linux-2.6.21.4/arch/i386/kernel/quirks.c	Tue Jun 12 10:08:02 2007
@@ -149,6 +149,8 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_I
 			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH6_1,
 			 ich_force_enable_hpet);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_0,
+			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_1,
 			 ich_force_enable_hpet);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ICH7_31,

Best regards,

- Eric
Re: v2.6.21.4-rt11
On Sat, 2007-09-06 at 23:05 +0200, Ingo Molnar wrote:

> i'm pleased to announce the v2.6.21.4-rt11 kernel, which can be
> downloaded from the usual place:

I'm running 2.6.21.4-rt12-cfs-v17 (x86_64), so far no problems. I like this kernel a lot; it feels quite smooth.

One little thing: no HPET timer is detected. Looking at the patch, even the force-detect code is there, so it should work. The HPET timer is not available as a clocksource, and only one hpet-related message is present in dmesg:

PM: Adding info for No Bus:hpet

This is on an Asus P5LD2-VM motherboard (ICH7).

Relevant config bits:

CONFIG_HPET_TIMER=y
# CONFIG_HPET_EMULATE_RTC is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_MMAP=y

Should I enable one of the two other options? Any ideas?

Best regards,

- Eric
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On Tue, 2007-20-03 at 10:15 +0100, Arjan van de Ven wrote:

> disabling that is a BAD idea. I'm no fan of SMM myself, but it's there,
> and we have to live with it. Disabling it without knowing what it does
> on your system is madness.

Like Lee said, for "debugging", mainly trying to resolve unexplained long latencies.

I've had a laptop that caused latency spikes when the cpu fan was turned on. I tried disabling SMI to diagnose the problem, with no success.

My current system has a BIOS feature to control fan speed according to temperature. I presume this must use an SMI to work, right? In this case it should be possible to find and disable the related SMI and replace the fan control with user space software.

Of course it's not wise to blindly disable SMIs, as we don't precisely know what they do.

- Eric
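For what it's worth, a user space replacement would typically poll a temperature sensor and write a PWM duty cycle through the hwmon sysfs interface. A rough sketch; the temperature-to-PWM mapping and any sysfs paths are purely illustrative assumptions, not a recommended policy:

```c
/*
 * Illustrative-only sketch of userspace fan control. The mapping below
 * (off under 40 C, full speed over 70 C, linear in between) is an
 * arbitrary example policy; real hwmon file names vary per board/driver.
 */
#include <stdio.h>

/* Map a temperature in degrees Celsius to a PWM duty cycle (0-255). */
static int temp_to_pwm(int celsius)
{
	if (celsius >= 70)
		return 255;
	if (celsius <= 40)
		return 0;
	return (celsius - 40) * 255 / 30;
}

/* Read an integer from a sysfs-style file; returns -1 on error. */
static int read_sysfs_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (f) {
		if (fscanf(f, "%d", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}
```

A control loop would then read the hwmon temperature input every few seconds, convert with temp_to_pwm(), and write the result to the corresponding pwm file.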
Re: [BUG] 2.6.21-rc1,2,3 regressions on my system that I found so far
On Tue, 2007-20-03 at 01:04 -0400, Lee Revell wrote:

> I think CONFIG_TRY_TO_DISABLE_SMI would be excellent for debugging,
> not to mention people trying to spec out hardware for RT
> applications...

There is an SMI disabling module in RTAI; check smi-module.c in this:

https://www.rtai.org/RTAI/rtai-3.5.tar.bz2

More infos:

http://www.captain.at/rtai-smi-high-latency.php
http://www.captain.at/xenomai-smi-high-latency.php

It might make sense to merge this code, at least in the -rt tree.

- Eric
Re: userspace pagecache management tool
On Sat, 2007-03-03 at 12:29 -0800, Andrew Morton wrote:

> There is much more which could be done to make this code smarter, but I
> think the lesson here is that we can produce a far, far better result doing
> this work in userspace than we could ever hope to do with an in-kernel
> implementation. There are some enhancement suggestions in the
> documentation file.

While I think that more user space applications should use fadvise() to avoid polluting the page cache with unneeded data, I still think the kernel should be more fair in regard to page cache management.

Personally, I've experienced sluggish performance after copying large files around, even more so when using NFS. It's difficult to file a bug report for "interactive feel"; I don't know how to measure it. I just feel it's a weak aspect of the OS.

Surely it's possible to make the kernel a little bit better at protecting the page cache from abuse by simple or badly designed applications. While fairness is provided by the process scheduler with good results, it is still somewhat easy for a process to cause slowdowns through page cache usage.

My personal opinion is that the VM seems tuned for database-type workloads. Of course, making the page cache more fair, to prevent one process from using most of it, will most likely slow down database-type applications.

Maybe the situation should be reversed, much like with the process scheduler: fairness by default, and the possibility to request more system resources by asking for them with the necessary privileges, much like the SCHED_FIFO policy.

- Eric
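To make the fadvise() point concrete, here is a hedged sketch (not the tool Andrew posted) of a copy loop that drops its own read-once pages behind itself with posix_fadvise(); the 1 MiB chunk size and the per-chunk fsync() are illustrative simplifications:

```c
/*
 * Sketch: copy a file while advising the kernel that already-processed
 * pages will not be needed again, so a big one-off copy evicts less of
 * other processes' cached data.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <unistd.h>

#define CHUNK (1 << 20)	/* 1 MiB per read/write cycle */

static int copy_dropbehind(int infd, int outfd)
{
	static char buf[CHUNK];
	off_t done = 0;
	ssize_t n;

	while ((n = read(infd, buf, CHUNK)) > 0) {
		if (write(outfd, buf, n) != n)
			return -1;
		done += n;
		/* Source pages are clean; they can be dropped right away. */
		posix_fadvise(infd, 0, done, POSIX_FADV_DONTNEED);
		/* Destination pages must be written back before dropping. */
		if (fsync(outfd) == 0)
			posix_fadvise(outfd, 0, done, POSIX_FADV_DONTNEED);
	}
	return n < 0 ? -1 : 0;
}
```

This is roughly the behavior the drop-behind patches try to get automatically, without every application having to be taught to do it.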
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Fri, 2005-07-15 at 12:58 -0700, Stephen Pollei wrote:

> But If I understand Linus's points he wants jiffies to remain a memory
> fetch, and make sure it doesn't turn into a singing dancing christmas
> tree.

It seems relatively easy to support dynamic tick; the ARM architecture has it. But with the numerous users of jiffies throughout the code, it seems to me that it's hard to ensure that every one of them will continue to work correctly if jiffies_increment is changed at runtime.

As Linus noted, the current tick code is flexible and powerful, but it can be hard to get it right in all cases. WinCE developers have similar problems/concerns:

http://blogs.msdn.com/ce_base/archive/2005/06/08/426762.aspx

With the previous cleanups like time_after()/time_before(), msleep() and friends, unit conversion helpers, etc., it's a step in the right direction.

I just wanted to point out that while it's good to preserve the current efficient tick implementation, it may be worthwhile to add a relative timeout API like Alan Cox proposed a year ago, to better hide the implementation details.

- Eric St-Laurent
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Thu, 2005-07-14 at 17:24 -0700, Linus Torvalds wrote:

> On Thu, 14 Jul 2005, Lee Revell wrote:
>
> Trust me. When I say that the right thing to do is to just have a fixed
> (but high) HZ value, and just changing the timer rate, I'm -right-.
>
> I'm always right. This time I'm just even more right than usual.

Of course you are; jiffies are simple and efficient. But it may be worthwhile to provide a better/simpler API for relative timeouts and also to better hide the implementation details of the tick system.

If I sum up the discussion from my POV:

- use a 32-bit tick counter on 32-bit platforms and a 64-bit counter on 64-bit platforms
- keep the constant HZ=1000 (ms resolution) on 32-bit platforms
- remove the assumption that timer interrupts and jiffies are a 1:1 thing (jiffies may be incremented by >1 ticks at timer interrupt)
- determine jiffies_increment at boot
- have a slow clock mode to help power management (adjust jiffies_increment by the slowdown factor)
- it may be useful to bump up HZ to 1e6 (us res.) or 1e9 (ns res.) on 64-bit platforms, if there are benefits such as better accuracy during time unit conversions, or if higher-frequency timer hardware is available/viable
- it may also be useful to bump HZ on -RT (real-time) kernels, or with -HRT (high-resolution timers) support; users of those kernels are willing to pay the cost of the overhead to have better resolution
- avoid direct usage of the jiffies variable; instead use jiffies() (inline or macro) -- IMO monotonic_clock() would be a better name
- provide a relative timeout API (see my previous post, or Alan's suggestions)
- remove most of the direct uses of jiffies through the code and replace them with msleep(), relative timers, etc.
- use human units for those APIs

- Eric St-Laurent
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Thu, 2005-07-14 at 23:37 +0100, Alan Cox wrote:

> In actual fact you also want to fix users of
>
> 	while(time_before(foo, jiffies)) { whack(mole); }
>
> to become
>
> 	init_timeout(&timeout);
> 	timeout.expires = jiffies + n
> 	add_timeout(&timeout);
> 	while(!timeout_expired(&timeout)) {}
>
> Which is a trivial wrapper around timers as we have them now

Or something like this:

struct timeout_timer {
	unsigned long expires;
};

static inline void timeout_set(struct timeout_timer *timer, unsigned int msecs)
{
	timer->expires = jiffies + msecs_to_jiffies(msecs);
}

static inline int timeout_expired(struct timeout_timer *timer)
{
	return (time_after(jiffies, timer->expires));
}

It provides a nice API for relative timeouts without adding overhead.

- Eric St-Laurent
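Since jiffies and time_after() only exist inside the kernel, a userspace mock of the same helper (everything here is an assumption made purely to exercise the idea outside the kernel) shows why the API stays correct even when the tick counter wraps:

```c
/*
 * Userspace mock of the proposed timeout helper. "jiffies" is simulated
 * by a plain counter; in the kernel, jiffies, HZ, msecs_to_jiffies() and
 * time_after() come from <linux/jiffies.h>.
 */
#include <stdio.h>

static unsigned long jiffies;	/* mock tick counter */
#define HZ 1000
#define msecs_to_jiffies(m)	((m) * HZ / 1000)
#define time_after(a, b)	((long)((b) - (a)) < 0)	/* wraparound-safe */

struct timeout_timer {
	unsigned long expires;
};

static inline void timeout_set(struct timeout_timer *timer, unsigned int msecs)
{
	timer->expires = jiffies + msecs_to_jiffies(msecs);
}

static inline int timeout_expired(struct timeout_timer *timer)
{
	return time_after(jiffies, timer->expires);
}
```

A call site then reads `timeout_set(&t, 100); while (!timeout_expired(&t)) poll_hardware();` instead of open-coded jiffies arithmetic, and the signed-difference comparison keeps working across a wrap of the counter.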
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Thu, 2005-07-14 at 23:37 +0100, Alan Cox wrote: In actual fact you also want to fix users of while(time_before(foo, jiffies)) { whack(mole); } to become init_timeout(timeout); timeout.expires = jiffies + n add_timeout(timeout); while(!timeout_expired(timeout)) {} Which is a trivial wrapper around timers as we have them now Or something like this: struct timeout_timer { unsigned long expires; }; static inline void timeout_set(struct timeout_timer *timer, unsigned int msecs) { timer-expires = jiffies + msecs_to_jiffies(msecs); } static inline int timeout_expired(struct timeout_timer *timer) { return (time_after(jiffies, timer-expires)); } It provides a nice API for relative timeouts without adding overhead. - Eric St-Laurent - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Thu, 2005-07-14 at 17:24 -0700, Linus Torvalds wrote:

> On Thu, 14 Jul 2005, Lee Revell wrote:
>
> Trust me. When I say that the right thing to do is to just have a fixed
> (but high) HZ value, and just changing the timer rate, I'm -right-. I'm
> always right. This time I'm just even more right than usual.

Of course you are, jiffies are simple and efficient. But it may be
worthwhile to provide a better/simpler API for relative timeouts, and
also to better hide the implementation details of the tick system.

If I sum up the discussion from my POV:

- use a 32-bit tick counter on 32-bit platforms and a 64-bit counter on
  64-bit platforms

- keep the constant HZ=1000 (ms resolution) on 32-bit platforms

- remove the assumption that timer interrupts and jiffies are a 1:1
  thing (jiffies may be incremented by more than 1 tick at each timer
  interrupt)

- determine jiffies_increment at boot

- have a slow clock mode to help power management (adjust
  jiffies_increment by the slowdown factor)

- it may be useful to bump HZ up to 1e6 (us resolution) or 1e9 (ns
  resolution) on 64-bit platforms, if there are benefits such as better
  accuracy during time unit conversions, or if higher-frequency timer
  hardware is available/viable

- it may also be useful to bump HZ on -RT (real-time) kernels, or with
  -HRT (high-resolution timers) support. Users of those kernels are
  willing to pay the cost of the overhead to get better resolution

- avoid direct usage of the jiffies variable; instead use jiffies()
  (inline or macro). IMO monotonic_clock() would be a better name

- provide a relative timeout API (see my previous post, or Alan's
  suggestions)

- remove most direct uses of jiffies throughout the code and replace
  them with msleep(), relative timers, etc.

- use human units for those APIs

- Eric St-Laurent
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On Mon, 2005-07-11 at 16:08 +0200, Arjan van de Ven wrote:

> Alan: you worked on this before, where did you end up with ?
> The last patch i've seen is 1 year old.

http://www.ussg.iu.edu/hypermail/linux/kernel/0407.3/0643.html

Eric St-Laurent
Re: [PATCH] Dynamic tick, version 050127-1
On Tue, 2005-02-01 at 15:20 -0500, Lee Revell wrote:

> I was wondering how Windows handles high res timers, if at all. The
> reason I ask is because I have been reverse engineering a Windows ASIO
> driver, and I find that if the latency is set below about 5ms, by

By default, Windows "multimedia" timers have 10ms resolution (this
depends on the exact version of Windows used...). You can call the
timeBeginPeriod() function to lower the timer period to 1ms. This
resolution seems related to the task scheduler timeslice: after you call
this function, the Sleep() call also has a resolution of 1ms instead of
10ms.

I remember reading that the multimedia timers are implemented as a high
priority thread.

You can find more details on this site:

http://www.geisswerks.com/ryan/FAQS/timing.html

Best regards,

Eric St-Laurent
Re: [patch 1/13] Qsort
On Mon, 2005-01-24 at 21:43 -0300, Horst von Brand wrote:

> AFAICS, this is just a badly implemented Shellsort (the 10/13 increment
> sequence starting with the number of elements is probably not very good,
> besides swapping stuff is inefficient (just juggling like Shellsort does
> gives you almost a third less copies)).
>
> Have you found a proof for the O(n log n) claim?

"Why a Comb Sort is NOT a Shell Sort

A shell sort completely sorts the data for each gap size. A comb sort
takes a more optimistic approach and doesn't require data to be
completely sorted at a gap size. The comb sort assumes that out-of-order
data will be cleaned up by smaller gap sizes as the sort proceeds."

Reference:

http://world.std.com/~jdveale/combsort.htm

Another good reference:

http://yagni.com/combsort/index.php

Personally, I've used it in the past because of its small size. With C++
templates you can have a copy of the routine generated for a specific
datatype, thus skipping the costly function call used for each compare.
With some C macro magic, I presume something similar can be done for
time-critical applications.

Best regards,

Eric St-Laurent