Re: "BUG: held lock freed!" lock validator tripped by kswapd & xfs
On 12/1/06, Mike Mattie <[EMAIL PROTECTED]> wrote:
> In an attempt to debug another kernel issue I turned on the lock validator
> and managed to generate this report.
>
> As a side note the first attempt to boot with the lock validator failed
> with a message indicating I had exceeded MAX_LOCK_DEPTH. To get this trace
> I patched sched.h: MAX_LOCK_DEPTH to 60.
>
> Dec 1 08:35:41 reforged [ 3052.513931] =
> Dec 1 08:35:41 reforged [ 3052.513937] [ BUG: held lock freed! ]
> Dec 1 08:35:41 reforged [ 3052.513939] -
> Dec 1 08:35:41 reforged [ 3052.513943] kswapd0/183 is freeing memory c3458000-c3458fff, with a lock still held there!
> Dec 1 08:35:41 reforged [ 3052.513947] (&(&ip->i_iolock)->mr_lock){}, at: [<c089>] xfs_ilock+0x20/0x75
> Dec 1 08:35:41 reforged [ 3052.513959] 28 locks held by kswapd0/183:
> Dec 1 08:35:41 reforged [ 3052.513961] #0: (&(&ip->i_iolock)->mr_lock){}, at: [<c089>] xfs_ilock+0x20/0x75
> Dec 1 08:35:41 reforged [ 3052.513968] #1: (&(&ip->i_lock)->mr_lock){}, at: [<c0bb>] xfs_ilock+0x52/0x75
> Dec 1 08:35:41 reforged [ 3052.513975]

The list seems to alternate between the same two locks. But neither c089
nor c0bb lies within the page (0xfff = 4095, so about 4k) which kswapd is
trying to get rid of. I think this trace is on crack somehow.

> [ 3052.514136] stack backtrace:
> Dec 1 08:35:41 reforged [ 3052.514139] [<c0103cb9>] show_trace+0x16/0x19
> Dec 1 08:35:41 reforged [ 3052.514146] [<c01040f7>] dump_stack+0x1a/0x1f
> Dec 1 08:35:41 reforged [ 3052.514150] [<c012be74>] debug_check_no_locks_freed+0xe0/0xff
> Dec 1 08:35:41 reforged [ 3052.514159] [<c014122d>] free_hot_cold_page+0x96/0x109
> Dec 1 08:35:41 reforged [ 3052.514166] [<c01412bc>] __pagevec_free+0x1c/0x27
> Dec 1 08:35:41 reforged [ 3052.514170] [<c01435dc>] __pagevec_release_nonlru+0x65/0x71
> Dec 1 08:35:41 reforged [ 3052.514176] [<c0144702>] shrink_inactive_list+0x4b1/0x722
> Dec 1 08:35:41 reforged [ 3052.514181] [<c0144a2d>] shrink_zone+0xba/0xd9
> Dec 1 08:35:41 reforged [ 3052.514185] [<c0144e9e>] kswapd+0x26a/0x361
> Dec 1 08:35:41 reforged [ 3052.514189] [<c012742b>] kthread+0xb0/0xe1
> Dec 1 08:35:41 reforged [ 3052.514192] [<c0101005>] kernel_thread_helper+0x5/0xb
>
> reforged log # Linux reforged 2.6.18.3 #4 PREEMPT Fri Dec 1 06:15:05 PST 2006 i686 AMD Athlon(tm) XP 3000+ AuthenticAMD GNU/Linux

I know you are running PREEMPT on a UP machine. I'd try running 2.6.18.4
with a small patch like this and see if you can't get it to crash again.
print_freed_lock_bug uses printk, which in theory might be causing a
preemption.

diff -urp linux-2.6.18.4/include/linux/sched.h linux-debug/include/linux/sched.h
--- linux-2.6.18.4/include/linux/sched.h	2006-11-29 11:28:40.0 -0800
+++ linux-debug/include/linux/sched.h	2006-12-01 13:25:23.0 -0800
@@ -936,7 +936,7 @@ struct task_struct {
 	int softirq_context;
 #endif
 #ifdef CONFIG_LOCKDEP
-# define MAX_LOCK_DEPTH 30UL
+# define MAX_LOCK_DEPTH (60UL)
 	u64 curr_chain_key;
 	int lockdep_depth;
 	struct held_lock held_locks[MAX_LOCK_DEPTH];
diff -urp linux-2.6.18.4/kernel/lockdep.c linux-debug/kernel/lockdep.c
--- linux-2.6.18.4/kernel/lockdep.c	2006-11-29 11:28:40.0 -0800
+++ linux-debug/kernel/lockdep.c	2006-12-01 14:22:14.0 -0800
@@ -2608,6 +2608,7 @@ void debug_check_no_locks_freed(const vo
 		return;
 
 	local_irq_save(flags);
+	preempt_disable();
 	for (i = 0; i < curr->lockdep_depth; i++) {
 		hlock = curr->held_locks + i;
 
@@ -2621,6 +2622,7 @@ void debug_check_no_locks_freed(const vo
 		print_freed_lock_bug(curr, mem_from, mem_to, hlock);
 		break;
 	}
+	preempt_enable();
 	local_irq_restore(flags);
 }
 
--
http://dmoz.org/profiles/pollei.html
http://sourceforge.net/users/stephen_pollei/
http://www.orkut.com/Profile.aspx?uid=2455954990164098214
http://stephen_pollei.home.comcast.net/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
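For what it's worth, the test behind "held lock freed" can be sketched in plain userspace C. This is only an illustration of the idea, not the kernel's code: lock_in_freed_range() and its parameters are made-up names, and the assumption is that the validator compares the address of each held lock object against the freed range. If that is right, the bracketed addresses after "at:" in the report would be the code locations where the locks were taken (xfs_ilock+0x20 and so on), so they would not be expected to fall inside c3458000-c3458fff.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive range test: is addr within [start, end]? */
static int in_range(const void *start, const void *addr, const void *end)
{
	return (const char *)addr >= (const char *)start &&
	       (const char *)addr <= (const char *)end;
}

/*
 * Hypothetical helper (not a kernel function): does a lock object of
 * lock_size bytes at address lock touch the region [mem_from, mem_to]
 * that is about to be freed? Either end of the lock object falling
 * inside the region counts as a hit.
 */
int lock_in_freed_range(const void *mem_from, const void *mem_to,
			const void *lock, size_t lock_size)
{
	const char *lock_from = lock;
	const char *lock_to = lock_from + lock_size;

	assert(mem_from != NULL && mem_to != NULL && lock != NULL);
	return in_range(mem_from, lock_from, mem_to) ||
	       in_range(mem_from, lock_to, mem_to);
}
```

With a 4k page as the freed region, a lock embedded anywhere in that page (or straddling its first byte) is reported, and a lock in a neighbouring page is not.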
Re: [2.6 patch] Tigran Aivazian: remove bouncing email addresses
On 12/1/06, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Thu, 2006-11-30 at 22:00 -0800, Hua Zhong wrote:
> > I am curious, what's the point?
> >
> > These email addresses serve a "historical" purpose: they tell when the
> > contribution was made, what the author's email addresses were at that
> > point.

Approximately when. I wish the copyright dates were comma-separated
ISO 8601 date ranges myself. I am also not likely to care what their email
address was back then; I want current information in the current kernel
sources. If I want an old email address, I can at least get it from old
tarballs.

> .. and which company owns the copyright.

Not in the USA, according to
http://www.copyright.gov/title17/92chap4.html#401 .

[[ ...
§ 401. Notice of copyright: Visually perceptible copies
...
b) Form of Notice. — If a notice appears on the copies, it shall consist of
the following three elements:
(1) the symbol (c) (the letter C in a circle), or the word "Copyright", or
the abbreviation "Copr."; and
(2) the year of first publication of the work; in the case of compilations
or derivative works incorporating previously published material, the year
date of first publication of the compilation or derivative work is
sufficient. The year date may be omitted where a pictorial, graphic, or
sculptural work, with accompanying text matter, if any, is reproduced in or
on greeting cards, postcards, stationery, jewelry, dolls, toys, or any
useful articles; and
(3) the name of the owner of copyright in the work, or an abbreviation by
which the name can be recognized, or a generally known alternative
designation of the owner.
]]

For source code, typical copyright notices generally differ in a few ways:
- They use "Copyright (C)" because ASCII and EBCDIC didn't have a native
  copyright symbol like Unicode does now.
- They include the years in which the work was published, not just the
  first year in which this version was published.
- The name of the copyright owner typically also includes an email address.

Copyright (C) 1999,2000 Tigran Aivazian <[EMAIL PROTECTED]>
Copyright (C) 1999 Tigran Aivazian <[EMAIL PROTECTED]>
etc.

It seems the only copyright notices changed affect Tigran, and if Tigran
had meant for it to be copyrighted by Veritas he would have written
Copyright (C) 1999 Veritas Inc. http://www.veritas.com/
However, he did not do so.

Of course I'd prefer something closer to
Copyright (C) 1999-07-05/2000-03-12 Tigran Aivazian <[EMAIL PROTECTED]>
or at least
Copyright (C) 1999-07-05/2000-03 Tigran Aivazian <[EMAIL PROTECTED]>
Especially if the laws ever get changed to make copyright durations
shorter. Like 14 years instead of 50 years, 70 years, or as old as Disney's
Steamboat Willie.

> > Let's not remove historical email addresses. Just make sure there's a
> > current one in MODULE_AUTHOR / MAINTAINERS.

I think whoever should either remove or update the email addresses.
Re: [Patch] Support UTF-8 scripts
On 8/14/05, Lee Revell <[EMAIL PROTECTED]> wrote:
> I know the alternatives are available. That doesn't make it any less
> idiotic to use non ASCII characters as operators. I think it's a very
> slippery slope. We write code in ASCII, dammit.

Yes, you and I might write 99.9% of our code in good ol' **American**
Standard Code for Information Interchange -- however, not all the world is
the USA. For instance, notice the http://de.wikipedia.org/wiki/Umlaut/ in
"Löwis"... Seems like lots of Europeans might want a bigger charset, not to
mention Asians, Hindus, and whoever else.
Re: [Patch] Support UTF-8 scripts
On 8/13/05, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> This patch adds support for UTF-8 signatures (aka BOM, byte order
> mark) to binfmt_script.
> With such support, creating scripts that reliably carry non-ASCII
> characters is simplified.
> the approach would naturally extend to Perl to enhance/replace
> the "use utf8" pragma.

That's great for the perl6 people.
http://dev.perl.org/perl6/doc/design/syn/S03.html says they are going to be
using « and » as operators... So I'd imagine that a lot of perl6 scripts
would be utf8.
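To make the "UTF-8 signature" idea concrete, here is a minimal userspace sketch of recognizing a script that starts with the three-byte BOM (EF BB BF) followed by "#!". It is not the code from the patch; script_marker_offset() is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The UTF-8 signature (BOM) is the three bytes EF BB BF. */
static const unsigned char utf8_bom[3] = { 0xEF, 0xBB, 0xBF };

/*
 * Hypothetical helper: return the offset of the "#!" marker in buf
 * (0 without a BOM, 3 with one), or -1 if buf does not start like a
 * script at all.
 */
int script_marker_offset(const unsigned char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL);
	/* Skip an optional UTF-8 signature at the very start. */
	if (len >= 3 && memcmp(buf, utf8_bom, 3) == 0)
		off = 3;
	/* The two bytes after it must be the usual "#!". */
	if (len < off + 2 || buf[off] != '#' || buf[off + 1] != '!')
		return -1;
	return (int)off;
}
```

A loader built this way would parse the interpreter path starting at the returned offset plus two, so a BOM-prefixed script and a plain ASCII one end up on the same code path.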
Re: [PATCH] kernel: use kcalloc instead kmalloc/memset
On 8/5/05, Christoph Lameter <[EMAIL PROTECTED]> wrote:
> Hmm. If we had kcmalloc then we may be able to add a zero bit to the slab
> allocator. If we would obtain zeroed pages for the slab then we may skip
> zeroing of individual entries. However, the cache warming effect of the
> current zeroing is then not occurring. Not sure if this would make sense
> but this is a possible optimization if we had kcmalloc.

Well, there is kzalloc and kcalloc. I just thought a safe non-zeroing
version would be nice. You could warm the cache with prefetch, but you'd
need to profile the different cases to see what is worth doing and what
isn't.
Re: [PATCH] kernel: use kcalloc instead kmalloc/memset
On 8/5/05, Roman Zippel <[EMAIL PROTECTED]> wrote:
> On Fri, 5 Aug 2005, Arjan van de Ven wrote:
> > > This would imply a similar kmalloc() would be useful as well.
> > > Second, how relevant is it for the kernel?
> > we've had a non-negligible amount of security holes because of this
> So why don't we have a similar kmalloc()?

You mean something like:

/*
 * Declared but never defined: a call that survives optimization
 * fails at link time.
 */
static void __bad_kmalloc_safe_nonconstant_size(void);
static void __bad_kmalloc_safe_zero_size(void);
static void __bad_kmalloc_safe_too_large_size(void);
static void __bad_kmalloc_safe_too_large(void);

static inline void *kmalloc_safe(size_t nmemb, size_t size, int flags)
{
	/* The element size must be a sane compile-time constant. */
	if (!__builtin_constant_p(size))
		__bad_kmalloc_safe_nonconstant_size();
	if (!size)
		__bad_kmalloc_safe_zero_size();
	if (size > 0x1)
		__bad_kmalloc_safe_too_large_size();
	/* A constant element count can be rejected at build time too. */
	if (__builtin_constant_p(nmemb) && nmemb > 0x2/size)
		__bad_kmalloc_safe_too_large();
	/* Otherwise check at run time so nmemb*size stays in bounds. */
	if (nmemb <= 0x2/size)
		return kmalloc(nmemb*size, flags);
	else
		return NULL;
}
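The run-time half of that idea, dividing the limit by the element size before multiplying, is the usual way to keep nmemb*size from wrapping. A minimal userspace sketch of it follows; malloc_array() is a hypothetical name, SIZE_MAX stands in for whatever cap the kernel version would use, and unlike calloc() the memory is not zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of a "safe, non-zeroing" array
 * allocator: allocate nmemb elements of size bytes each, refusing any
 * request whose total byte count would overflow size_t.
 */
void *malloc_array(size_t nmemb, size_t size)
{
	size_t total;

	if (size == 0 || nmemb == 0)
		return NULL;		/* reject empty requests outright */
	if (nmemb > SIZE_MAX / size)
		return NULL;		/* nmemb * size would wrap around */

	total = nmemb * size;
	assert(total / size == nmemb);	/* holds because of the check above */
	return malloc(total);
}
```

The division costs a little at run time when neither argument is a constant, which is why doing as much of the checking as possible at compile time, as in the version above, is attractive.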
Re: QoS scheduler
On 7/29/05, Vitor Curado <[EMAIL PROTECTED]> wrote:
> You assumed right, Stephen: I'm interested in QoS process scheduling,
> sorry for not specifying it...
>
> I'm taking a deeper look at the qlinux, ckrm and the plugsched
> schedulers, if you have any more links, please send them to me...

Also, you didn't specify what kind of clustering you are doing and for what
ultimate purpose.

http://www.beowulf.org/
http://www-unix.mcs.anl.gov/mpi/implementations.html
http://www.csm.ornl.gov/pvm/pvm_home.html
http://www.open-mpi.org/
http://openmosix.sourceforge.net/
http://www.mosix.org/
http://www.remote-dba.cc/teas_aegis_rac06.htm
http://www.dba-oracle.com/bp/bp_book1_rac.htm
Oracle DB Real Application Clusters (RAC), transparent application failover (TAF)
http://pgcluster.projects.postgresql.org/feature.html
http://dev.mysql.com/doc/mysql/en/replication.html
High Availability (HA)
High Performance Computing (HPC)

That can strongly affect what solutions you would want to look at. For
instance, if you were running a render farm or a scientific compute beowulf
cluster, then your "scheduling" would perhaps be handled more in the MPI or
PVM code. The running processes themselves would most likely be using
something like SCHED_BATCH, with larger than usual time-slices. Maybe you
monitor how many mips actually get consumed and then adjust which nodes get
scheduled with what, or how many work units get handed out, to get back to
fairness.

Using
clock_t times(struct tms *buf);
int getrusage(int who, struct rusage *usage);
to track system and user time is about on track, but I think someone might
be able to fool you if that's all you could use to account for cpu time
taken from another userland process. So maybe you just need better
reporting/accounting hooks and then you can do the rest in userland?

> On 7/28/05, Wes Felter <[EMAIL PROTECTED]> wrote:
> > Vitor Curado wrote:
> > > I'm working on a research about QoS schedulers for Linux clusters.
> > > Moreover, the ideal would be that the scheduler is implemented
> > > altering the native kernel scheduler. I'm kind of having trouble to
> > > find such schedulers, can anybody help me out?
> >
> > http://lass.cs.umass.edu/software/qlinux/
> > http://ckrm.sourceforge.net/

That qlinux one is new to me. I notice that the 2.6 kernel now has support
for modular, pluggable disk I/O and network schedulers. So a Hierarchical
Start Time Fair Queuing (H-SFQ) network packet scheduler module could be
made. I wonder how that Cello scheduler would stack up against AS,
Deadline, cfq, noop, etc. For the qlinux cpu scheduler, it would be best to
use plugsched with 2.6.x.
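As a rough illustration of the userland accounting mentioned above, here is a small sketch that reads the calling process's own user and system time with getrusage(). cpu_seconds_used() and tv_seconds() are hypothetical names, and this only covers self-reporting; it says nothing about verifying what some other process reports about itself.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds as a double. */
static double tv_seconds(struct timeval tv)
{
	assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical accounting helper: total CPU time (user + system)
 * consumed so far by the calling process, in seconds, or -1.0 if
 * getrusage() fails.
 */
double cpu_seconds_used(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1.0;
	return tv_seconds(ru.ru_utime) + tv_seconds(ru.ru_stime);
}
```

A work-unit dispatcher could sample this before and after each unit and hand out fewer units to nodes that have already consumed more than their share, which is the "adjust to get back to fairness" loop described above.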
Re: ZyXEL Kernel /BusyBox GPL violation?
On 7/25/05, Lee Revell <[EMAIL PROTECTED]> wrote:
> On Mon, 2005-07-25 at 23:21 -0400, Mace Moneta wrote:
> > The response seems meaningless; does this constitute a violation of
> > GPL?
> > If so what, if any, action needs to be taken?

http://gpl-violations.org/
http://www.fsf.org/licensing/licenses/gpl-faq.html#ReportingViolation
http://www.fsf.org/licensing/licenses/gpl-violation.html

[[[You should report it. First, check the facts as best you can. Then tell
the publisher or copyright holder of the specific GPL-covered program. If
that is the Free Software Foundation, write to <[EMAIL PROTECTED]>.
Otherwise, the program's maintainer may be the copyright holder, or else
could tell you how to contact the copyright holder, so report it to the
maintainer.]]]

> Also if they didn't modify the kernel, they don't have to give you
> source, they can just refer you to kernel.org.

Wrong.

http://www.fsf.org/licensing/licenses/gpl-faq.html#DistributeWithSourceOnInternet

[[[I want to distribute binaries without accompanying sources. Can I
provide source code by FTP instead of by mail order?

You're supposed to provide the source code by mail-order on a physical
medium, if someone orders it. You are welcome to offer people a way to copy
the corresponding source code by FTP, in addition to the mail-order option,
but FTP access to the source is not sufficient to satisfy section 3 of the
GPL.

When a user orders the source, you have to make sure to get the source to
that user. If a particular user can conveniently get the source from you by
anonymous FTP, fine--that does the job. But not every user can do such a
download. The rest of the users are just as entitled to get the source code
from you, which means you must be prepared to send it to them by post.

If the FTP access is convenient enough, perhaps no one will choose to
mail-order a copy. If so, you will never have to ship one. But you cannot
assume that.

Of course, it's easiest to just send the source with the binary in the
first place. ]]]

http://www.fsf.org/licensing/licenses/gpl-faq.html#TOCSourceAndBinaryOnDifferentSites

[[[Can I put the binaries on my Internet server and put the source on a
different Internet site?

The GPL says you must offer access to copy the source code "from the same
place"; that is, next to the binaries. However, if you make arrangements
with another site to keep the necessary source code available, and put a
link or cross-reference to the source code next to the binaries, we think
that qualifies as "from the same place".

Note, however, that it is not enough to find some site that happens to have
the appropriate source code today, and tell people to look there. Tomorrow
that site may have deleted that source code, or simply replaced it with a
newer version of the same program. Then you would no longer be complying
with the GPL requirements. To make a reasonable effort to comply, you need
to make a positive arrangement with the other site, and thus ensure that
the source will be available there for as long as you keep the binaries
available. ]]]

http://www.fsf.org/licensing/licenses/gpl.html

Section 3 mentions three choices of what you must do to copy and
distribute:
a) Have it from the same location. They have not.
b) Have a written offer good for three years. None such mentioned.
c) Be noncommercial plus send some information. zyxel.com "seller of
   routers" sounds like a commercial enterprise to me.

So no, they must assume responsibility for having the sources available
even if they didn't modify them.
Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt
On 7/14/05, Eric St-Laurent <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-07-14 at 17:24 -0700, Linus Torvalds wrote:
> > Trust me. When I say that the right thing to do is to just have a fixed
> > (but high) HZ value, and just changing the timer rate, I'm -right-.
>
> Of course you are, jiffies are simple and efficient.
>
> If i sum-up the discussion from my POV:
>
> - use a 32-bit tick counter on 32-bit platforms and use a 64-bit counter
> on 64-bit platforms

If the 64-bit counter doesn't have any overhead then sure.

> - keep the constant HZ=1000 (mS resolution) on 32-bit platforms

Which HZ is that? CONFIG_JIFFIES_HZ or CONFIG_FIXED_PIT_HZ? I think you meant CONFIG_JIFFIES_HZ, which I think even for 32-bit counters could go up to 1e4 to 5e4, with some patching going on in some places of course.

> - remove the assumption that timer interrupts and jiffies are 1:1 thing
> (jiffies may be incremented by >1 ticks at timer interrupt)

Yes, maybe nuke CONFIG_HZ and replace it with CONFIG_JIFFIES_HZ and CONFIG_(FIXED|DEFAULT|DYNAMIC)_PIT_HZ. Start with just CONFIG_FIXED_PIT_HZ and add the others as needed. The extreme might be to also just nuke HZ and replace it with JHZ and PHZ, or whatever, so that people are *crystal* clear about the difference.

> - determine jiffies_increment at boot

So CONFIG_foo_PIT_HZ could be a per-boot thing maybe. So you'd have:
CONFIG_DEFAULT_PIT_HZ if it was a per boot or runtime thing.
CONFIG_DYNAMIC_PIT_HZ if it was changeable as the system is running -- like windows.
CONFIG_FIXED_PIT_HZ if it is a compile time constant.
Or something like that?

> - have a slow clock mode to help power management (adjust
> jiffies_increment by the slowdown factor)

CONFIG_DYNAMIC_PIT_HZ, unless its overhead is so low that everyone just wants it by default.

> - it may be useful to bump up HZ to 1e6 (uS res.) or 1e9 (nS res.)
> on 64-bit platforms, if there are benefits such as better accuracy during
> time units conversions or if a higher frequency timer hardware is
> available/viable.

Too high starts to cause other troubles. I think that the real time people want 10us scheduling, but even the ipipe and rt-preempt have 18us-70us delays at times IIRC. So 5e4 to 1e5 is about the extreme end of the road for CONFIG_JIFFIES_HZ. I think even long term that 1e5 to 1e6 would be extreme because of speed of light issues, etc. Hpet is only 1.4e7 IIRC.

I think that you should start with:
1) CONFIG_FIXED_PIT_HZ=50 CONFIG_JIFFIES_HZ=2000
2) try it out and fix any bugs, send the fixes to Linus to see how much he bitches.
3) if you still need CONFIG_JIFFIES_HZ to be larger, double it and then goto 2.
4) enjoy your higher frequency jiffies

I bet that even going to somewhere between 2e3 and 1e5 will make you want to change a few things for performance and sanity reasons. So I'd focus on that before I even thought about 1e6 through 1e10. Plus I think the interest level really falls off at that extreme. Just making JIFFIES_HZ != PIT_HZ will require patches. Dynamic pit hz or lazy update of jiffies based on tsc/hpet/other are other patches.

> - it may be also useful to bump HZ on -RT (Real-time) kernels,

Yes, they sound like they want JIFFIES_HZ to be 1e3 through 1e5 depending on task. They also want hpet (or other), vertical retrace interrupts (so xsync works for video), perhaps a nist mini atomic clock, and a few other goodies AFAIK.

> -HRT (High-resolution timers support).

Yes, tsc or hpet or whatever users might benefit in several ways.
1) both tsc and hpet might be able to bump jiffies up to a more accurate value on entry to idle and then test to see if anything got scheduled.
2) hpet can set one-shot timers for the next upcoming event on idle if it's sooner than when the PIT interrupt is supposed to come in. Of course, update the jiffies when that hpet interrupt comes.
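To make the JIFFIES_HZ != PIT_HZ split a bit more concrete, here is a rough userspace sketch in C, not kernel code. It models the starting point suggested above (PIT at 50 Hz, jiffies at 2000 Hz): the increment is computed once, as it might be at boot, and each simulated timer interrupt advances jiffies by more than one tick. The names `tick_state`, `tick_init`, `tick_interrupt` and `wrap_seconds` are made up for illustration and are not existing kernel symbols, and the even-division requirement is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the proposal; none of these names are real kernel symbols. */
struct tick_state {
    uint32_t jiffies;            /* 32-bit tick counter, advancing at jiffies_hz */
    uint32_t jiffies_increment;  /* ticks added per timer interrupt */
};

/* Compute the per-interrupt increment once, as might happen at boot. */
static void tick_init(struct tick_state *ts, uint32_t jiffies_hz, uint32_t pit_hz)
{
    /* Assumption for this sketch: the interrupt rate divides the tick rate evenly. */
    assert(pit_hz != 0 && jiffies_hz % pit_hz == 0);
    ts->jiffies = 0;
    ts->jiffies_increment = jiffies_hz / pit_hz;
}

/* Stand-in for the timer interrupt: jiffies advances by more than one tick. */
static void tick_interrupt(struct tick_state *ts)
{
    ts->jiffies += ts->jiffies_increment;  /* unsigned add wraps modulo 2^32 */
}

/* Whole seconds until a 32-bit counter wraps at the given (nonzero) tick rate. */
static uint32_t wrap_seconds(uint32_t jiffies_hz)
{
    return (uint32_t)(((uint64_t)1 << 32) / jiffies_hz);
}
```

With those starting values each interrupt adds 40 ticks. A 32-bit counter at 2000 Hz wraps after about 24.9 days, and at 5e4 Hz it wraps in just under a day, which is one reason the higher rates need patching in places that assume a slow wrap.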
> Users of those kernel are willing
> to pay the cost of the overhead to have better resolution

Yes, realtime users with something like hpet might not vary the pit timer, but instead place hooks to update the jiffies between pit interrupts, in places like idle, the scheduler (task switch), etc. And use the hpet one-shot interrupts as well.

> - avoid direct usage of the jiffies variable, instead use jiffies()
> (inline or MACRO), IMO monotonic_clock() would be a better name

I don't know; I think it could remain a variable. You usually just want it to be a light-weight memory read, not a call out to an hpet followed by a math conversion, or a call out to tsc that then has to know whether the tsc represents work or time, whether the cpu has been slowed for power save reasons, etc etc etc.

I think you want a symbol exported gpl, something like
void force_update_jiffies(void);
that you can call in different hook locations to force the update of jiffies from non-interrupt sources. Actually you might want more than one version of that function, or have it take an argument, because some people might want to be super lazy and only update it when they enter or leave idle, while others (real timers) might want
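One way the force_update_jiffies idea could look, with jiffies kept as a plain variable and caught up lazily at hook points, is sketched below as a userspace model in C. Everything beyond the function name is an assumption made for illustration: the `lazy_jiffies` struct, the `jiffies_hook` enum, the extra parameters, and the `ref_now` value standing in for a tsc or hpet reading. It follows the "take an argument" variant, where a mask decides which hooks are allowed to update.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hook kinds, invented for this sketch. */
enum jiffies_hook {
    HOOK_IDLE   = 1u << 0,  /* entering or leaving idle */
    HOOK_SWITCH = 1u << 1,  /* scheduler task switch */
};

struct lazy_jiffies {
    uint32_t jiffies;        /* plain variable: readers do a cheap memory load */
    uint64_t last_ref;       /* reference-counter value at the last update */
    uint32_t ref_per_jiffy;  /* reference counts per jiffy (must be nonzero) */
    unsigned hook_mask;      /* which hooks are allowed to update jiffies */
};

/*
 * Catch jiffies up to a reference counter (standing in for tsc/hpet),
 * but only if the calling hook is enabled in the policy mask.
 */
static void force_update_jiffies(struct lazy_jiffies *lj,
                                 enum jiffies_hook hook, uint64_t ref_now)
{
    assert(lj->ref_per_jiffy != 0);
    if (!(lj->hook_mask & hook))
        return;  /* lazy policy: this hook does not update */

    uint64_t elapsed = (ref_now - lj->last_ref) / lj->ref_per_jiffy;
    lj->jiffies += (uint32_t)elapsed;
    lj->last_ref += elapsed * lj->ref_per_jiffy;  /* keep the sub-jiffy remainder */
}
```

A "super lazy" configuration would set `hook_mask` to `HOOK_IDLE` only, so calls from the task-switch hook return without touching jiffies, while a realtime configuration would enable both hooks. Readers still see an ordinary variable either way.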