Re: [PATCH] add some drop_caches documentation and info messsge
Hi!

> > > hmpf. This patch worries me. If there are people out there who are
> > > regularly using drop_caches because the VM sucks, it seems pretty
> > > obnoxious of us to go dumping stuff into their syslog. What are they
> > > supposed to do? Stop using drop_caches?
> >
> > People use drop_caches because they _think_ the VM sucks, or they
> > _think_ they're "tuning" their system. _They_ are supposed to stop
> > using drop_caches. :)
>
> Well who knows. Could be that people's vm *does* suck. Or they have
> some particularly peculiar workload or requirement[*]. Or their VM
> *used* to suck, and the drop_caches is not really needed any more but
> it's there in vendor-provided code and they can't practically prevent
> it. Or they have ipw wifi that does order 5 allocation :-).

I've seen drop_caches used in some android code, as part of SD card
handling IIRC.

> > What kind of interface _is_ it in the first place? Is it really a
> > production-level thing that we expect users to be poking at? Or, is it
> > a rarely-used debugging and benchmarking knob which is fair game for us
> > to tweak like this?
>
> It was a rarely-used mainly-developer-only thing which, apparently, real
> people found useful at some point in the past. Perhaps we should never
> have offered it.

And yes, documentation would be good. IIRC you claimed that drop_caches
is not safe to use a year or so ago; is that still true?

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
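[For readers unfamiliar with the knob under discussion: it is the sysctl
file /proc/sys/vm/drop_caches. A minimal sketch of its use follows; the
wrapper function name is mine, not any kernel interface, and the write
is guarded so the sketch is safe to run unprivileged. Note drop_caches
only discards *clean* caches and never writes anything out, hence the
sync first.]

```shell
# drop_cache_mode: hypothetical wrapper around /proc/sys/vm/drop_caches.
# 1 = free page cache, 2 = free reclaimable slab (dentries and inodes),
# 3 = both. Writing the file requires root.
drop_cache_mode() {
    case "$1" in
        1) desc="page cache" ;;
        2) desc="reclaimable slab (dentries and inodes)" ;;
        3) desc="page cache and reclaimable slab" ;;
        *) echo "usage: drop_cache_mode 1|2|3" >&2; return 1 ;;
    esac
    echo "dropping: $desc"
    sync    # write back dirty pages; drop_caches only discards clean ones
    if [ "$(id -u)" -eq 0 ] && [ -w /proc/sys/vm/drop_caches ]; then
        echo "$1" > /proc/sys/vm/drop_caches 2>/dev/null || true
    else
        echo "(not root, skipping the actual write)" >&2
    fi
}

drop_cache_mode 3
```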
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 31, 2012 at 06:31:54PM +0100, Pavel Machek wrote:
> Hmm? When I resume from hibernate, I want to use my machine.

Well, in my case, with a workstation with 8 GB, the only time the
swapin is noticeable is when I try to use firefox with a couple dozen
tabs open. Once that thing is swapped in, system perf is back to
normal. I'll bet that even this slowdown would disappear if I used an
SSD.

But I can imagine some workloads where swapping everything back in
could be discomforting.

> Kernel will not normally swap anything in automatically. Some people
> do swapoff -a; swapon -a to work around that. (And yes, maybe some
> automatic-swap-in-when-there's-plenty-of-RAM would be useful.)

That's a good idea, actually.

So, in any case, the current situation is fine as it is, I'd say:
people can decide whether they want to drop caches before suspending
or not.

Problem solved.

Thanks.

--
Regards/Gruss,
Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
On Mon 2012-10-29 10:58:19, Borislav Petkov wrote:
> On Mon, Oct 29, 2012 at 09:59:59AM +0100, Jiri Kosina wrote:
> > You might or might not want to do that. Dropping caches around suspend
> > makes the hibernation process itself faster, but the realtime response
> > of the applications afterwards is worse, as everything touched by user
> > has to be paged in again.

Also note that page-in is slower than reading the hibernation image,
because it is not compressed, and involves seeking.

> Right, do you know of a real use-case where people hibernate, then
> resume and still care about applications response time right afterwards?

Hmm? When I resume from hibernate, I want to use my machine. *Everyone*
cares about resume time afterwards. You move your mouse, and you don't
want to wait for X to be paged in.

> Besides, once everything is swapped back in, perf. is back to normal,
> i.e. like before suspending.

Kernel will not normally swap anything in automatically. Some people do
swapoff -a; swapon -a to work around that. (And yes, maybe some
automatic-swap-in-when-there's-plenty-of-RAM would be useful.)

									Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
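[The swapoff/swapon workaround Pavel mentions can be sketched as a tiny
script. This is an illustrative sketch, not a recommendation: cycling
swap off forces every swapped-out page back into RAM at once, so it
needs enough free memory to hold them all; the guard below makes the
sketch safe to run unprivileged.]

```shell
# Force swapped-out pages back into RAM by cycling swap off and on.
swap_back_in() {
    if [ "$(id -u)" -eq 0 ] && command -v swapoff >/dev/null 2>&1; then
        # swapoff -a reads every swapped-out page back in; swapon -a
        # re-enables the swap areas listed in /etc/fstab afterwards.
        swapoff -a 2>/dev/null && swapon -a 2>/dev/null
        echo "swap cycled"
    else
        echo "would run: swapoff -a && swapon -a (requires root)"
    fi
}

swap_back_in
```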
Re: [PATCH] add some drop_caches documentation and info messsge
On Mon, Oct 29, 2012 at 11:01:59AM +0100, Jiri Kosina wrote:
> Well, if the point of dropping caches is lowering the resume time, then
> the point is rendered moot as soon as you switch to your browser and
> have to wait a noticeable amount of time until it starts reacting.

Not the resume time - the suspend time. If, say, one has 8 GB of memory
and Linux nicely spreads all over it in caches, you don't want to wait
too long for the suspend image creation.

And nowadays, since you can have 8 GB in a laptop, you really want to
keep that image minimal so that suspend-to-disk is quick. The penalty
of faulting everything back in is a cost we'd be willing to pay, I
guess.

Thanks.

--
Regards/Gruss,
Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
On Mon, 29 Oct 2012, Borislav Petkov wrote:
> > You might or might not want to do that. Dropping caches around suspend
> > makes the hibernation process itself faster, but the realtime response
> > of the applications afterwards is worse, as everything touched by user
> > has to be paged in again.
>
> Right, do you know of a real use-case where people hibernate, then
> resume and still care about applications response time right afterwards?

Well, if the point of dropping caches is lowering the resume time, then
the point is rendered moot as soon as you switch to your browser and
have to wait a noticeable amount of time until it starts reacting.

--
Jiri Kosina
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Mon, Oct 29, 2012 at 09:59:59AM +0100, Jiri Kosina wrote:
> You might or might not want to do that. Dropping caches around suspend
> makes the hibernation process itself faster, but the realtime response
> of the applications afterwards is worse, as everything touched by user
> has to be paged in again.

Right, do you know of a real use-case where people hibernate, then
resume and still care about applications response time right afterwards?

Besides, once everything is swapped back in, perf. is back to normal,
i.e. like before suspending.

Thanks.

--
Regards/Gruss,
Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012, Andrew Morton wrote:
> > > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > > image is kept at minimum and suspend times are as small as possible.
> > >
> > > hm, that sounds smart.
> > >
> > > > Would that be a valid use-case?
> > >
> > > I'd say so, unless we change the kernel to do that internally. We do
> > > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > > create enough free memory for hibernation to be able to successfully
> > > complete, but no more.
> >
> > That's correct.
>
> Well, my point was: how about the idea of reclaiming clean pagecache
> (and inodes, dentries, etc) before hibernation so we read/write less
> disk data?

You might or might not want to do that. Dropping caches around suspend
makes the hibernation process itself faster, but the realtime response
of the applications afterwards is worse, as everything touched by user
has to be paged in again.

--
Jiri Kosina
SUSE Labs
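[The kind of suspend-to-disk wrapper being discussed might look like the
sketch below. The paths are the standard procfs/sysfs ones, but the
script itself is my own illustration; it only *prints* the commands a
real script would execute, since actually writing "disk" to
/sys/power/state hibernates the machine.]

```shell
# Sketch: shrink caches before hibernating so the image stays small,
# accepting slower page-ins after resume (the trade-off discussed above).
# Printed rather than executed, to keep the sketch side-effect free.
hibernate_small_image() {
    echo "sync"
    echo "echo 3 > /proc/sys/vm/drop_caches"
    echo "echo disk > /sys/power/state"
}

hibernate_small_image
```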
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> Dave Hansen wrote:
> > What kind of interface _is_ it in the first place? Is it really a
> > production-level thing that we expect users to be poking at? Or, is it
> > a rarely-used debugging and benchmarking knob which is fair game for us
> > to tweak like this?
>
> It was a rarely-used mainly-developer-only thing which, apparently, real
> people found useful at some point in the past. Perhaps we should never
> have offered it.

I've found it useful on occasion when generating large public keys.
When key generation hangs due to not-enough-entropy, dropping all
caches (followed by an intensive read) has allowed the system to
collect enough entropy to let the key generation finish.

Usefulness of the trick is probably going the way of the dodo, thanks
to SSDs becoming more common.

--
Mika Boström                 Individualisti, eksistentialisti,
www.iki.fi/bostik            rationalisti ja mulkvisti
GPG: 0x2AED22CC; 6FC9 8375 31B7 3BA2 B5DC 484E F19F 8AD6 2AED 22CC
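[For context, the pool level this trick relies on is visible at
/proc/sys/kernel/random/entropy_avail, so its effect can be measured
before and after. A small read-only sketch, with a fallback so it runs
anywhere:]

```shell
# Report the kernel's current entropy estimate, if available.
entropy_avail() {
    f=/proc/sys/kernel/random/entropy_avail
    if [ -r "$f" ]; then
        echo "entropy_avail: $(cat "$f") bits"
    else
        echo "entropy_avail: unavailable on this system"
    fi
}

entropy_avail
```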
Re: [PATCH] add some drop_caches documentation and info messsge
On Wednesday, October 24, 2012 06:17:52 PM Andrew Morton wrote:
> On Thu, 25 Oct 2012 00:04:46 +0200 "Rafael J. Wysocki" wrote:
> > On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> > > On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov wrote:
> > > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > > > some particularly peculiar workload or requirement[*]. Or their VM
> > > > > *used* to suck, and the drop_caches is not really needed any more but
> > > > > it's there in vendor-provided code and they can't practically prevent
> > > > > it.
> > > >
> > > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > > image is kept at minimum and suspend times are as small as possible.
> > >
> > > hm, that sounds smart.
> > >
> > > > Would that be a valid use-case?
> > >
> > > I'd say so, unless we change the kernel to do that internally. We do
> > > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > > create enough free memory for hibernation to be able to successfully
> > > complete, but no more.
> >
> > That's correct.
>
> Well, my point was: how about the idea of reclaiming clean pagecache
> (and inodes, dentries, etc) before hibernation so we read/write less
> disk data?

We may actually want to write more into the image to improve
post-resume responsiveness.

> Given that it's so easy to do from the hibernation script, I guess
> there's not much point...

Well, I'd say so. :-)

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu 25-10-12 04:57:11, Dave Hansen wrote:
[...]
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging in to the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
>
> Joe Kernel Developer digs in the statistics or the dmesg and tries to
> figure out what happened. I've run in to a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications. It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.

Yes, very same here. Not that I would meet issues like that often, but
it has happened a few times in the past and it was always a lot of
burnt time.

> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."

--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu, Oct 25, 2012 at 04:57:11AM -0700, Dave Hansen wrote:
> On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> > But let's discuss this a bit further. So, for the benchmarking aspect,
> > you're either going to have to always require dmesg along with
> > benchmarking results or /proc/vmstat, depending on where the drop_caches
> > stats end up.
> >
> > Is this how you envision it?
> >
> > And then there are the VM bug cases, where you might not always get
> > full dmesg from a panicked system. In that case, you'd want the kernel
> > tainting thing too, so that it at least appears in the oops backtrace.
> >
> > Although the tainting thing might not be enough - a user could
> > drop_caches at some point in time and the oops happening much later
> > could be unrelated, but that can't be expressed in taint flags.
>
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging in to the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report.
>
> Joe Kernel Developer digs in the statistics or the dmesg and tries to
> figure out what happened. I've run in to a couple of cases in practice
> (and I assume Michal has too) where the bug reporter was using
> drop_caches _heavily_ and did not realize the implications. It was
> quite hard to track down exactly how the page cache and dentries/inodes
> were getting purged.
>
> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."

Ok, understood.

So you will be requiring dmesg; ok, then it makes sense. This way
you're also getting timestamps of when exactly and how many times
drop_caches was used. For that, though, you'll need to add the
timestamp explicitly to the printk because CONFIG_PRINTK_TIME is not
always enabled.

Thanks.

--
Regards/Gruss,
Boris.
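[A side note on the timestamp point: CONFIG_PRINTK_TIME only sets the
boot-time default; on a running system the flag is exposed under
/sys/module/printk/parameters/time, so a bug reporter could check (or,
as root, enable) it before reproducing. A read-only sketch with a
fallback for systems where the parameter is not visible:]

```shell
# Check whether printk timestamps are currently enabled.
printk_time_status() {
    p=/sys/module/printk/parameters/time
    if [ -r "$p" ]; then
        echo "printk timestamps: $(cat "$p")"
    else
        echo "printk timestamps: unknown (parameter not readable here)"
    fi
}

printk_time_status
```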
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed 24-10-12 18:35:43, KOSAKI Motohiro wrote:
> >> I have drop_caches in my suspend-to-disk script so that the hibernation
> >> image is kept at minimum and suspend times are as small as possible.
> >
> > hm, that sounds smart.
> >
> >> Would that be a valid use-case?
> >
> > I'd say so, unless we change the kernel to do that internally. We do
> > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > create enough free memory for hibernation to be able to successfully
> > complete, but no more.
>
> shrink_all_memory() drops the minimum memory needed for hibernation;
> that's a trade-off matter.
>
> - drop all page cache
>   pros.
>     speeds up hibernation time
>   cons.
>     after coming back from hibernation, the system works very slowly
>     for a while until it gets enough file cache
>
> - drop minimum page cache
>   pros.
>     the system works quickly when coming back from hibernation
>   cons.
>     relatively large hibernation time
>
> So, I'm not fond of changing the hibernation default. hmmm... Does
> adding a tracepoint instead of printk make sense?

I guess you mean trace_printk. I have seen that one used for debugging
purposes only, but it seems like it could be used here. CONFIG_TRACING
seems to be enabled on most distribution kernels.

I am just worried that it needs debugfs mounted, and my recollection is
that this has some security implications, so there might be some
pushback on mounting it on production systems, which would defeat the
primary motivation. Maybe this concern is not that important wrt.
excessive logging, though.

I can live with this solution as well if people really hate the logging
approach.

--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed 24-10-12 12:54:39, Andrew Morton wrote:
> On Wed, 24 Oct 2012 08:29:45 +0200 Michal Hocko wrote:
[...]
> hmpf. This patch worries me. If there are people out there who are
> regularly using drop_caches because the VM sucks, it seems pretty
> obnoxious of us to go dumping stuff into their syslog. What are they
> supposed to do? Stop using drop_caches? But that would unfix the
> problem which they fixed with drop_caches in the first case.
>
> And they might not even have control over the code - they need to go
> back to their supplier and say "please send me a new version", along
> with all the additional costs and risks involved in an update.

I understand your worries and that's why I suggested a higher log level
which is under the admin's control. Does even that sound too excessive?

> > > More friendly alternatives might be:
> > >
> > > - Taint the kernel. But that will only become apparent with an oops
> > >   trace or similar.
> > >
> > > - Add a drop_caches counter and make that available in /proc/vmstat,
> > >   show_mem() output and perhaps other places.
> >
> > We would lose timing and originating process name in both cases, which
> > can be really helpful while debugging. It is fair to say that we could
> > deduce the timing if we are collecting /proc/meminfo or /proc/vmstat
> > already, and we do collect them often, but this is not the case all of
> > the time, and sometimes it is important to know _who_ is doing all this.
>
> But how important is all that? The main piece of information the
> kernel developer wants is "this guy is using drop_caches a lot". All
> the other info is peripheral and can be gathered by other means if so
> desired.

Well, I have experienced a debugging session where I suspected that an
excessive drop_caches was going on, but I had a hard time proving who
was doing it (the customer, of course, claimed they were not doing
anything like that), so we went through many loops until we could point
the finger.

--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> But let's discuss this a bit further. So, for the benchmarking aspect,
> you're either going to have to always require dmesg along with
> benchmarking results or /proc/vmstat, depending on where the drop_caches
> stats end up.
>
> Is this how you envision it?
>
> And then there are the VM bug cases, where you might not always get
> full dmesg from a panicked system. In that case, you'd want the kernel
> tainting thing too, so that it at least appears in the oops backtrace.
>
> Although the tainting thing might not be enough - a user could
> drop_caches at some point in time and the oops happening much later
> could be unrelated but that can't be expressed in taint flags.

Here's the problem: Joe Kernel Developer gets a bug report, usually
something like "the kernel is slow", or "the kernel is eating up all my
memory". We then start going and digging in to the problem with the
usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
but less likely, that we get things like vmstat along with such a bug
report.

Joe Kernel Developer digs in the statistics or the dmesg and tries to
figure out what happened. I've run in to a couple of cases in practice
(and I assume Michal has too) where the bug reporter was using
drop_caches _heavily_ and did not realize the implications. It was
quite hard to track down exactly how the page cache and dentries/inodes
were getting purged.

There are rarely oopses involved in these scenarios.

The primary goal of this patch is to make debugging those scenarios
easier so that we can quickly realize that drop_caches is the reason our
caches went away, not some anomalous VM activity. A secondary goal is
to tell the user: "Hey, maybe this isn't something you want to be doing
all the time."
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 08:56:45PM -0400, KOSAKI Motohiro wrote: > > That effectively means removing it from the kernel since distros ship > > with those config options off. We don't want to do that since there > > _are_ valid, occasional uses like benchmarking that we want to be > > consistent. > > Agreed. We never want to remove a valid interface. Ok, duly noted. But let's discuss this a bit further. So, for the benchmarking aspect, you're either going to have to always require dmesg along with benchmarking results or /proc/vmstat, depending on where the drop_caches stats end up. Is this how you envision it? And then there are the VM bug cases, where you might not always get full dmesg from a panicked system. In that case, you'd want the kernel tainting thing too, so that it at least appears in the oops backtrace. Although the tainting thing might not be enough - a user could drop_caches at some point in time and the oops happening much later could be unrelated but that can't be expressed in taint flags. So you'd need some sort of a drop_caches counter, I'd guess. Or a last-drop_caches timestamp or something. Am I understanding the intent correctly? Thanks. -- Regards/Gruss, Boris.
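No such counter exists in /proc/vmstat as of this thread, but if a field were added along the lines Boris sketches, sampling it from a benchmark harness would be a one-liner. A minimal sketch, with the caveat that the `drop_caches` field name is entirely hypothetical and the file path is parameterized so the helper can be shown against a scratch file:

```shell
# Hypothetical sketch: read a "drop_caches" event counter out of a
# vmstat-style file of "name value" lines.  No such counter exists in
# the real /proc/vmstat; the field name here is made up.
read_drop_caches_count() {
    # $1: path to a vmstat-style file (the real one would be /proc/vmstat)
    awk '$1 == "drop_caches" { print $2 }' "$1"
}

# Demonstration against a fabricated vmstat file:
printf 'nr_free_pages 123456\ndrop_caches 7\n' > /tmp/fake_vmstat
read_drop_caches_count /tmp/fake_vmstat   # prints: 7
```

A benchmark script would sample the counter before and after a run to verify nobody dropped caches mid-measurement.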
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed 24-10-12 12:54:39, Andrew Morton wrote:
> On Wed, 24 Oct 2012 08:29:45 +0200 Michal Hocko <mho...@suse.cz> wrote:
> [...]
> hmpf. This patch worries me. If there are people out there who are
> regularly using drop_caches because the VM sucks, it seems pretty
> obnoxious of us to go dumping stuff into their syslog. What are they
> supposed to do? Stop using drop_caches?
>
> But that would unfix the problem which they fixed with drop_caches in
> the first case. And they might not even have control over the code -
> they need to go back to their supplier and say "please send me a new
> version", along with all the additional costs and risks involved in an
> update.

I understand your worries and that's why I suggested a higher log level
which is under the admin's control. Does even that sound too excessive?

> > > More friendly alternatives might be:
> > >
> > > - Taint the kernel. But that will only become apparent with an oops
> > >   trace or similar.
> > >
> > > - Add a drop_caches counter and make that available in /proc/vmstat,
> > >   show_mem() output and perhaps other places.
> >
> > We would lose timing and originating process name in both cases which
> > can be really helpful while debugging. It is fair to say that we could
> > deduce the timing if we are collecting /proc/meminfo or /proc/vmstat
> > already and we do collect them often but this is not the case all of
> > the time and sometimes it is important to know _who_ is doing all this.
>
> But how important is all that? The main piece of information the kernel
> developer wants is "this guy is using drop_caches a lot". All the other
> info is peripheral and can be gathered by other means if so desired.

Well, I have experienced a debugging session where I suspected that
excessive drop_caches use was going on but I had a hard time proving who
was doing it (the customer, of course, claimed they were not doing
anything like that), so we went through many loops until we could point
the finger.
--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed 24-10-12 18:35:43, KOSAKI Motohiro wrote:
> >> I have drop_caches in my suspend-to-disk script so that the hibernation
> >> image is kept at minimum and suspend times are as small as possible.
> >
> > hm, that sounds smart.
> >
> >> Would that be a valid use-case?
> >
> > I'd say so, unless we change the kernel to do that internally. We do
> > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > create enough free memory for hibernation to be able to successfully
> > complete, but no more.
>
> shrink_all_memory() drops the minimum amount of memory needed for
> hibernation; it's a trade-off:
>
> - drop all page cache
>   pros: speeds up hibernation
>   cons: after resuming from hibernation, the system runs very slowly for
>         a while until it regains enough file cache
>
> - drop the minimum page cache
>   pros: the system is responsive right after resuming from hibernation
>   cons: relatively long hibernation time
>
> So I'm not keen on changing the hibernation default.
>
> hmmm... Does adding a tracepoint instead of the printk make sense?

I guess you mean trace_printk. I have seen that one used for debugging
purposes only but it seems like it could be used here. CONFIG_TRACING
seems to be enabled on most distribution kernels. I am just worried that
it needs debugfs mounted, and my recollection is that this has some
security implications, so there might be some pushback on mounting it on
production systems, which would defeat the primary motivation. Maybe
this concern is not that important wrt. excessive logging, though.

I can live with this solution as well if people really hate the logging
approach.

--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu, Oct 25, 2012 at 04:57:11AM -0700, Dave Hansen wrote:
> On 10/25/2012 02:24 AM, Borislav Petkov wrote:
> > But let's discuss this a bit further. So, for the benchmarking aspect,
> > you're either going to have to always require dmesg along with
> > benchmarking results or /proc/vmstat, depending on where the drop_caches
> > stats end up.
> >
> > Is this how you envision it?
> >
> > And then there are the VM bug cases, where you might not always get
> > full dmesg from a panicked system. In that case, you'd want the kernel
> > tainting thing too, so that it at least appears in the oops backtrace.
> >
> > Although the tainting thing might not be enough - a user could
> > drop_caches at some point in time and the oops happening much later
> > could be unrelated but that can't be expressed in taint flags.
>
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging in to the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report. Joe Kernel Developer digs in the statistics or the dmesg and
> tries to figure out what happened.
>
> I've run in to a couple of cases in practice (and I assume Michal has
> too) where the bug reporter was using drop_caches _heavily_ and did not
> realize the implications. It was quite hard to track down exactly how
> the page cache and dentries/inodes were getting purged. There are
> rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."

Ok, understood. So you will be requiring dmesg, ok, then it makes sense.
This way you're also getting timestamps of when exactly and how many
times drop_caches was used.

For that, though, you'll need to add the timestamp explicitly to the
printk because CONFIG_PRINTK_TIME is not always enabled.

Thanks.

--
Regards/Gruss,
Boris.
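The dmesg-based workflow Boris describes can be sketched as follows. The message text assumes the wording of the patch in this thread ("dropped kernel caches") plus a CONFIG_PRINTK_TIME-style timestamp, and the fabricated log file stands in for a real dmesg capture:

```shell
# Sketch: recover the times and frequency of drop_caches use from a
# saved dmesg log, assuming the message format of the patch under
# discussion ("comm (pid): dropped kernel caches: N").
grep_drop_caches() {
    # $1: path to a saved dmesg log
    grep 'dropped kernel caches' "$1"
}

# Demonstration with a fabricated log line:
printf '[  532.117841] bash (2912): dropped kernel caches: 3\n' > /tmp/fake_dmesg
grep_drop_caches /tmp/fake_dmesg
```

With CONFIG_PRINTK_TIME enabled, the bracketed timestamps give the "when exactly" part; the line count gives the "how many times".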
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu 25-10-12 04:57:11, Dave Hansen wrote:
[...]
> Here's the problem: Joe Kernel Developer gets a bug report, usually
> something like "the kernel is slow", or "the kernel is eating up all my
> memory". We then start going and digging in to the problem with the
> usual tools. We almost *ALWAYS* get dmesg, and it's reasonably common,
> but less likely, that we get things like vmstat along with such a bug
> report. Joe Kernel Developer digs in the statistics or the dmesg and
> tries to figure out what happened.
>
> I've run in to a couple of cases in practice (and I assume Michal has
> too) where the bug reporter was using drop_caches _heavily_ and did not
> realize the implications. It was quite hard to track down exactly how
> the page cache and dentries/inodes were getting purged.

Yes, very same here. Not that I meet issues like that often, but it has
happened a few times in the past and it was always a lot of burnt time.

> There are rarely oopses involved in these scenarios.
>
> The primary goal of this patch is to make debugging those scenarios
> easier so that we can quickly realize that drop_caches is the reason our
> caches went away, not some anomalous VM activity. A secondary goal is
> to tell the user: "Hey, maybe this isn't something you want to be doing
> all the time."

--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On Wednesday, October 24, 2012 06:17:52 PM Andrew Morton wrote:
> On Thu, 25 Oct 2012 00:04:46 +0200 "Rafael J. Wysocki" <r...@sisk.pl> wrote:
> > On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> > > On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov <b...@alien8.de> wrote:
> > > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > > > some particularly peculiar workload or requirement[*]. Or their VM
> > > > > *used* to suck, and the drop_caches is not really needed any more but
> > > > > it's there in vendor-provided code and they can't practically prevent
> > > > > it.
> > > >
> > > > I have drop_caches in my suspend-to-disk script so that the hibernation
> > > > image is kept at minimum and suspend times are as small as possible.
> > >
> > > hm, that sounds smart.
> > >
> > > > Would that be a valid use-case?
> > >
> > > I'd say so, unless we change the kernel to do that internally. We do
> > > have the hibernation-specific shrink_all_memory() in the vmscan code.
> > > We didn't see fit to document _why_ that exists, but IIRC it's there to
> > > create enough free memory for hibernation to be able to successfully
> > > complete, but no more.
> >
> > That's correct.
>
> Well, my point was: how about the idea of reclaiming clean pagecache
> (and inodes, dentries, etc) before hibernation so we read/write less
> disk data?

We may actually want to write more into the image to improve post-resume
responsiveness.

> Given that it's so easy to do from the hibernation script, I guess
> there's not much point...

Well, I'd say so. :-)

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu, 25 Oct 2012 00:04:46 +0200 "Rafael J. Wysocki" wrote: > On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote: > > On Wed, 24 Oct 2012 23:06:00 +0200 > > Borislav Petkov wrote: > > > > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote: > > > > Well who knows. Could be that people's vm *does* suck. Or they have > > > > some particularly peculiar workload or requirement[*]. Or their VM > > > > *used* to suck, and the drop_caches is not really needed any more but > > > > it's there in vendor-provided code and they can't practically prevent > > > > it. > > > > > > I have drop_caches in my suspend-to-disk script so that the hibernation > > > image is kept at minimum and suspend times are as small as possible. > > > > hm, that sounds smart. > > > > > Would that be a valid use-case? > > > > I'd say so, unless we change the kernel to do that internally. We do > > have the hibernation-specific shrink_all_memory() in the vmscan code. > > We didn't see fit to document _why_ that exists, but IIRC it's there to > > create enough free memory for hibernation to be able to successfully > > complete, but no more. > > That's correct. Well, my point was: how about the idea of reclaiming clean pagecache (and inodes, dentries, etc) before hibernation so we read/write less disk data? Given that it's so easy to do from the hibernation script, I guess there's not much point...
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 6:57 PM, Dave Hansen wrote: > On 10/24/2012 03:48 PM, Borislav Petkov wrote: >> On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote: >>> Sounds fairly valid to me. But, it's also one that would not be harmed >>> or disrupted in any way because of a single additional printk() during >>> each suspend-to-disk operation. >> >> back to the drop_caches patch. How about we hide the drop_caches >> interface behind some mm debugging option in "Kernel Hacking"? Assuming >> we don't need it otherwise on production kernels. Probably make it >> depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so. >> >> And then also add it to /proc/vmstat, in addition. > > That effectively means removing it from the kernel since distros ship > with those config options off. We don't want to do that since there > _are_ valid, occasional uses like benchmarking that we want to be > consistent. Agreed. We never want to remove a valid interface.
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 03:48 PM, Borislav Petkov wrote: > On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote: >> Sounds fairly valid to me. But, it's also one that would not be harmed >> or disrupted in any way because of a single additional printk() during >> each suspend-to-disk operation. > > back to the drop_caches patch. How about we hide the drop_caches > interface behind some mm debugging option in "Kernel Hacking"? Assuming > we don't need it otherwise on production kernels. Probably make it > depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so. > > And then also add it to /proc/vmstat, in addition. That effectively means removing it from the kernel since distros ship with those config options off. We don't want to do that since there _are_ valid, occasional uses like benchmarking that we want to be consistent.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote: > Sounds fairly valid to me. But, it's also one that would not be harmed > or disrupted in any way because of a single additional printk() during > each suspend-to-disk operation. Btw, back to the drop_caches patch. How about we hide the drop_caches interface behind some mm debugging option in "Kernel Hacking"? Assuming we don't need it otherwise on production kernels. Probably make it depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so. And then also add it to /proc/vmstat, in addition. -- Regards/Gruss, Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
>> I have drop_caches in my suspend-to-disk script so that the hibernation
>> image is kept at minimum and suspend times are as small as possible.
>
> hm, that sounds smart.
>
>> Would that be a valid use-case?
>
> I'd say so, unless we change the kernel to do that internally. We do
> have the hibernation-specific shrink_all_memory() in the vmscan code.
> We didn't see fit to document _why_ that exists, but IIRC it's there to
> create enough free memory for hibernation to be able to successfully
> complete, but no more.

shrink_all_memory() drops the minimum amount of memory needed for
hibernation; it's a trade-off:

- drop all page cache
  pros: speeds up hibernation
  cons: after resuming from hibernation, the system runs very slowly for
        a while until it regains enough file cache

- drop the minimum page cache
  pros: the system is responsive right after resuming from hibernation
  cons: relatively long hibernation time

So I'm not keen on changing the hibernation default.

hmmm... Does adding a tracepoint instead of the printk make sense?
Re: [PATCH] add some drop_caches documentation and info messsge
On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote: > On Wed, 24 Oct 2012 23:06:00 +0200 > Borislav Petkov wrote: > > > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote: > > > Well who knows. Could be that people's vm *does* suck. Or they have > > > some particularly peculiar workload or requirement[*]. Or their VM > > > *used* to suck, and the drop_caches is not really needed any more but > > > it's there in vendor-provided code and they can't practically prevent > > > it. > > > > I have drop_caches in my suspend-to-disk script so that the hibernation > > image is kept at minimum and suspend times are as small as possible. > > hm, that sounds smart. > > > Would that be a valid use-case? > > I'd say so, unless we change the kernel to do that internally. We do > have the hibernation-specific shrink_all_memory() in the vmscan code. > We didn't see fit to document _why_ that exists, but IIRC it's there to > create enough free memory for hibernation to be able to successfully > complete, but no more. That's correct. > Who owns hibernation nowadays? Rafael, I guess? I'm still maintaining it. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 02:06 PM, Borislav Petkov wrote: > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote: >> Well who knows. Could be that people's vm *does* suck. Or they have >> some particularly peculiar workload or requirement[*]. Or their VM >> *used* to suck, and the drop_caches is not really needed any more but >> it's there in vendor-provided code and they can't practically prevent >> it. > > I have drop_caches in my suspend-to-disk script so that the hibernation > image is kept at minimum and suspend times are as small as possible. > > Would that be a valid use-case? Sounds fairly valid to me. But, it's also one that would not be harmed or disrupted in any way because of a single additional printk() during each suspend-to-disk operation.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov wrote: > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote: > > Well who knows. Could be that people's vm *does* suck. Or they have > > some particularly peculiar workload or requirement[*]. Or their VM > > *used* to suck, and the drop_caches is not really needed any more but > > it's there in vendor-provided code and they can't practically prevent > > it. > > I have drop_caches in my suspend-to-disk script so that the hibernation > image is kept at minimum and suspend times are as small as possible. hm, that sounds smart. > Would that be a valid use-case? I'd say so, unless we change the kernel to do that internally. We do have the hibernation-specific shrink_all_memory() in the vmscan code. We didn't see fit to document _why_ that exists, but IIRC it's there to create enough free memory for hibernation to be able to successfully complete, but no more. Who owns hibernation nowadays? Rafael, I guess?
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote: > Well who knows. Could be that people's vm *does* suck. Or they have > some particularly peculiar workload or requirement[*]. Or their VM > *used* to suck, and the drop_caches is not really needed any more but > it's there in vendor-provided code and they can't practically prevent > it. I have drop_caches in my suspend-to-disk script so that the hibernation image is kept at minimum and suspend times are as small as possible. Would that be a valid use-case? -- Regards/Gruss, Boris.
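Boris's suspend-to-disk wrapper could look roughly like the sketch below. The function name is made up, and the two knob paths are passed as parameters so the logic can be demonstrated against scratch files; real use would pass /proc/sys/vm/drop_caches and /sys/power/state (and requires root):

```shell
# Hedged sketch of a suspend-to-disk script that drops clean caches
# first so the hibernation image (and thus suspend time) stays small.
hibernate_small_image() {
    drop=$1    # e.g. /proc/sys/vm/drop_caches
    state=$2   # e.g. /sys/power/state
    sync                   # drop_caches only frees *clean* cache, so flush dirty data first
    echo 3 > "$drop"       # 3 = page cache + dentries and inodes
    echo disk > "$state"   # trigger suspend-to-disk
}

# Dry run against scratch files instead of the real knobs:
hibernate_small_image /tmp/fake_drop_caches /tmp/fake_power_state
```

The `sync` matters: without it, dirty pages cannot be dropped and the image shrinks less than expected.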
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012 13:28:19 -0700 Dave Hansen wrote: > On 10/24/2012 12:54 PM, Andrew Morton wrote: > > hmpf. This patch worries me. If there are people out there who are > > regularly using drop_caches because the VM sucks, it seems pretty > > obnoxious of us to go dumping stuff into their syslog. What are they > > supposed to do? Stop using drop_caches? > > People use drop_caches because they _think_ the VM sucks, or they > _think_ they're "tuning" their system. _They_ are supposed to stop > using drop_caches. :) Well who knows. Could be that people's vm *does* suck. Or they have some particularly peculiar workload or requirement[*]. Or their VM *used* to suck, and the drop_caches is not really needed any more but it's there in vendor-provided code and they can't practically prevent it. [*] If your workload consists of having to handle large bursts of data with minimum latency and then waiting around for another burst, it makes sense to drop all your cached data between bursts. > What kind of interface _is_ it in the first place? Is it really a > production-level thing that we expect users to be poking at? Or, is it > a rarely-used debugging and benchmarking knob which is fair game for us > to tweak like this? It was a rarely-used mainly-developer-only thing which, apparently, real people found useful at some point in the past. Perhaps we should never have offered it. > Do we have any valid uses of drop_caches where the printk() would truly > _be_ disruptive? Are those cases where we _also_ have real kernel bugs > or issues that we should be working? If it disrupts them and they go to > their vendor or the community directly, it gives us at least a shot at > fixing the real problems (or fixing the "invalid" use). Heaven knows - I'm just going from what Michal has told me and various rumors which keep surfacing on the internet ;) > Adding taint, making this a single-shot printk, or adding vmstat > counters are all good ideas.
I guess I think the disruption is a > feature because I hope it will draw some folks out of the woodwork. I had a "send mail to a...@zip.com.au" printk in 3c59x.c many years ago. For about two months. It took *years* before I stopped getting emails ;) Gee, I dunno. I have issues with it :( We could do printk_ratelimited(one-hour) but I suspect that would defeat Michal's purpose.
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 12:54 PM, Andrew Morton wrote: > hmpf. This patch worries me. If there are people out there who are > regularly using drop_caches because the VM sucks, it seems pretty > obnoxious of us to go dumping stuff into their syslog. What are they > supposed to do? Stop using drop_caches? People use drop_caches because they _think_ the VM sucks, or they _think_ they're "tuning" their system. _They_ are supposed to stop using drop_caches. :) What kind of interface _is_ it in the first place? Is it really a production-level thing that we expect users to be poking at? Or, is it a rarely-used debugging and benchmarking knob which is fair game for us to tweak like this? Do we have any valid uses of drop_caches where the printk() would truly _be_ disruptive? Are those cases where we _also_ have real kernel bugs or issues that we should be working? If it disrupts them and they go to their vendor or the community directly, it gives us at least a shot at fixing the real problems (or fixing the "invalid" use). Adding taint, making this a single-shot printk, or adding vmstat counters are all good ideas. I guess I think the disruption is a feature because I hope it will draw some folks out of the woodwork.
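For reference, the knob being debated is a single procfs file, documented in Documentation/sysctl/vm.txt. A minimal sketch of its use follows; the helper name and the optional scratch-file parameter are purely illustrative (real use writes to /proc/sys/vm/drop_caches and requires root):

```shell
# Sketch of drop_caches usage.  Accepted values per the kernel docs:
# 1 = free page cache, 2 = free dentries and inodes, 3 = both.
drop_caches() {
    level=$1
    target=${2:-/proc/sys/vm/drop_caches}
    sync                      # only clean cache can be dropped; flush dirty data first
    echo "$level" > "$target"
}

# Demonstration against a scratch file instead of the real knob:
drop_caches 3 /tmp/demo_drop_caches
```

Benchmarkers call this between runs to start from a cold cache; that is the "valid, occasional use" the thread keeps coming back to.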
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012 08:29:45 +0200 Michal Hocko wrote: > > > > > > + printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n", > > > + current->comm, task_pid_nr(current), > > > sysctl_drop_caches); > > > > urgh. Are we really sure we want to do this? The system operators who > > are actually using this thing will hate us :( > > I have no problems with lowering the priority (how do you see > KERN_INFO?) but shouldn't this message kick them that they are doing > something wrong? Or if somebody uses that for "benchmarking" to have a > clean table before start is this really that invasive? hmpf. This patch worries me. If there are people out there who are regularly using drop_caches because the VM sucks, it seems pretty obnoxious of us to go dumping stuff into their syslog. What are they supposed to do? Stop using drop_caches? But that would unfix the problem which they fixed with drop_caches in the first case. And they might not even have control over the code - they need to go back to their supplier and say "please send me a new version", along with all the additional costs and risks involved in an update. > > More friendly alternatives might be: > > > > - Taint the kernel. But that will only become apparent with an oops > > trace or similar. > > > > - Add a drop_caches counter and make that available in /proc/vmstat, > > show_mem() output and perhaps other places. > > We would lose timing and originating process name in both cases which > can be really helpful while debugging. It is fair to say that we could > deduce the timing if we are collecting /proc/meminfo or /proc/vmstat > already and we do collect them often but this is not the case all of the > time and sometimes it is important to know _who_ is doing all this. But how important is all that? The main piece of information the kernel developer wants is "this guy is using drop_caches a lot". All the other info is peripheral and can be gathered by other means if so desired.
-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] add some drop_caches documentation and info messsge
On Tue 23-10-12 16:45:46, Andrew Morton wrote:
> On Fri, 12 Oct 2012 14:57:08 +0200 Michal Hocko wrote:
>
> > Hi,
> > I would like to resurrect the following Dave's patch. The last time it
> > was posted was here https://lkml.org/lkml/2010/9/16/250 and there
> > didn't seem to be any strong opposition.
> > Kosaki was worried about possible excessive logging when somebody drops
> > caches too often (but then he claimed he didn't have a strong opinion
> > on that) but I would say the opposite. If somebody does that then I
> > would really like to know it from the log when supporting a system,
> > because it almost for sure means that there is something fishy going
> > on. It is also worth mentioning that only root can write drop_caches,
> > so this is not a flooding attack vector.
> > I am bringing this up again because it can be really helpful when
> > chasing strange performance issues which (surprise surprise) turn out
> > to be related to artificially dropped caches, done because the admin
> > thinks this would help...
> >
> > I have just refreshed the original patch on top of the current mm tree
> > but I could live with KERN_INFO as well if people think that KERN_NOTICE
> > is too hysterical.
> > ---
> > >From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> > From: Dave Hansen
> > Date: Fri, 12 Oct 2012 14:30:54 +0200
> > Subject: [PATCH] add some drop_caches documentation and info messsge
> >
> > There is plenty of anecdotal evidence and a load of blog posts
> > suggesting that using "drop_caches" periodically keeps your system
> > running in "tip top shape". Perhaps adding some kernel
> > documentation will increase the amount of accurate data on its use.
> >
> > If we are not shrinking caches effectively, then we have real bugs.
> > Using drop_caches will simply mask the bugs and make them harder
> > to find, but certainly does not fix them, nor is it an appropriate
> > "workaround" to limit the size of the caches.
> >
> > It's a great debugging tool, and is really handy for doing things
> > like repeatable benchmark runs. So, add a bit more documentation
> > about it, and add a little KERN_NOTICE. It should help developers
> > who are chasing down reclaim-related bugs.
> >
> > ...
> >
> > +	printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> > +		current->comm, task_pid_nr(current), sysctl_drop_caches);
>
> urgh. Are we really sure we want to do this? The system operators who
> are actually using this thing will hate us :(

I have no problems with lowering the priority (how do you see
KERN_INFO?) but shouldn't this message kick them that they are doing
something wrong? Or if somebody uses that for "benchmarking" to have a
clean table before the start, is this really that invasive?

> More friendly alternatives might be:
>
> - Taint the kernel. But that will only become apparent with an oops
>   trace or similar.
>
> - Add a drop_caches counter and make that available in /proc/vmstat,
>   show_mem() output and perhaps other places.

We would lose the timing and originating process name in both cases,
which can be really helpful while debugging. It is fair to say that we
could deduce the timing if we are collecting /proc/meminfo or
/proc/vmstat already, and we do collect them often, but this is not the
case all of the time, and sometimes it is important to know _who_ is
doing all this.
--
Michal Hocko
SUSE Labs
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 12:54 PM, Andrew Morton wrote:
> hmpf. This patch worries me. If there are people out there who are
> regularly using drop_caches because the VM sucks, it seems pretty
> obnoxious of us to go dumping stuff into their syslog. What are they
> supposed to do? Stop using drop_caches?

People use drop_caches because they _think_ the VM sucks, or they
_think_ they're "tuning" their system. _They_ are supposed to stop
using drop_caches. :)

What kind of interface _is_ it in the first place? Is it really a
production-level thing that we expect users to be poking at? Or, is it
a rarely-used debugging and benchmarking knob which is fair game for us
to tweak like this?

Do we have any valid uses of drop_caches where the printk() would truly
_be_ disruptive? Are those cases where we _also_ have real kernel bugs
or issues that we should be working on? If it disrupts them and they go
to their vendor or the community directly, it gives us at least a shot
at fixing the real problems (or fixing the invalid use).

Adding taint, making this a single-shot printk, or adding vmstat
counters are all good ideas. I guess I think the disruption is a
feature, because I hope it will draw some folks out of the woodwork.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012 13:28:19 -0700 Dave Hansen d...@linux.vnet.ibm.com wrote:

> On 10/24/2012 12:54 PM, Andrew Morton wrote:
> > hmpf. This patch worries me. If there are people out there who are
> > regularly using drop_caches because the VM sucks, it seems pretty
> > obnoxious of us to go dumping stuff into their syslog. What are they
> > supposed to do? Stop using drop_caches?
>
> People use drop_caches because they _think_ the VM sucks, or they
> _think_ they're "tuning" their system. _They_ are supposed to stop
> using drop_caches. :)

Well who knows. Could be that people's vm *does* suck. Or they have
some particularly peculiar workload or requirement[*]. Or their VM
*used* to suck, and the drop_caches is not really needed any more, but
it's there in vendor-provided code and they can't practically prevent
it.

[*] If your workload consists of having to handle large bursts of data
with minimum latency and then waiting around for another burst, it
makes sense to drop all your cached data between bursts.

> What kind of interface _is_ it in the first place? Is it really a
> production-level thing that we expect users to be poking at? Or, is it
> a rarely-used debugging and benchmarking knob which is fair game for us
> to tweak like this?

It was a rarely-used mainly-developer-only thing which, apparently, real
people found useful at some point in the past. Perhaps we should never
have offered it.

> Do we have any valid uses of drop_caches where the printk() would truly
> _be_ disruptive? Are those cases where we _also_ have real kernel bugs
> or issues that we should be working on? If it disrupts them and they go
> to their vendor or the community directly, it gives us at least a shot
> at fixing the real problems (or fixing the invalid use).

Heaven knows - I'm just going from what Michal has told me and various
rumors which keep surfacing on the internet ;)

> Adding taint, making this a single-shot printk, or adding vmstat
> counters are all good ideas. I guess I think the disruption is a
> feature, because I hope it will draw some folks out of the woodwork.

I had a "send mail to a...@zip.com.au" printk in 3c59x.c many years ago.
For about two months. It took *years* before I stopped getting emails ;)

Gee, I dunno. I have issues with it :( We could do
printk_ratelimited(one-hour) but I suspect that would defeat Michal's
purpose.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> Well who knows. Could be that people's vm *does* suck. Or they have
> some particularly peculiar workload or requirement[*]. Or their VM
> *used* to suck, and the drop_caches is not really needed any more, but
> it's there in vendor-provided code and they can't practically prevent
> it.

I have drop_caches in my suspend-to-disk script so that the hibernation
image is kept at a minimum and suspend times are as small as possible.

Would that be a valid use-case?

--
Regards/Gruss,
Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov b...@alien8.de wrote:

> On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > Well who knows. Could be that people's vm *does* suck. Or they have
> > some particularly peculiar workload or requirement[*]. Or their VM
> > *used* to suck, and the drop_caches is not really needed any more, but
> > it's there in vendor-provided code and they can't practically prevent
> > it.
>
> I have drop_caches in my suspend-to-disk script so that the hibernation
> image is kept at a minimum and suspend times are as small as possible.

hm, that sounds smart.

> Would that be a valid use-case?

I'd say so, unless we change the kernel to do that internally.

We do have the hibernation-specific shrink_all_memory() in the vmscan
code. We didn't see fit to document _why_ that exists, but IIRC it's
there to create enough free memory for hibernation to be able to
successfully complete, but no more.

Who owns hibernation nowadays? Rafael, I guess?
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 02:06 PM, Borislav Petkov wrote:
> On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > Well who knows. Could be that people's vm *does* suck. Or they have
> > some particularly peculiar workload or requirement[*]. Or their VM
> > *used* to suck, and the drop_caches is not really needed any more, but
> > it's there in vendor-provided code and they can't practically prevent
> > it.
>
> I have drop_caches in my suspend-to-disk script so that the hibernation
> image is kept at a minimum and suspend times are as small as possible.
>
> Would that be a valid use-case?

Sounds fairly valid to me. But, it's also one that would not be harmed
or disrupted in any way by a single additional printk() during each
suspend-to-disk operation.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov b...@alien8.de wrote:
> > On Wed, Oct 24, 2012 at 01:48:36PM -0700, Andrew Morton wrote:
> > > Well who knows. Could be that people's vm *does* suck. Or they have
> > > some particularly peculiar workload or requirement[*]. Or their VM
> > > *used* to suck, and the drop_caches is not really needed any more,
> > > but it's there in vendor-provided code and they can't practically
> > > prevent it.
> >
> > I have drop_caches in my suspend-to-disk script so that the hibernation
> > image is kept at a minimum and suspend times are as small as possible.
>
> hm, that sounds smart.
>
> > Would that be a valid use-case?
>
> I'd say so, unless we change the kernel to do that internally.
>
> We do have the hibernation-specific shrink_all_memory() in the vmscan
> code. We didn't see fit to document _why_ that exists, but IIRC it's
> there to create enough free memory for hibernation to be able to
> successfully complete, but no more.

That's correct.

> Who owns hibernation nowadays? Rafael, I guess?

I'm still maintaining it.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [PATCH] add some drop_caches documentation and info messsge
> > I have drop_caches in my suspend-to-disk script so that the hibernation
> > image is kept at a minimum and suspend times are as small as possible.
>
> hm, that sounds smart.
>
> > Would that be a valid use-case?
>
> I'd say so, unless we change the kernel to do that internally.
>
> We do have the hibernation-specific shrink_all_memory() in the vmscan
> code. We didn't see fit to document _why_ that exists, but IIRC it's
> there to create enough free memory for hibernation to be able to
> successfully complete, but no more.

shrink_all_memory() drops only the minimum amount of memory needed for
hibernation. It's a trade-off:

- drop all page cache
  pros: shorter hibernation time
  cons: after resuming from hibernation, the system runs very slowly
        for a while, until it has built up enough file cache again

- drop the minimum amount of page cache
  pros: the system is responsive right away after resuming from
        hibernation
  cons: relatively long hibernation time

So I'm not keen on changing the hibernation default. hmmm...

Does adding a tracepoint instead of the printk make sense?
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote:
> Sounds fairly valid to me. But, it's also one that would not be harmed
> or disrupted in any way because of a single additional printk() during
> each suspend-to-disk operation.

Btw, back to the drop_caches patch.

How about we hide the drop_caches interface behind some mm debugging
option in Kernel Hacking? Assuming we don't need it otherwise on
production kernels. Probably make it depend on CONFIG_DEBUG_VM like
CONFIG_DEBUG_VM_RB or so.

And then also add it to /proc/vmstat, in addition.

--
Regards/Gruss,
Boris.
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/24/2012 03:48 PM, Borislav Petkov wrote:
> On Wed, Oct 24, 2012 at 02:18:38PM -0700, Dave Hansen wrote:
> > Sounds fairly valid to me. But, it's also one that would not be harmed
> > or disrupted in any way because of a single additional printk() during
> > each suspend-to-disk operation.
>
> back to the drop_caches patch. How about we hide the drop_caches
> interface behind some mm debugging option in Kernel Hacking? Assuming
> we don't need it otherwise on production kernels. Probably make it
> depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so.
>
> And then also add it to /proc/vmstat, in addition.

That effectively means removing it from the kernel, since distros ship
with those config options off. We don't want to do that, since there
_are_ valid, occasional uses like benchmarking that we want to be
consistent.
Re: [PATCH] add some drop_caches documentation and info messsge
On Wed, Oct 24, 2012 at 6:57 PM, Dave Hansen d...@linux.vnet.ibm.com wrote:
> On 10/24/2012 03:48 PM, Borislav Petkov wrote:
> > back to the drop_caches patch. How about we hide the drop_caches
> > interface behind some mm debugging option in Kernel Hacking? Assuming
> > we don't need it otherwise on production kernels. Probably make it
> > depend on CONFIG_DEBUG_VM like CONFIG_DEBUG_VM_RB or so.
> >
> > And then also add it to /proc/vmstat, in addition.
>
> That effectively means removing it from the kernel, since distros ship
> with those config options off. We don't want to do that, since there
> _are_ valid, occasional uses like benchmarking that we want to be
> consistent.

Agreed, we never want to remove a valid interface.
Re: [PATCH] add some drop_caches documentation and info messsge
On Thu, 25 Oct 2012 00:04:46 +0200 Rafael J. Wysocki r...@sisk.pl wrote:

> On Wednesday 24 of October 2012 14:13:03 Andrew Morton wrote:
> > On Wed, 24 Oct 2012 23:06:00 +0200 Borislav Petkov b...@alien8.de wrote:
> > > I have drop_caches in my suspend-to-disk script so that the
> > > hibernation image is kept at a minimum and suspend times are as
> > > small as possible.
> >
> > hm, that sounds smart.
> >
> > > Would that be a valid use-case?
> >
> > I'd say so, unless we change the kernel to do that internally.
> >
> > We do have the hibernation-specific shrink_all_memory() in the vmscan
> > code. We didn't see fit to document _why_ that exists, but IIRC it's
> > there to create enough free memory for hibernation to be able to
> > successfully complete, but no more.
>
> That's correct.

Well, my point was: how about the idea of reclaiming clean pagecache
(and inodes, dentries, etc) before hibernation so we read/write less
disk data? Given that it's so easy to do from the hibernation script, I
guess there's not much point...
Re: [PATCH] add some drop_caches documentation and info messsge
On Fri, 12 Oct 2012 14:57:08 +0200 Michal Hocko wrote:

> Hi,
> I would like to resurrect the following Dave's patch. The last time it
> was posted was here https://lkml.org/lkml/2010/9/16/250 and there
> didn't seem to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (but then he claimed he didn't have a strong opinion
> on that) but I would say the opposite. If somebody does that then I
> would really like to know it from the log when supporting a system,
> because it almost for sure means that there is something fishy going
> on. It is also worth mentioning that only root can write drop_caches,
> so this is not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise surprise) turn out
> to be related to artificially dropped caches, done because the admin
> thinks this would help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> >From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info messsge
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> ...
>
> +	printk(KERN_NOTICE "%s (%d): dropped kernel caches: %d\n",
> +		current->comm, task_pid_nr(current), sysctl_drop_caches);

urgh. Are we really sure we want to do this? The system operators who
are actually using this thing will hate us :(

More friendly alternatives might be:

- Taint the kernel. But that will only become apparent with an oops
  trace or similar.

- Add a drop_caches counter and make that available in /proc/vmstat,
  show_mem() output and perhaps other places.

I suspect the /proc/vmstat counter will suffice - if someone is having
vm issues, we'll be seeing their /proc/vmstat at some stage, and if the
drop_caches counter is high, that's enough to get suspicious?
Re: [PATCH] add some drop_caches documentation and info messsge
On 10/12/2012 05:57 AM, Michal Hocko wrote:
> I would like to resurrect the following Dave's patch. The last time it
> was posted was here https://lkml.org/lkml/2010/9/16/250 and there
> didn't seem to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (but then he claimed he didn't have a strong opinion
> on that) but I would say the opposite. If somebody does that then I
> would really like to know it from the log when supporting a system,
> because it almost for sure means that there is something fishy going
> on. It is also worth mentioning that only root can write drop_caches,
> so this is not a flooding attack vector.

Just read through the patch again. Still looks great to me. Thanks for
bringing it up again, Michal!
Re: [PATCH] add some drop_caches documentation and info messsge
(2012/10/12 21:57), Michal Hocko wrote:
> Hi,
> I would like to resurrect the following Dave's patch. The last time it
> was posted was here https://lkml.org/lkml/2010/9/16/250 and there
> didn't seem to be any strong opposition.
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (but then he claimed he didn't have a strong opinion
> on that) but I would say the opposite. If somebody does that then I
> would really like to know it from the log when supporting a system,
> because it almost for sure means that there is something fishy going
> on. It is also worth mentioning that only root can write drop_caches,
> so this is not a flooding attack vector.
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise surprise) turn out
> to be related to artificially dropped caches, done because the admin
> thinks this would help...
>
> I have just refreshed the original patch on top of the current mm tree
> but I could live with KERN_INFO as well if people think that KERN_NOTICE
> is too hysterical.
> ---
> >From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info messsge
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel
> documentation will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder
> to find, but certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things
> like repeatable benchmark runs. So, add a bit more documentation
> about it, and add a little KERN_NOTICE. It should help developers
> who are chasing down reclaim-related bugs.
>
> [mho...@suse.cz: refreshed to current -mm tree]
> Signed-off-by: Dave Hansen
> Reviewed-by: KAMEZAWA Hiroyuki
> Acked-by: Michal Hocko

Acked-by: KAMEZAWA Hiroyuki
Re: [PATCH] add some drop_caches documentation and info message
On 10/12/2012 05:57 AM, Michal Hocko wrote:
> I would like to resurrect the following patch from Dave. It was last
> posted at https://lkml.org/lkml/2010/9/16/250 and there didn't seem to
> be any strong opposition.
>
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he then said he didn't have a strong opinion
> on it), but I would say the opposite: if somebody does that, I would
> really like to know it from the log when supporting a system, because
> it almost certainly means something fishy is going on. It is also worth
> mentioning that only root can write to drop_caches, so this is not a
> flooding attack vector.

Just read through the patch again. Still looks great to me. Thanks for bringing it up again, Michal!
Re: [PATCH] add some drop_caches documentation and info message
On Fri, Oct 12, 2012 at 8:57 AM, Michal Hocko wrote:
> Hi,
> I would like to resurrect the following patch from Dave. It was last
> posted at https://lkml.org/lkml/2010/9/16/250 and there didn't seem to
> be any strong opposition.
>
> Kosaki was worried about possible excessive logging when somebody drops
> caches too often (though he then said he didn't have a strong opinion
> on it), but I would say the opposite: if somebody does that, I would
> really like to know it from the log when supporting a system, because
> it almost certainly means something fishy is going on. It is also worth
> mentioning that only root can write to drop_caches, so this is not a
> flooding attack vector.
>
> I am bringing this up again because it can be really helpful when
> chasing strange performance issues which (surprise, surprise) turn out
> to be related to caches dropped artificially because the admin thinks
> it will help...
>
> I have just refreshed the original patch on top of the current mm tree,
> but I could live with KERN_INFO as well if people think that
> KERN_NOTICE is too hysterical.
> ---
> From 1f4058be9b089bc9d43d71bc63989335d7637d8d Mon Sep 17 00:00:00 2001
> From: Dave Hansen
> Date: Fri, 12 Oct 2012 14:30:54 +0200
> Subject: [PATCH] add some drop_caches documentation and info message
>
> There is plenty of anecdotal evidence and a load of blog posts
> suggesting that using "drop_caches" periodically keeps your system
> running in "tip top shape". Perhaps adding some kernel documentation
> will increase the amount of accurate data on its use.
>
> If we are not shrinking caches effectively, then we have real bugs.
> Using drop_caches will simply mask the bugs and make them harder to
> find, but it certainly does not fix them, nor is it an appropriate
> "workaround" to limit the size of the caches.
>
> It's a great debugging tool, and is really handy for doing things like
> repeatable benchmark runs. So, add a bit more documentation about it,
> and add a little KERN_NOTICE. It should help developers who are chasing
> down reclaim-related bugs.
>
> [mho...@suse.cz: refreshed to current -mm tree]
> Signed-off-by: Dave Hansen
> Reviewed-by: KAMEZAWA Hiroyuki
> Acked-by: Michal Hocko

Looks fine.

Acked-by: KOSAKI Motohiro