Time zones
Is it possible to tell the kernel what time zone the RTC is in? Right now it appears to assume that the RTC is always in UTC, and this causes a few headaches during the boot process. I tried filing a bug asking OpenRC to run hwclock earlier in boot, but it was rejected.
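To make the problem concrete, here is a minimal sketch of the arithmetic involved when the RTC holds local time rather than UTC. The function name is made up for illustration; only the sign convention is borrowed from the kernel's struct timezone field tz_minuteswest (positive means west of Greenwich). As far as I know, util-linux hwclock has a --systz option aimed at this situation, but the sketch below does not depend on it.

```python
from datetime import datetime, timedelta, timezone


def rtc_local_to_utc(rtc_reading: datetime, minutes_west: int) -> datetime:
    """Convert a naive RTC reading kept in local time to an aware UTC datetime.

    minutes_west uses the same sign convention as tz_minuteswest:
    positive values mean the local zone is west of Greenwich.
    (rtc_local_to_utc is a hypothetical helper, not a kernel or libc API.)
    """
    if rtc_reading.tzinfo is not None:
        raise ValueError("expected a naive datetime, as read from the RTC")
    # local = UTC - minutes_west, so UTC = local + minutes_west
    utc_naive = rtc_reading + timedelta(minutes=minutes_west)
    return utc_naive.replace(tzinfo=timezone.utc)
```

If the offset is not applied early in boot, every timestamp taken before the correction is off by exactly minutes_west, which is the headache described above.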
Aborting a core dump on a fatal signal
Would there be any benefit to allowing an in-progress core dump to be aborted if the dumping process receives a fatal signal? Example: a process segfaults and starts dumping core, but it has a lot of virtual memory allocated, so the dump promptly causes a queue-clogging deluge of I/O that in some cases takes several minutes to finish. Between the time it catches the crash signal and the time it finishes dumping and terminates, I'd like to be able to send it a SIGKILL or similar to abort the core dump.
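The behavior being asked for can be sketched as a dump loop that checks for a pending fatal signal between chunks. This is a userspace simulation only, not kernel code: dump_core, CHUNK and the Event standing in for "SIGKILL is pending" are all assumptions made for illustration.

```python
import threading

CHUNK = 4096  # bytes written per step; an arbitrary size for this sketch


def dump_core(memory: bytes, sink: list, fatal_pending: threading.Event) -> bool:
    """Write memory to sink chunk by chunk.

    Returns True if the whole image was written, False if the dump was
    abandoned because fatal_pending was set before a chunk was written.
    """
    for offset in range(0, len(memory), CHUNK):
        if fatal_pending.is_set():
            # Stop issuing I/O as soon as the fatal signal is noticed.
            return False
        sink.append(memory[offset:offset + CHUNK])
    return True
```

The point of checking once per chunk is that the worst-case delay before the abort takes effect is one chunk of I/O rather than the whole dump.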
Re: [PATCH 0/3] ABI CHANGE!!! Remove questionable remote SP reads
My personal opinion is that even looking at esp/rsp is asking for trouble. The only reliable information is VM_STACK or another VM flag that makes the area expand in response to stack growth. Besides, userspace could always play funky trampoline games with the stack pointer, or even dynamically expand the stack by doing a malloc if a stack overflow draws near, which would put the stack in the data section temporarily. As long as esp is in the bounds of a valid VMA, my vote is that we should consider it undefined how the task uses it.

On Mon, Oct 3, 2016 at 4:17 PM, Linus Torvalds wrote:

On Mon, Oct 3, 2016 at 4:08 PM, Andy Lutomirski wrote:

Ping! We need to decide fairly soon whether to apply these (or perhaps just patch 1 or just patches 2 and 3) for 4.9. For any parts that aren't applied, I'll send quick fixups to pin the stack in the offending code.

I think we should apply it. Hopefully nothing uses it, and nobody will notice. And if somebody *does* notice, the sooner we find out, the better.

Linus
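The "is the stack pointer inside any valid VMA" test from the top of this message can be sketched in userspace against text in /proc/<pid>/maps format. The helper names and the sample mapping lines are made up for illustration, and this says nothing about how the kernel itself walks VMAs.

```python
def parse_maps_line(line: str):
    """Return (start, end, name) for one line in /proc/<pid>/maps format."""
    fields = line.split()
    start, end = (int(part, 16) for part in fields[0].split("-"))
    name = fields[5] if len(fields) > 5 else ""
    return start, end, name


def sp_in_valid_vma(sp: int, maps_text: str) -> bool:
    """True if sp falls inside any mapping, whether or not it is named [stack]."""
    for line in maps_text.splitlines():
        if not line.strip():
            continue
        start, end, _name = parse_maps_line(line)
        if start <= sp < end:
            return True
    return False


# Hypothetical sample in maps format; the addresses and path are invented.
SAMPLE_MAPS = (
    "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example\n"
    "7ffc1000000-7ffc1021000 rw-p 00000000 00:00 0 [stack]\n"
)
```

Note that the check deliberately ignores the mapping name: a stack pointer parked in a heap or anonymous mapping still counts as being inside a valid VMA, which matches the position argued above.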
Re: BUG_ON() in workingset_node_shadows_dec() triggers
On Mon, Oct 3, 2016 at 9:12 PM, Linus Torvalds wrote:

On Mon, Oct 3, 2016 at 9:07 PM, Andrew Morton wrote:

Well, it's a VM_BUG_ON and few people run with CONFIG_DEBUG_VM.

Ehh. If by "few people" you mean "pretty much everybody", you'd be right, but your choice of wording would be somewhat misleading, wouldn't you say?

Hint: here's a line from the standard Fedora kernel config:

    CONFIG_DEBUG_VM=y

so *no*. VM_BUG_ON() is no less deadly than a regular BUG_ON(). It just allows some people to build smaller kernels, but apparently distro people would rather have debugging than save a few kB of RAM.

The VM debugging code has VM_WARN_ON() and VM_WARN_ON_ONCE() for people who want to get a "oops, my assumptions were wrong".

Killing machines because somebody made an assumption that was wrong is not ok. Killing the machine is ok if we have a situation where there literally is no other choice.

For the curious: This would include situations like

1. The kernel is confused and further processing would result in undefined behavior (like bluesmoke detecting PCC for example)
2. Security hazards where we'd leak stuff if we don't shut down.

?

Linus
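Whether VM_BUG_ON() is live on a given machine comes down to whether CONFIG_DEBUG_VM=y appears in that kernel's config. A minimal sketch of that check over .config-style text follows; config_enabled is a made-up helper, and where the config text comes from (many distros ship it under /boot, but not all) is left to the caller.

```python
def config_enabled(config_text: str, option: str) -> bool:
    """True if option is set to y in kernel .config style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments, not settings.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == option:
            return value == "y"
    return False
```

For example, config_enabled(text, "CONFIG_DEBUG_VM") returning True means a failed VM_BUG_ON() on that kernel is as fatal as a plain BUG_ON().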
Removal of wchan and top
Hey, don't know if this is important enough, but could I request that the removal of wchan be reverted, or at least wrapped in an optional config setting? I happen to enjoy monitoring this information with a secure top, and it's useful for understanding how my system works and I've used it a few times for debugging.
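For readers who have not used it, the information in question is what tools read from /proc/<pid>/wchan. A minimal sketch of reading it directly is below; read_wchan and its proc_root parameter are made up for illustration, and the sketch assumes nothing about what the file contains beyond it being a short text value.

```python
from pathlib import Path
from typing import Optional


def read_wchan(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the contents of <proc_root>/<pid>/wchan, or None if unavailable.

    None covers every failure case: the file is missing, the task has
    exited, or access is denied.
    """
    try:
        text = (Path(proc_root) / str(pid) / "wchan").read_text().strip()
    except OSError:
        return None
    return text or None
```

Returning None rather than raising keeps a monitoring loop simple when the field is removed or hidden, which is exactly the case this message is worried about.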
Re: Updated scalable urandom patchkit
On Mon, Oct 12, 2015 at 7:46 PM, Theodore Ts'o wrote:

On Mon, Oct 12, 2015 at 04:30:59PM -0400, George Spelvin wrote:

> Segregating abusers solves both problems. If we do this then we don't need to drop the locks from the nonblocking pool, which solves the security problem.

Er, sort of. I still think my points were valid, but they're about a particular optimization suggestion you had. By avoiding the need for the optimization, the entire issue is mooted.

Sure, I'm not in love with anyone's particular optimization, whether it's mine, yours, or Andi's. I'm just trying to solve the scalability problem while also trying to keep the code maintainable and easy to understand (and over the years we've actually made things worse, to the extent that having a single mixing for the input and output pools is starting to be more of a problem than a feature, since we're coding in a bunch of exceptions when it's the output pool, etc.). So if we can solve a problem by routing around it, that's fine in my book.

You have to copy the state *anyway* because you don't want it overwritten by the ChaCha output, so there's really no point storing the constants. (Also, ChaCha has a simpler input block structure than Salsa20; the constants are all adjacent.)

We're really getting into low-level implementations here, and I think it's best to worry about these sorts of things when we have a patch to review.

(Note: one problem with ChaCha specifically is that it needs 16x32 bits of registers, and Arm32 doesn't quite have enough. We may want to provide an arch CPRNG hook so people can plug in other algorithms with good platform support, like x86 AES instructions.)

So while a ChaCha20-based CRNG should be faster than a SHA-1 based CRNG, and I consider this a good thing, for me speed is **not** more important than keeping the underlying code maintainable and simple. This is one of the reasons why I looked at, and then discarded, to use x86 accelerated AES as the basis for a CRNG.
Setting up AES so that it can be used easily with or without hardware acceleration looks very complicated to do in a cross-architectural way, and I don't want to drag in all of the crypto layer for /dev/random.

The same variables can be used (with different parameters) to decide if we want to get out of mitigation mode. The one thing to watch out for is that "cat /dev/sdX" may have some huge pauses once the buffer cache fills. We don't want to forgive after too small a fixed interval.

At least initially, once we go into mitigation mode for a particular process, it's probably safer to simply not exit it.

Finally, we have the issue of where to attach this rate-limiting structure and crypto context. My idea was to use the struct file. But now that we have getrandom(2), it's harder. mm, task_struct, signal_struct, what?

I'm personally more inclined to keep it with the task struct, so that different threads will use different crypto contexts, just from simplicity point of view since we won't need to worry about locking. Since many processes don't use /dev/urandom or getrandom(2) at all, the first time they do, we'd allocate a structure and hang it off the task_struct. When the process exits, we would explicitly memzero it and then release the memory.

(Post-finally, do we want this feature to be configurable under CONFIG_EMBEDDED? I know keeping the /dev/random code size small is a specific design goal, and abuse mitigation is optional.)

Once we code it up we can see how many bytes this takes, we can have this discussion. I'll note that ChaCha20 is much more compact than SHA1:

   text    data     bss     dec     hex filename
   4230       0       0    4230    1086 /build/ext4-64/lib/sha1.o
   1152     304       0    1456     5b0 /build/ext4-64/crypto/chacha20_generic.o

... and I've thought about this as being the first step towards potentially replacing SHA1 with something ChaCha20 based, in light of the SHAppening attack. Unfortunately, BLAKE2s is similar to ChaCha only from design perspective, not an implementation perspective.
Still, I suspect that just looking at the crypto primitives, even if we need to include two independent copies of the ChaCha20 core crypto and the Blake2s core crypto, it still should be about half the size of the SHA-1 crypto primitive.

And from the non-plumbing side of things, Andi's patchset increases the size of /dev/random by a bit over 6%, or 974 bytes from a starting base of 15719 bytes. It ought to be possible to implement a ChaCha20 based CRNG (ignoring the crypto primitives) in less than 974 bytes of x86_64 assembly. :-)

- Ted

This might be stupid,
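The size figures quoted in this message can be checked with a few lines of arithmetic. The sketch below only re-derives numbers that appear above (the size(1) columns for sha1.o and chacha20_generic.o, and the 974-byte increase over a 15719-byte base); the helper and variable names are made up.

```python
def dec_size(text: int, data: int, bss: int) -> int:
    """Total object size as reported in the dec column of size(1) output."""
    return text + data + bss


sha1_total = dec_size(4230, 0, 0)        # lib/sha1.o
chacha20_total = dec_size(1152, 304, 0)  # crypto/chacha20_generic.o

# Growth of the /dev/random code with the patchset, in percent.
growth_pct = 974 / 15719 * 100

# Size of the ChaCha20 object relative to the SHA-1 object.
chacha20_vs_sha1 = chacha20_total / sha1_total
```

The totals agree with the hex column (0x1086 and 0x5b0), the growth comes out at roughly 6.2%, consistent with "a bit over 6%", and the single ChaCha20 object is about a third the size of the SHA-1 one.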
Re: can't oom-kill zap the victim's memory?
On 09/20/15 11:05, Linus Torvalds wrote:

On Sun, Sep 20, 2015 at 5:56 AM, Oleg Nesterov wrote:

In this case the workqueue thread will block.

What workqueue thread?

    pagefault_out_of_memory -> out_of_memory -> oom_kill_process

as far as I can tell, this can be called by any task. Now, that pagefault case should only happen when the page fault comes from user space, but we also have

    __alloc_pages_slowpath -> __alloc_pages_may_oom -> out_of_memory -> oom_kill_process

which can be called from just about any context (but atomic allocations will never get here, so it can schedule etc).

I think in this case the oom killer should just slap a SIGKILL on the task and then back out, and whatever needed the memory should just wait patiently for the sacrificial lamb to commit seppuku. Which, btw, we should IMO encourage ASAP in the context of the lamb by having anything potentially locky or semaphory pay attention to if the task in question has a fatal signal pending, and if so, drop everything and run like hell so that the task can cough up any locks or semaphores.

So what's your point? Explain again just how do you guarantee that you can take the mmap_sem.

Linus

Also, I observed that a task in the middle of dumping core doesn't respond to signals while it's dumping, and I would guess that might be the case even if the task receives a SIGKILL from the OOM handler. Just a potential observation.
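The "anything locky should pay attention to a pending fatal signal" idea above can be sketched as a lock wait that gives up once the waiter has been marked for death. This is a userspace analogy with made-up names; the kernel has helpers in a similar spirit, such as fatal_signal_pending() and mutex_lock_killable(), but the code below does not model them exactly.

```python
import threading


def lock_killable(lock: threading.Lock,
                  fatal_pending: threading.Event,
                  poll_s: float = 0.01) -> bool:
    """Acquire lock unless fatal_pending is set first.

    Returns True with the lock held, or False without it if the waiter
    was marked as having a fatal signal pending. poll_s bounds how long
    the waiter can sleep before noticing the signal.
    """
    while not fatal_pending.is_set():
        if lock.acquire(timeout=poll_s):
            return True
    return False
```

A caller that gets False back is expected to unwind and release whatever else it holds, which is the "drop everything and run" behavior described in the message.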
Re: can't oom-kill zap the victim's memory?
On 09/19/15 15:24, Linus Torvalds wrote:

On Sat, Sep 19, 2015 at 8:03 AM, Oleg Nesterov wrote:

+
+static void oom_unmap_func(struct work_struct *work)
+{
+	struct mm_struct *mm = xchg(_unmap_mm, NULL);
+
+	if (!atomic_inc_not_zero(&mm->mm_users))
+		return;
+
+	// If this is not safe we can do use_mm() + unuse_mm()
+	down_read(&mm->mmap_sem);

I don't think this is safe. What makes you sure that we might not deadlock on the mmap_sem here? For all we know, the process that is going out of memory is in the middle of a mmap(), and already holds the mmap_sem for writing. No?

Potentially stupid question that others may be asking: Is it legal to return EINTR from mmap() to let a SIGKILL from the OOM handler punch the task out of the kernel and back to userspace? (sorry for the dupe btw, new email client snuck in html and I got bounced)

So at the very least that needs to be a trylock, I think. And I'm not sure zap_page_range() is ok with the mmap_sem only held for reading. Normally our rule is that you can *populate* the page tables concurrently, but you can't tear them down.

Linus
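The trylock suggestion can be illustrated in userspace: attempt the lock without blocking and skip the work if it is already held, rather than risk waiting on a holder that may itself be stuck. try_reap and its arguments are made up, and a plain Python Lock stands in for mmap_sem even though the real thing is a reader/writer semaphore.

```python
import threading
from typing import Callable


def try_reap(mmap_lock: threading.Lock, reap: Callable[[], None]) -> bool:
    """Run reap() only if mmap_lock can be taken without blocking.

    Returns False immediately when the lock is held by someone else,
    so the caller never sleeps waiting for it.
    """
    if not mmap_lock.acquire(blocking=False):
        return False
    try:
        reap()
        return True
    finally:
        mmap_lock.release()
```

The caller still needs a plan for the False case, for example retrying later; the sketch only shows that the deadlock described above cannot happen on this path.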
Re: First kernel patch (optimization)
On 09/18/15 00:42, Greg KH wrote:

On Thu, Sep 17, 2015 at 11:12:51PM -0400, Theodore Ts'o wrote:

On Wed, Sep 16, 2015 at 01:26:51PM -0400, Josh Boyer wrote:

That isn't true. It helps the submitter understand the workflow and expectations. What you meant to say is that it doesn't help you.

The problem is that workflow isn't the hard part. It's the part that can be taught most easily, sure. But people seem to get really hung up on it, and I fear that we have people who never progress beyond sending trivial patches and spelling fixes and white space fixes and micro-optimizations.

If the "you too can be a kernel developer" classes and web sites and tutorials also taught people how to take performance measurements, and something about the scientific measurement, that would be something. Or if it taught people how to create tests and to run regression testing. Or if it taught people how to try to do fuzz testing, and then once they find a sequence which causes crash, how to narrow down the failure to a specific part of the kernel, and how to fix and confirm that the kernel no longer crashes with the fix --- that would be useful.

If they can understand kernel code; if they can understand the scientific measurement; if they can understand how to do performance measurements --- being able to properly format patches is something which most kernel developers can very easily guide a new contributor to do correctly. Or in the worst case, it doesn't take much time for me to fix a whitespace problem and just tell the contributor --- by the way, I fixed up this minor issue; could you please make sure you do this in the future?

But if a test hasn't been tested, or if the contributor thinks it's a micro-optimization, but it actually takes more CPU time and/or more stack space and/or bloats the kernel --- that's much more work for the kernel maintainer to have to deal with when reviewing a patch.
So I have a very strong disagreement with the belief that teaching people the workflow is the more important thing. In my mind, that's like first focusing on how to properly fill out a golf score card, and the etiquette and traditions around handicaps, etc --- before making sure the prospective player is good at putting and driving. Personally, I'm terrible at putting and driving, so spending a lot of time learning how to fill out a golf score card would be a waste of my time.

A good kernel programmer has to understand systems thinking; how to figure out abstractions and when it's a good thing to add a new layer of abstraction and when it's better to rework an existing abstraction layer.

If we have someone who knows the workflow, but which doesn't understand systems thinking, or how to do testing, then what? Great, we've just created another Nick Krause. Do you think encouraging a Nick Krause helps anyone?

If people really are hung up on learning the workflow, I don't mind if they want to learn that part and send some silly micro-optimization or spelling fix or whitespace fix. But it's really, really important that they move beyond that. And if they aren't capable of moving beyond that, trying to inflate our recruitment numbers by encouraging someone who can only do trivial fixes means that we may get what we can easily measure --- but it may not be what we really need as a community.

Ted, you are full of crap.

Where do you think that "new developers" come from? Do they show up in our inbox, with full knowledge of kernel internals and OS theory yet they somehow just can't grasp how to submit a patch correctly? Yes, they sometimes rarely do. But for the majority of people who got into Linux, that is not the case at all.
People need to start with something simple, and easy, to get over the hurdles of:

- generating a patch
- sending an email
- fixing the email client as it just corrupted the patch
- fix the subject line as it was incorrect
- fix the changelog as it was missing
- fix the email client again as it corrupted the patch in a different way
- giving up on using a web email client as it just will not work
- figuring out who to send the patch to
- fixing the email client as the mailing list bounced the email

Those are non-trivial tasks. And by starting with "remove this space" you take the worry away from the specific content of the patch, and let them worry about the "hard" part first.

+1 for this. For example, I for one cannot tell you how many times gmail snuck html sections into my outgoing emails before I finally caught it red handed and switched to using a local native client.

Then, after all of the above is finished, and working, then they can start submitting real patches, that do real things, in patch series, as they can focus on the content much more, as the problems of how to make the patch into an acceptable format is not an issue anymore.

Did anyone read Linus Torvalds's post that
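On the HTML problem mentioned in this message: one way to catch it before the list bounces the mail is to inspect a saved copy of the outgoing message for text/html parts. The sketch below uses Python's standard email package; has_html_part is a made-up helper, and how you obtain the raw message text from your mail client is left open.

```python
from email import message_from_string


def has_html_part(raw_message: str) -> bool:
    """True if any part of the message, including nested parts, is text/html."""
    msg = message_from_string(raw_message)
    # walk() visits the message itself and every sub-part of a multipart body.
    return any(part.get_content_type() == "text/html" for part in msg.walk())
```

A plain-text patch mail should make this return False; a True result usually means the client added an HTML alternative alongside the plain text.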
Re: First kernel patch (optimization)
On 09/18/15 00:42, Greg KH wrote: On Thu, Sep 17, 2015 at 11:12:51PM -0400, Theodore Ts'o wrote: On Wed, Sep 16, 2015 at 01:26:51PM -0400, Josh Boyer wrote: That isn't true. It helps the submitter understand the workflow and expectations. What you meant to say is that it doesn't help you. The problem is that workflow isn't the hard part. It's the part that can be taught most easily, sure. But people seem to get really hung up on it, and I fear that we have people who never progress beyond sending trivial patches and spelling fixes and white space fixes and micro-optimizations. If the "you too can be a kernel developer" classes and web sites and tutorials also taught people how to take performance measurements, and something about the scientific measurement, that would be something. Or if it taught people how to create tests and to run regression testing. Or if it taught people how to try to do fuzz testing, and then once they find a sequence which causes crash, how to narrow down the failure to a specific part of the kernel, and how to fix and confirm that the kernel no longer crashes with the fix --- that would be useful. If they can understand kernel code; if they can understand the scientific measurement; if they can understand how to do performance measurements --- being able to properly format patches is something which most kernel developers can very easily guide a new contributor to do correctly. Or in the worst case, it doesn't take much time for me to fix a whitespace problem and just tell the contributor --- by the way, I fixed up this minor issue; could you please make sure you do this in the future? But if a test hasn't been tested, or if the contributor things it's a micro-optimization, but it actually takes more CPU time and/or more stack space and/or bloats the kernel --- that's much more work for the kernel maintainer to have to deal with when reviewing a patch. 
So I have a very strong disagreement with the belief that teaching people the workflow is the more important thing. In my mind, that's like first focusing on the proper how to properly fill out a golf score card, and the ettiquette and traditions around handicaps, etc --- before making sure the prospective player is good at putting and driving. Personally, I'm terrible at putting and driving, so spending a lot of time learning how to fill out a golf score card would be a waste of my time. A good kernel programmer has to understand systems thinking; how to figure out abstractions and when it's a good thing to add a new layer of abstraction and when it's better to rework an exsting abstraction layer. If we have someone who knows the workflow, but which doesn't understand systems thinking, or how to do testing, then what? Great, we've just created another Nick Krause. Do you think encouraging a Nick Krause helps anyone? If people really are hung up on learning the workflow, I don't mind if they want to learn that part and send some silly micro-optimization or spelling fix or whitespace fix. But it's really, really important that they move beyond that. And if they aren't capable of moving beyond that, trying to inflate are recruitment numbers by encouraging someone who can only do trivial fixes means that we may be get what we can easily measure --- but it may not be what we really need as a community. Ted, you are full of crap. Where do you think that "new developers" come from? Do they show up in our inbox, with full knowledge of kernel internals and OS theory yet they somehow just can't grasp how to submit a patch correctly? Yes, they sometimes rarely do. But for the majority of people who got into Linux, that is not the case at all. 
People need to start with something simple, and easy, to get over the hurdles of:
- generating a patch
- sending an email
- fixing the email client as it just corrupted the patch
- fix the subject line as it was incorrect
- fix the changelog as it was missing
- fix the email client again as it corrupted the patch in a different way
- giving up on using a web email client as it just will not work
- figuring out who to send the patch to
- fixing the email client as the mailing list bounced the email
Those are non-trivial tasks. And by starting with "remove this space" you take the worry away from the specific content of the patch, and let them worry about the "hard" part first. +1 for this. For example, I for one cannot tell you how many times Gmail snuck HTML sections into my outgoing emails before I finally caught it red-handed and switched to using a local native client. Then, after all of the above is finished, and working, then they can start submitting real patches, that do real things, in patch series, as they can focus on the content much more, as the problems of how to make the patch into an acceptable format is not an issue anymore. Did anyone read Linus Torvalds's post that
Re: First kernel patch (optimization)
On 09/16/15 09:40, Theodore Ts'o wrote: On Wed, Sep 16, 2015 at 05:03:39PM +0100, Eric Curtin wrote: Hi Greg, As I said in the subject of the mail (which I have since been told I shouldn't have done), I'm a noob to kernel code. I tried to pick off something super simple to just see what the process of getting a patch in is. Youtube videos and documentation only get you so far. From reading your response, should I refrain from sending in these micro-optimizations in future? Getting in smaller patches is easier for me as I only do this in my spare time, which I don't have a lot of! What I'd ask you to consider is what your end goal is. Is it just to collect a scalp (woo hoo! I've gotten a patch into the kernel)? Or is it to actually make things better for yourself or other users? Or are you trying to make yourself more employable, etc. It could well be that he's wanting to practice getting used to the development process. https://lkml.org/lkml/2004/12/20/255 Micro-optimizations are often not particularly useful for anything other than the first goal, and it really doesn't help anyone. If you're just doing this in your spare time, then I hope you are being choosy about what's the best way to use your spare time, so the question of what your goals are going to be is a very important thing for you to figure out. Regardless of whether it's worthwhile to get this patch into the kernel, doing any *more* micro-optimizations is probably not a good use of your time or anyone else's. I'd strongly encourage you to move on to something more than just micro-optimizations as quickly as possible. Tytso is right here. If you want to be useful you should find something with real impact once you've learned the ropes.
Best regards, - Ted
Re: stop breaking dosemu (Re: x86/kconfig/32: Rename CONFIG_VM86 and default it to 'n')
On 09/04/15 14:30, Stas Sergeev wrote: 05.09.2015 00:16, Stas Sergeev wrote: I agree. vm86() is a mess. My point is that its risky parts and useless functionality are _already_ known (even I can point to the particular code parts that can simply be removed). As such, it simply had to be re-visited and cleaned up to match at least 1 and 3 (and then maybe 5). This wasn't done, and the knob was introduced _instead_ of doing this. Grr, I mean it was disabled by default instead of doing this, and the knob was only proposed, not added. You can't just pull vm86 out of the kernel anyway. dosemu is a userspace application that depends on it, so pulling this feature out would be a big fat regression, period. I would personally rather not hear about how "it's a legacy program so its userbase is shrinking" used as any sort of excuse to ignore the fact that we shouldn't break userspace. I can even say as a user that vm86 is important to me. By all means, cleaning up vm86 is a good idea. But removing it or fencing it off with a strong deprecation doesn't sound like the right idea.
Re: [GIT PULL] Ext3 removal, quota & udf fixes
On 09/01/15 20:30, Albino B Neto wrote: 2015-08-31 23:53 GMT-03:00 Theodore Ts'o: Yes, you can go back to ext3-only. In fact, we do *not* automatically upgrade the file system to use ext4-specific features. So it's not just a "you can use ext4 instead" issue. Can you do that *without* then forcing an upgrade forever on that partition? I'm not sure the ext4 people are really even willing to guarantee that kind of backwards compatibility. Actually, we do guarantee this. It's considered poor form to automatically change the superblock to add new file system features in a way that would break the ability for the user to roll back to an older kernel. This isn't just for ext3->ext4, but for new ext4 features such as metadata checksumming. The user has to explicitly enable the feature using "tune2fs -O new_feature /dev/sdXX". Yeah! 2015-09-01 16:39 GMT-03:00 Austin S Hemmelgarn: NO, it is not logical. A vast majority of Android smartphones in the wild use ext2, as do a very significant portion of embedded systems that don't have room for the few hundred kilobytes of extra code that the ext4 driver has in comparison to ext2. Ext2 portion embedded and Ext3 many machines. So basically the game plan is gutting ext3 because of code duplication with ext4, but keeping ext2 because ext4 is too big for embedded systems to outright replace ext2? Hmm... are there any embedded systems out there that use ext3 and can fit the ext3 code but not the ext4 code?
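The rollback guarantee discussed above rests on the feature fields in the ext2/3/4 superblock (compat, incompat, ro_compat): a driver ignores unknown compat bits, mounts read-only at most when it sees an unknown ro_compat bit, and refuses to mount when it sees an unknown incompat bit. What follows is only a minimal sketch of that check in userspace C. The names `sb_features`, `can_mount` and the `FEAT_*` constants are hypothetical, and the bit values are placeholders chosen for illustration, not copied from the kernel headers.

```c
#include <stdint.h>

/* Placeholder feature bits (assumed values, for illustration only). */
#define FEAT_COMPAT_DIR_INDEX        0x0020u
#define FEAT_INCOMPAT_FILETYPE       0x0002u
#define FEAT_INCOMPAT_RECOVER        0x0004u
#define FEAT_INCOMPAT_EXTENTS        0x0040u
#define FEAT_RO_COMPAT_LARGE_FILE    0x0002u
#define FEAT_RO_COMPAT_METADATA_CSUM 0x0400u

/* The three feature words, either as stored on disk or as supported by a driver. */
struct sb_features {
    uint32_t compat;
    uint32_t incompat;
    uint32_t ro_compat;
};

enum mount_result { MOUNT_RW, MOUNT_RO_ONLY, MOUNT_REFUSED };

/*
 * Decide how a driver that supports the bits in *drv may mount a
 * filesystem whose superblock carries the bits in *fs.
 */
enum mount_result can_mount(const struct sb_features *fs,
                            const struct sb_features *drv)
{
    /* Any unknown incompat bit: the driver must not touch the filesystem. */
    if (fs->incompat & ~drv->incompat)
        return MOUNT_REFUSED;
    /* Any unknown ro_compat bit: reading is safe, writing is not. */
    if (fs->ro_compat & ~drv->ro_compat)
        return MOUNT_RO_ONLY;
    /* Unknown compat bits are harmless and are ignored. */
    return MOUNT_RW;
}
```

Under this model, a newer driver that never sets additional incompat or ro_compat bits on its own leaves the filesystem mountable by the older driver, which is the property being asked for in the thread. Turning a feature on explicitly (as with tune2fs -O) is what moves a filesystem into the MOUNT_RO_ONLY or MOUNT_REFUSED cases for old kernels.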
Re: [GIT PULL] Ext3 removal, quota & udf fixes
On 08/31/15 15:31, Raymond Jennings wrote: On 08/31/15 14:37, Linus Torvalds wrote: On Sun, Aug 30, 2015 at 11:19 PM, Jan Kara wrote: The biggest change in the pull is the removal of ext3 filesystem driver (~28k lines removed). I really am not ready to just remove ext3 without a lot of good arguments. There might well be people who still use ext3 as ext3, and don't want to update. I want more of a rationale for removal than "ext4 can read old ext3 filesystems". I actually would agree that having two drivers for the same filesystem is redundant and unneeded code duplication. That said, I wouldn't mind myself if the ext4 driver were given a very grueling regression test to make sure it can actually handle old ext3 systems as well as the ext3 driver can. Just gutting an entire driver because another driver can handle it only makes sense if nothing can go wrong and the potential for causing regressions is quite obvious. I think also that we should remove the ext2 driver before we remove the ext3 driver. My two cents. Just to ask a general opinion: Am I right that it's ok for kernel code to be organized how we (the developers) see fit as long as we don't break userspace or hardware in the process? So long as we function properly, should userspace care about how our source code is structured? I'm thinking yes, but it might be fruitful to see an answer archived on the list. Linus
Re: [GIT PULL] Ext3 removal, quota & udf fixes
On 08/31/15 15:39, Linus Torvalds wrote: On Mon, Aug 31, 2015 at 3:31 PM, Raymond Jennings wrote: That said, I wouldn't mind myself if the ext4 driver were given a very grueling regression test to make sure it can actually handle old ext3 systems as well as the ext3 driver can. That's not my only worry. Things like "can you go back to ext3-only" is an issue too - I don't think that's been a big priority for ext4 any more, and if there are any existing hold-outs that still use ext3, they may want to be able to go back to old kernels. Then we should just consider anything making an ext3 system unusable by older kernels as a regression to be stomped like any other. So it's not just a "you can use ext4 instead" issue. Can you do that *without* then forcing an upgrade forever on that partition? I'm not sure the ext4 people are really even willing to guarantee that kind of backwards compatibility. Breaking that guarantee would be an example of such a regression. I could be ok with removing ext3 in theory, but I haven't seen a lot of rationale for it, and I don't know if there are still users who may have their own good reasons to stay with ext3. Maybe there has been lots of discussion about this on fsdevel (which I don't follow), and I'm just lacking the background, but if so I want to see that background. Not just a one-liner description that basically says "remove ext3 support". I actually agree that removing support for ext3 as a filesystem is a bad idea. That would be a regression. What I'm in favor of is removing the ext3 code as redundant if ext4 code can handle everything. Of course, for it to be truly redundant, the ext4 code has to actually be capable of managing an ext3 filesystem without bumping it out of compatibility with older ext3 kernels. Any such bump would rightly be classified as a regression.
Linus
Re: [GIT PULL] Ext3 removal, quota & udf fixes
On 08/31/15 14:37, Linus Torvalds wrote: On Sun, Aug 30, 2015 at 11:19 PM, Jan Kara wrote: The biggest change in the pull is the removal of ext3 filesystem driver (~28k lines removed). I really am not ready to just remove ext3 without a lot of good arguments. There might well be people who still use ext3 as ext3, and don't want to update. I want more of a rationale for removal than "ext4 can read old ext3 filesystems". I actually would agree that having two drivers for the same filesystem is redundant and unneeded code duplication. That said, I wouldn't mind myself if the ext4 driver were given a very grueling regression test to make sure it can actually handle old ext3 systems as well as the ext3 driver can. Just gutting an entire driver because another driver can handle it only makes sense if nothing can go wrong and the potential for causing regressions is quite obvious. I think also that we should remove the ext2 driver before we remove the ext3 driver. My two cents. Linus
Moving more kernel data into highmem?
Hey, I remembered that there's an option to put third-level page tables in highmem. This might be a stupid question, but is there a way to move more kernel data into highmem? For example, page directories, first-level page tables? I even remember a few articles on LWN about how much space is taken up by struct page.
Re: [PATCH] Clean up whitespace in vfs.txt
On 08/10/15 02:31, Raymond Jennings wrote: I noticed that vfs.txt looked kinda funky, so I went ahead and reformatted it. Signed-off-by: Raymond Jennings Cc: Andrew Morton --- diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 5eb8456..8ddfe06 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -114,12 +114,12 @@ members are defined: struct file_system_type { const char *name; int fs_flags; -struct dentry *(*mount) (struct file_system_type *, int, - const char *, void *); -void (*kill_sb) (struct super_block *); -struct module *owner; -struct file_system_type * next; -struct list_head fs_supers; + struct dentry *(*mount) (struct file_system_type *, int, + const char *, void *); + void (*kill_sb) (struct super_block *); + struct module *owner; + struct file_system_type * next; + struct list_head fs_supers; struct lock_class_key s_lock_key; struct lock_class_key s_umount_key; }; @@ -136,7 +136,7 @@ struct file_system_type { should be shut down owner: for internal VFS use: you should initialize this to THIS_MODULE in - most cases. + most cases. next: for internal VFS use: you should initialize this to NULL @@ -145,7 +145,7 @@ struct file_system_type { The mount() method has the following arguments: struct file_system_type *fs_type: describes the filesystem, partly initialized - by the specific filesystem code + by the specific filesystem code int flags: mount flags @@ -182,12 +182,12 @@ and provides a fill_super() callback instead. The generic variants are: mount_nodev: mount a filesystem that is not backed by a device mount_single: mount a filesystem which shares the instance between - all mounts + all mounts A fill_super() callback implementation has the following arguments: struct super_block *sb: the superblock structure. The callback - must initialize this properly. + must initialize this properly. 
void *data: arbitrary mount options, usually comes as an ASCII string (see "Mount Options" section) @@ -208,26 +208,26 @@ This describes how the VFS can manipulate the superblock of your filesystem. As of kernel 2.6.22, the following members are defined: struct super_operations { -struct inode *(*alloc_inode)(struct super_block *sb); -void (*destroy_inode)(struct inode *); - -void (*dirty_inode) (struct inode *, int flags); -int (*write_inode) (struct inode *, int); -void (*drop_inode) (struct inode *); -void (*delete_inode) (struct inode *); -void (*put_super) (struct super_block *); -int (*sync_fs)(struct super_block *sb, int wait); -int (*freeze_fs) (struct super_block *); -int (*unfreeze_fs) (struct super_block *); -int (*statfs) (struct dentry *, struct kstatfs *); -int (*remount_fs) (struct super_block *, int *, char *); -void (*clear_inode) (struct inode *); -void (*umount_begin) (struct super_block *); - -int (*show_options)(struct seq_file *, struct dentry *); - -ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); -ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); + struct inode *(*alloc_inode)(struct super_block *sb); + void (*destroy_inode)(struct inode *); + + void (*dirty_inode) (struct inode *, int flags); + int (*write_inode) (struct inode *, int); + void (*drop_inode) (struct inode *); + void (*delete_inode) (struct inode *); + void (*put_super) (struct super_block *); + int (*sync_fs)(struct super_block *sb, int wait); + int (*freeze_fs) (struct super_block *); + int (*unfreeze_fs) (struct super_block *); + int (*statfs) (struct dentry *, struct kstatfs *); + int (*remount_fs) (struct super_block *, int *, char *); + void (*clear_inode) (struct inode *); + void (*umount_begin) (struct super_block *); + + int (*show_options)(struct seq_file *, struct dentry *); + + ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); + ssize_t (*quota_write)(struct super_block *, int, const 
char *, size_t, loff_t); int (*nr_cached_objects)(struct super_block *); void (*free_cached_objects)(struct super_block *, int); }; @@ -238,14 +238,14 @@ only called from a process context (i.e. not from an interrupt handler or bottom half). alloc_inode: this method is called by alloc_inode() to allocate memory - for struct inode and initialize it. If this function is not - defined, a simple 'struct inode' is allocated. Normally - alloc_inode will be used to allocate a larger structure which -
Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu
On 08/13/15 16:18, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 4:05 PM, Linus Torvalds wrote:
>> The _only_ thing that matters is that something broke.
>
> To clarify: things like test programs etc don't matter. Real
> applications, used by real users. That's what regressions cover. If
> you have a workflow that isn't just some random kernel test thing, and
> you depend on it, and we break it, the kernel is supposed to fix it.
>
> There are some (very few) exceptions. If it's a security issue, we may
> not be able to "fix" it, because other concerns can obviously take
> precedence. Also, sometimes the reports come in way too late - if you
> were running some stable distro kernel for several years, and updated,
> and notice a change that happened four years ago and modern
> applications now rely on the _new_ behavior, we may not be able to fix
> the regression any more.
>
> But no, "it was an unintentional kernel bug and clearly just stupid
> crap code, and we fixed it and now the kernel is much better and
> cleaner" is not a valid reason for regressions. We'll go back to the
> stupid and crap code if necessary, however much that may annoy us.
>
> For an example of the kind of things we may have to do, see commits
>
> 64f371bc3107 autofs: make the autofsv5 packet file descriptor use a packetized pipe
> 9883035ae7ed pipes: add a "packetized pipe" mode for writing
>
> and just wonder at the insanity. That's the kinds of things that
> happen when one application had actively worked around a bug in
> compatibility handling, and then trying to "fix" that bug just caused
> another application to break instead.
>
> Linus

Is there a way to temporally confine the bad crap code just to the applications that depend on it, or does a userspace app latching onto bad behavior effectively lock down the abi for the future?

I know that some features in the kernel get deprecated over a process (devfs for example) once userspace is given an alternative...would there be a process like that to deal with userspace code that is pinning a piece of crap in the kernel?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu
On 08/13/15 14:46, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 2:42 PM, Raymond Jennings wrote:
>> I am curious about what's supposed to happen normally on signal
>> delivery.
>>
>> Is SS a register that's supposed to be preserved like EIP/RIP and CS
>> when a signal is delivered?
>
> What exactly does "supposed" mean?

Basically, when a process/thread receives a signal, what happens to its registers?

> So clearly, we're not "supposed" to save/restore it. Because reality
> matters a hell of a lot more than any theoretical arguments.

So it still counts as a regression if the kernel pulls the rug out from under someone that was relying on undocumented or buggy behavior?
Re: [regression] x86/signal/64: Fix SS handling for signals delivered to 64-bit programs breaks dosemu
On 08/13/15 13:09, Linus Torvalds wrote:
> On Thu, Aug 13, 2015 at 1:08 PM, Cyrill Gorcunov wrote:
>> If only I'm not missin something obvious this should not hurt us.
>> But I gonna build test kernel and check to be sure tomorrow, ok?
>
> Thanks,
> Linus

I am curious about what's supposed to happen normally on signal delivery.

Is SS a register that's supposed to be preserved like EIP/RIP and CS when a signal is delivered?
[PATCH] Clean up whitespace in vfs.txt
I noticed that vfs.txt looked kinda funky, so I went ahead and reformatted it.

Signed-off-by: Raymond Jennings
Cc: Andrew Morton
---
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 5eb8456..8ddfe06 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -114,12 +114,12 @@ members are defined:
 struct file_system_type {
 const char *name;
 int fs_flags;
-struct dentry *(*mount) (struct file_system_type *, int,
- const char *, void *);
-void (*kill_sb) (struct super_block *);
-struct module *owner;
-struct file_system_type * next;
-struct list_head fs_supers;
+ struct dentry *(*mount) (struct file_system_type *, int,
+ const char *, void *);
+ void (*kill_sb) (struct super_block *);
+ struct module *owner;
+ struct file_system_type * next;
+ struct list_head fs_supers;
 struct lock_class_key s_lock_key;
 struct lock_class_key s_umount_key;
 };
@@ -136,7 +136,7 @@ struct file_system_type {
 should be shut down
 owner: for internal VFS use: you should initialize this to THIS_MODULE in
- most cases.
+ most cases.
 next: for internal VFS use: you should initialize this to NULL
@@ -145,7 +145,7 @@ struct file_system_type {
 The mount() method has the following arguments:
 struct file_system_type *fs_type: describes the filesystem, partly initialized
- by the specific filesystem code
+ by the specific filesystem code
 int flags: mount flags
@@ -182,12 +182,12 @@ and provides a fill_super() callback instead. The generic variants are:
 mount_nodev: mount a filesystem that is not backed by a device
 mount_single: mount a filesystem which shares the instance between
- all mounts
+ all mounts
 A fill_super() callback implementation has the following arguments:
 struct super_block *sb: the superblock structure. The callback
- must initialize this properly.
+ must initialize this properly.
 void *data: arbitrary mount options, usually comes as an ASCII
 string (see "Mount Options" section)
@@ -208,26 +208,26 @@ This describes how the VFS can manipulate the superblock of your
 filesystem. As of kernel 2.6.22, the following members are defined:
 struct super_operations {
-struct inode *(*alloc_inode)(struct super_block *sb);
-void (*destroy_inode)(struct inode *);
-
-void (*dirty_inode) (struct inode *, int flags);
-int (*write_inode) (struct inode *, int);
-void (*drop_inode) (struct inode *);
-void (*delete_inode) (struct inode *);
-void (*put_super) (struct super_block *);
-int (*sync_fs)(struct super_block *sb, int wait);
-int (*freeze_fs) (struct super_block *);
-int (*unfreeze_fs) (struct super_block *);
-int (*statfs) (struct dentry *, struct kstatfs *);
-int (*remount_fs) (struct super_block *, int *, char *);
-void (*clear_inode) (struct inode *);
-void (*umount_begin) (struct super_block *);
-
-int (*show_options)(struct seq_file *, struct dentry *);
-
-ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
-ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
+ struct inode *(*alloc_inode)(struct super_block *sb);
+ void (*destroy_inode)(struct inode *);
+
+ void (*dirty_inode) (struct inode *, int flags);
+ int (*write_inode) (struct inode *, int);
+ void (*drop_inode) (struct inode *);
+ void (*delete_inode) (struct inode *);
+ void (*put_super) (struct super_block *);
+ int (*sync_fs)(struct super_block *sb, int wait);
+ int (*freeze_fs) (struct super_block *);
+ int (*unfreeze_fs) (struct super_block *);
+ int (*statfs) (struct dentry *, struct kstatfs *);
+ int (*remount_fs) (struct super_block *, int *, char *);
+ void (*clear_inode) (struct inode *);
+ void (*umount_begin) (struct super_block *);
+
+ int (*show_options)(struct seq_file *, struct dentry *);
+
+ ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
+ ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
 int (*nr_cached_objects)(struct super_block *);
 void (*free_cached_objects)(struct super_block *, int);
 };
@@ -238,14 +238,14 @@ only called from a process context (i.e. not from an interrupt handler
 or bottom half).
 alloc_inode: this method is called by alloc_inode() to allocate memory
- for struct inode and initialize it. If this function is not
- defined, a simple 'struct inode' is allocated. Normally
- alloc_inode will be used to allocate a larger structure which
- contains a 'struct inode' embedded within it.
+ for struct inode and
Re: Dealing with the NMI mess
On Thu, 2015-07-23 at 13:21 -0700, Andy Lutomirski wrote:
> [moved to a new thread, cc list trimmed]
>
> Hi all-
>
> We've considered two approaches to dealing with NMIs:
>
> 1. Allow nesting. We know quite well how messy that is.

This might be a stupid question, but

1. What exactly does the NMI handler handle
2. Is it possible for the NMI handler to just increment a counter and
   return if it nests, and let the outer handler notice and rerun itself.

> 2. Forbid IRET inside NMIs. Doable but maybe not that pretty.
>
> We haven't considered:
>
> 3. Forbid faults (other than MCE) inside NMI.
>
> Option 3 is almost easy. There are really only two kinds of faults
> that can legitimately nest inside NMI: #PF and #DB. #DB is easy to
> fix (e.g. with my patches or Peter's patches).
>
> What if we went all out and forbade page faults in NMI as well. There
> are two reasons that I can think of that we might page fault inside an
> NMI:
>
> a) vmalloc fault. I think Ingo already half-implemented a rework to
> eliminate vmalloc faults entirely.
>
> b) User memory access faults.
>
> The reason we access user state in general from an NMI is to allow
> perf to capture enough user stack data to let the tooling backtrace
> back to user space. What if we did it differently? Instead of
> capturing this data in NMI context, capture it in
> prepare_exit_to_usermode. That would let us capture user state
> *correctly*, which we currently can't really do. There's a
> never-ending series of minor bugs in which we try to guess the user
> register state from NMI context, and it sort of works. In
> prepare_exit_to_usermode, we really truly know the user state.
> There's a race where an NMI hits during or after
> prepare_exit_to_usermode, but maybe that's okay -- just admit defeat
> in that case and don't show the user state. (Realistically, without
> CFI data, we're not going to be guaranteed to get the right state
> anyway.)
>
> To make this work, we'd have to teach NMI-from-userspace to call the
> callback itself. It would look like:
>
> prepare_exit_to_usermode() {
>     ...
>     while (blah blah blah) {
>         if (cached_flags & TIF_PERF_CAPTURE_USER_STATE)
>             perf_capture_user_state();
>         ...
>     }
>     ...
> }
>
> and then, on NMI exit, we'd call perf_capture_user_state directly,
> since we don't want to enable IRQs or do opportunsitic sysret on exit
> from NMI. (Why not? Because NMIs are still masked, and we don't want
> to pay for double-IRET to unmask them, so we really want to leave IRQs
> off and IRET straight back to user mode.)
>
> There's an unavoidable race in which we enter user mode with
> TIF_PERF_CAPTURE_USER_STATE still set. In principle, we could
> IPI-to-self from the NMI handler to cover that case (mostly -- we
> capture the wrong state if we're on our way to an IRET fault), or we
> could just check on entry if the flag is still set and, if so, admit
> defeat.
>
> Peter, can this be done without breaking the perf ABI? If we were
> designing all of this stuff from scratch right now, I'd suggest doing
> it this way, but I'm not sure whether it makes sense to try to
> retrofit it in.
>
>
> If we decide to stick with option 2, then I've now convinced myself
> that banning all kernel breakpoints and watchpoints during NMI
> processing is probably for the best. Maybe we should go one step
> farther and ban all DR7 breakpoints period. Sure, it will slow down
> perf if there are user breakpoints or watchpoints set, but, having
> looked at the asm, returning from #DB using RET is, while doable,
> distinctly ugly.
>
> --Andy
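The counter idea asked about in this message (a nested entry only increments a counter and returns, and the outermost handler notices and reruns itself) can be modeled in user space roughly as follows. This is only a sketch under stated assumptions: every name here (`nmi_pending`, `nmi_runs`, `nest_once`, `model_nmi_work`, `model_nmi_handler`) is hypothetical, and it is not kernel code. It also says nothing about the entry-level problems the thread is actually about (IRET unmasking NMIs, faults inside the handler); it only shows the "latch and rerun" bookkeeping.

```c
#include <stdatomic.h>

/* Hypothetical names; a user-space model, not kernel NMI code. */
static atomic_int nmi_pending;  /* entries that have not been fully serviced yet */
static int nmi_runs;            /* how many times the real work has run */
static int nest_once;           /* test hook: re-enter once from inside the work */

void model_nmi_handler(void);

static void model_nmi_work(void)
{
    nmi_runs++;
    if (nest_once) {
        nest_once = 0;
        model_nmi_handler();    /* simulate an NMI arriving mid-handler */
    }
}

void model_nmi_handler(void)
{
    /* A nested entry only bumps the counter and returns at once. */
    if (atomic_fetch_add(&nmi_pending, 1) != 0)
        return;

    /* Outermost entry: rerun the work until no entries remain pending. */
    do {
        model_nmi_work();
    } while (atomic_fetch_sub(&nmi_pending, 1) != 1);
}
```

With `nest_once` set before the first call, the work runs twice: once for the outer entry and once more for the nested entry that was only counted.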
Re: [PATCH] x86: Fix detection of GCC -mpreferred-stack-boundary support
On Mon, 2015-07-06 at 10:59 -0700, Andy Lutomirski wrote:
> On Mon, Jul 6, 2015 at 10:40 AM, Ingo Molnar wrote:
> >
> > * Andy Lutomirski wrote:
> >
> >> > My reasoning: on modern uarchs there's no penalty for 32-bit misalignment of
> >> > 64-bit variables, only if they cross 64-byte cache lines, which should be rare
> >> > with a chance of 1:16. This small penalty (of at most +1 cycle in some
> >> > circumstances IIRC) should be more than counterbalanced by the compression of
> >> > the stack by 5% on average.
> >>
> >> I'll counter with: what's the benefit? There are no operations that will
> >> naturally change RSP by anything that isn't a multiple of 8 (there's no pushl in
> >> 64-bit mode, or at least not on AMD chips -- the Intel manual is a bit vague on
> >> this point), so we'll end up with RSP being a multiple of 8 regardless. Even if
> >> we somehow shaved 4 bytes off in asm, that still wouldn't buy us anything, as a
> >> dangling 4 bytes at the bottom of the stack isn't useful for anything.
> >
> > Yeah, so it might be utilized in frame-pointer less builds (which we might be able
> > to utilize in the future if sane Dwarf code comes around), which does not use
> > push/pop to manage the stack but often has patterns like:
> >
> > 8102aa90 <SyS_getpriority>:
> > 8102aa90: 48 83 ec 18 sub $0x18,%rsp
> >
> > and uses MOVs to manage the stack. Those kinds of stack frames could be 4-byte
> > granular as well.
> >
> > But yeah ... it's pretty marginal.
>
> To get even that, we'd need an additional ABI-changing GCC flag to
> change GCC's idea of the alignment of long from 8 to 4. (I just
> checked: g++ thinks that alignof(long) == 8. I was too lazy to look
> up how to ask the equivalent question in C.)

I just want to point out that long itself is 8 bytes on 64-bit x86, but only 4 bytes on 32-bit x86.

Perhaps we should keep in mind sizeof(long) and not just alignof(long)?

My opinion btw, is that if long is 8 bytes wide, it should also be 8 bytes aligned.

> --Andy
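For reference, the sizeof(long) versus alignof(long) question raised above can be asked in C as well. A minimal sketch, assuming a C11 compiler (where `<stdalign.h>` provides `alignof`); the helper names `long_size` and `long_align` are made up for illustration:

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical helpers: report what this compiler and ABI use for long. */
static size_t long_size(void)
{
    return sizeof(long);
}

static size_t long_align(void)
{
    return alignof(long);
}
```

The values depend on the target ABI and compiler flags, so no particular output is claimed here. The one thing C itself guarantees is that the size of a type is a multiple of its alignment, since array elements must all be properly aligned.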
Re: [PATCH 1/1] tty: fix up atime/mtime mess, take four
On Fri, 2015-02-27 at 18:40 +0100, Jiri Slaby wrote:
> So check the absolute difference of times and if it large than "8
> seconds or so", always update the time. That means we will update
> immediatelly when changing time. Ergo, CAP_SYS_TIME can foul the
> check, but it was always that way.

If I may ask, what is supposed to happen normally when you write to a tty device?

I always thought the tty device was treated just like a normal file wrt. timestamps. Now I see a patch for 8 seconds something.

>
> Thanks John for serving me this so nicely debugged.
>
> Signed-off-by: Jiri Slaby
> Reported-by: John Paul Perry
> Cc: Greg Kroah-Hartman
> Cc: # all, as b0b885657 was backported
> Cc: Linus Torvalds
> ---
> drivers/tty/tty_io.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index e07f35e14fa2..e31b18a6d576 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -1032,8 +1032,8 @@ EXPORT_SYMBOL(start_tty);
> /* We limit tty time update visibility to every 8 seconds or so. */
> static void tty_update_time(struct timespec *time)
> {
> - unsigned long sec = get_seconds() & ~7;
> - if ((long)(sec - time->tv_sec) > 0)
> + unsigned long sec = get_seconds();
> + if (abs(sec - time->tv_sec) & ~7)
> time->tv_sec = sec;
> }
>
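To see what the two versions of the check in the quoted patch do, here is a small user-space model. It is a sketch under assumptions, not the kernel code: the function names `tty_time_old` and `tty_time_new` are made up, the current time is passed in as a parameter instead of coming from get_seconds(), and plain signed `long` with `labs()` stands in for the kernel's types and its `abs()` macro.

```c
#include <stdlib.h>

/*
 * Model of the old check: round "now" down to a multiple of 8 seconds and
 * move the stored stamp forward only if the rounded value is later.
 * Returns the new stored stamp.
 */
static long tty_time_old(long stored, long now)
{
    long sec = now & ~7L;

    if (sec - stored > 0)
        stored = sec;
    return stored;
}

/*
 * Model of the new check: update whenever the absolute difference is
 * 8 seconds or more, in either direction.
 */
static long tty_time_new(long stored, long now)
{
    if (labs(now - stored) & ~7L)
        stored = now;
    return stored;
}
```

Both versions leave the stamp alone when the clock has advanced by less than 8 seconds, and both update it once it has advanced by 8 or more. They differ when the clock is set backwards: the old check never updates the stamp again until the clock catches up with it, while the new one updates as soon as the difference reaches 8 seconds.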
mmotm: swap overflow warning patch: mangled description and missing review tag
I checked the mmotm queue and it seems that my mid-air corrections got
the patch mangled when it was saved to your mail queue, and in addition
to a missing correction of a typo in my testing log, Rik van Riel's
Reviewed-By tag vanished

http://www.ozlabs.org/~akpm/mmotm/broken-out/swap-warn-when-a-swap-area-overflows-the-maximum-size.patch

If you could fix my test transcript and properly credit Rik for
reviewing my patch before you ship it to linus I'd appreciate it.

The correctly formatted patch and description with corrections and tags
follows:

From: Raymond Jennings <shent...@gmail.com>
Subject: swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 64G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
Reviewed-by: Rik van Riel <r...@redhat.com>
Cc: Hugh Dickins <hu...@google.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 mm/swapfile.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff -puN mm/swapfile.c~swap-warn-when-a-swap-area-overflows-the-maximum-size mm/swapfile.c
--- a/mm/swapfile.c~swap-warn-when-a-swap-area-overflows-the-maximum-size
+++ a/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(st
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk\n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */
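The "<< (PAGE_SHIFT - 10)" in the patch converts a page count into KiB.
A minimal userspace sketch of that arithmetic follows, assuming 4 KiB pages
(PAGE_SHIFT of 12). The helper names pages_to_kib() and
swap_truncation_warning() are hypothetical and do not exist in the kernel.
The page counts used in the note below are back-computed from the log
above under the same 4 KiB assumption.

```c
#include <stddef.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* A page is 2^PAGE_SHIFT bytes, so it is 2^(PAGE_SHIFT - 10) KiB. */
unsigned long pages_to_kib(unsigned long pages)
{
	return pages << (PAGE_SHIFT - 10);
}

/*
 * Hypothetical helper mirroring the check in the patch: if the swap
 * header claims more pages than the pte format can address, format the
 * warning into buf and return 1; otherwise return 0 and leave buf alone.
 */
int swap_truncation_warning(unsigned long maxpages, unsigned long last_page,
			    char *buf, size_t len)
{
	if (last_page > maxpages) {
		snprintf(buf, len,
			 "Truncating oversized swap area, only using %luk out of %luk",
			 pages_to_kib(maxpages), pages_to_kib(last_page));
		return 1;
	}
	return 0;
}
```

Under these assumptions, a maxpages of 8388608 and a last_page of 16777215
give 33554432k and 67108860k, which matches the 64G test in the log. The
16G case (last_page of 4194303) stays below the limit and produces no
warning.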
Re: [ATTEND] How to act on LKML (was: [ 00/19] 3.10.1-stable review)
On Mon, 2013-07-15 at 15:38 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 3:08 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> >
> > Can we please make this into a Kernel Summit discussion. I highly doubt
> > we would solve anything, but it certainly would be a fun segment to
> > watch :-)
>
> I think we should, because I think it's the kind of thing we really
> need at the KS - talking about "process".
>
> At the same time, I really don't know what the format would possibly
> be like for it to really work as a reasonable discussion. And I think
> that is important, because this kind of subject is *not* likely
> possible in the traditional "people sit around tables and maybe
> somebody has a few slides" format.
>
> A small panel discussion with a few people (fiveish?) that have very
> different viewpoints, along with baskets of rotten fruit set out on
> the tables? That could be fun. And I'm serious, although we might want
> to limit the size of the fruit to smaller berries ;)
>
> Sarah will bring the brownies.

I'm sure slashdot will be happy to follow up, seeing as how this heated
discussion just made headlines there.

http://linux.slashdot.org/story/13/07/15/2316219/kernel-dev-tells-linus-torvalds-to-stop-using-abusive-language

Personally I *like* when abusive language is used, assuming it's used
appropriately.

I *hate very much* when people are nice to me and let their frustrations
grow, only to ambush me later with a string of curses and lashings in
one fell swoop.

Not only does "holding it in" set me up for failure because I remain
ignorant, I also feel downright betrayed when they come off as
vindictive bastards that saved their beefs until the moment was ripe to
do the most damage.

It doesn't just make me lose respect for them, it makes me lose trust.

Give me an honest asshole over a silver-tongued backstabber any day.

>             Linus
Re: CONFIG_* used by user-space to figure out whether a feature is on/off
On Mon, 2013-07-15 at 17:53 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 5:46 PM, Raymond Jennings <shent...@gmail.com> wrote:
> >
> > I'd like to point out that Google Chrome also makes use of CONFIG_ tests
> > to detect support for namespaces and pid containers and stuff.
>
> Hmm. It must work fine despite that. Because I run self-compiled
> kernels and there are no config files to be found by user apps.
> Neither in /proc nor in /boot. And I'm using chrome to write this.
>
>             Linus

It could be the quirks of my package manager though.

I run Gentoo, so it's entirely possible that the ebuild is doing all the
bitching, but chrome itself just fails gracefully and falls back to not
using those features if it can't find them.

I imagine though that the same stuff applying to applications in general
should also apply to installers.

Anyway I just wanted to highlight that it's not just the xen stuff
that's peeking at kernel config.
Re: CONFIG_* used by user-space to figure out whether a feature is on/off
On Mon, 2013-07-15 at 10:17 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 8:40 AM, Konrad Rzeszutek Wilk
> <konrad.w...@oracle.com> wrote:
> >
> > I am hoping you can help me draw an understanding and a line in sand
> > whether:
> >  a) Tools should not depend on /proc/config.gz to figure out whether
> >     a kernel has some CONFIG_X=y feature.
>
> Well, /proc/config.gz is better than some crazy saved-off config file,
> since it at least is guaranteed to match the kernel you're running,
> but it's still a completely crazy idea. Not the least because it's not
> at all guaranteed to be there, and even if it's there, we'll rename
> config options without caring one whit. It's meant for "make
> oldconfig" style stuff, nothing more. Any user program that depends on
> it is broken by design.
>
> >  b) If they are OK to do so, what do we do when certain CONFIG_X options
> >     get reworked/removed. Would they be considered regressions? Aka
> >     is this similar to 'you shall not break user-space'?
>
> Absolutely not. If you depend on any config file, you're broken by
> definition. The only thing that can depend on the config file is the
> kernel tree itself, and even then we happily break that at any time
> (ie "make oldconfig" is meant to give an _approximation_ of the old
> config, but if some config option gets renamed, the old value is
> thrown away without question, and the new name is asked about).
>
> > Irrespective of that, do you have any ideas of how a user-space program
> > (say GRUB)
> > can figure out whether the configuration stanza it generates is supported by
> > the kernel.

I'd like to point out that Google Chrome also makes use of CONFIG_ tests
to detect support for namespaces and pid containers and stuff.

> > If you don't want to answer this question - since this might
> > open a can of worms you prefer not to deal with - that is absolutely OK.
>
> I think grub should stop trying to be clever. Quite frankly, from my
> own experience, grub has become too clever by half, and become harder
> to use and configure as a result. Just don't do it.
>
> If you want to have grub Xen options for the kernel, make them grub
> options. In the grub config file. And if that option isn't there, just
> boot it as a native kernel. That had better work anyway, and is a hell
> of a lot more flexible and stable anyway. Don't try to be clever, and
> certainly don't try to parse some random config file that may or may
> not even match the kernel you're booting.
>
>             Linus
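One way to follow the advice above is to probe the running kernel for the
feature itself instead of reading a config file. A minimal sketch, assuming
the feature in question exposes a file under /proc: the helper name
kernel_exposes() and the use of /proc/self/ns/pid as an indicator of PID
namespace support are illustrative assumptions, not something stated in
this thread.

```c
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the running kernel exposes the given
 * path, 0 otherwise. Only existence is checked, so a caller should still
 * be prepared for the real operation to fail and fall back gracefully.
 */
int kernel_exposes(const char *path)
{
	return access(path, F_OK) == 0;
}
```

A caller might use kernel_exposes("/proc/self/ns/pid") and skip the
namespace-based code path when it returns 0. Unlike a CONFIG_ lookup, the
answer comes from the kernel that is actually running.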
Re: [ 00/19] 3.10.1-stable review
On Mon, 2013-07-15 at 12:23 -0700, Linus Torvalds wrote:
> On Mon, Jul 15, 2013 at 12:17 PM, Willy Tarreau <w...@1wt.eu> wrote:
> >
> > BTW, I was amazed that you managed to get him to have a much softer tone in
> > his last e-mail, you probably found a weakness here in his management
> > process :-)
>
> Hey, I _like_ arguing, and "cursing" and "arguing" are actually not at
> all the same thing.
>
> And I really don't tend to curse unless people are doing something
> stupid and annoying. If people have concerns and questions that I feel
> are valid, I'm more than happy to talk about it.
>
> I curse when there isn't any argument. The cursing happens for the
> "you're so f*cking wrong that it's not even worth trying to make
> logical arguments about it, because you have no possible excuse" case.
>
> .. and sometimes people surprise me and come back with a valid excuse
> after all. "My whole family died in a tragic freak accident and my
> pony got cancer, and I was distracted".

...At least with the recent SCOTUS ruling, if you took your pony to a
vet you wouldn't have to worry about Hasbro suing him for patent
infringement...

> And then I might even tell them I'm sorry.
>
> No. Not really.
>
>             Linus
Re: [PATCH v2] swap: warn when a swap area overflows the maximum size
Screwed up and didn't attach my fixed test log to the second version.

See below.

On Sun, 2013-07-07 at 15:31 -0400, Rik van Riel wrote:
> On 07/07/2013 03:13 PM, Raymond Jennings wrote:
> > Turned the comparison around for clarity of "bigger than"
> >
> > No semantic changes, if it still compiles it should do the same thing so
> > I've omitted the testing this time. Will be happy to retest if required
> > but I'm on an atom 330 and kernel rebuilds are a nightmare.
>
> Added CC: Andrew Morton, since this should probably go into -mm :)
>
> > swap: warn when a swap area overflows the maximum size
> >
> > It is possible to swapon a swap area that is too big for the pte width
> > to handle.
> >
> > Presently this failure happens silently.
> >
> > Instead, emit a diagnostic to warn the user.
> >
> > Signed-off-by: Raymond Jennings <shent...@gmail.com>
> > Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
>
> Reviewed-by: Rik van Riel <r...@redhat.com>
>
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 36af6ee..5a4ce53 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
> >  	 */
> >  	maxpages = swp_offset(pte_to_swp_entry(
> >  			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
> > +	if (swap_header->info.last_page > maxpages) {
> > +		printk(KERN_WARNING
> > +		       "Truncating oversized swap area, only using %luk out of %luk \n",
> > +		       maxpages << (PAGE_SHIFT - 10),
> > +		       swap_header->info.last_page << (PAGE_SHIFT - 10));
> > +	}
> >  	if (maxpages > swap_header->info.last_page) {
> >  		maxpages = swap_header->info.last_page + 1;
> >  		/* p->max is an unsigned int: don't overflow it */
> >
> > Testing results, root prompt commands and kernel log messages:
> >
> > # lvresize /dev/system/swap --size 16G
> > # mkswap /dev/system/swap
> > # swapon /dev/system/swap
> >
> > Jul  7 04:27:22 warfang kernel: Adding 16777212k swap
> > on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k
> >
> > # lvresize /dev/system/swap --size 16G

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
> # lvresize /dev/system/swap --size 16G

Typo in the second test.  The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

> # mkswap /dev/system/swap
> # swapon /dev/system/swap

> > # mkswap /dev/system/swap
> > # swapon /dev/system/swap
> >
> > Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
> > using 33554432k out of 67108860k
> > Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
> > on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
[PATCH v2] swap: warn when a swap area overflows the maximum size
Turned the comparison around for clarity of "bigger than"

No semantic changes, if it still compiles it should do the same thing so
I've omitted the testing this time. Will be happy to retest if required
but I'm on an atom 330 and kernel rebuilds are a nightmare.

swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk \n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
Re: [PATCH] swap: warn when a swap area overflows the maximum size (resent)
...I hate you gmail...

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
> # lvresize /dev/system/swap --size 16G

Typo in the second test.  The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

> # mkswap /dev/system/swap
> # swapon /dev/system/swap
>
> Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only
> using 33554432k out of 67108860k
> Jul  7 04:27:22 warfang kernel: Adding 33554428k swap
> on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
[PATCH] swap: warn when a swap area overflows the maximum size (resent)
Silly me, wrong email address

On Sun, 2013-07-07 at 04:44 -0700, Raymond Jennings wrote:

swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (maxpages < swap_header->info.last_page) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk \n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
[PATCH] swap: warn when a swap area overflows the maximum size
swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width
to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (maxpages < swap_header->info.last_page) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk \n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap

Jul  7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul  7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap.  Priority:-1 extents:1 across:33554428k
Re: [PATCH] swap: warn when a swap area overflows the maximum size (resent)
...I hate you gmail...

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
> # lvresize /dev/system/swap --size 16G

Typo in the second test. The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

> # mkswap /dev/system/swap
> # swapon /dev/system/swap
> Jul 7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
> Jul 7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap. Priority:-1 extents:1 across:33554428k
[PATCH v2] swap: warn when a swap area overflows the maximum size
Turned the comparison around for clarity of "bigger than"

No semantic changes, if it still compiles it should do the same thing so I've omitted the testing this time.

Will be happy to retest if required but I'm on an atom 330 and kernel rebuilds are a nightmare.

swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk \n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap
Jul 7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap. Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap
Jul 7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul 7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap. Priority:-1 extents:1 across:33554428k
Re: [PATCH v2] swap: warn when a swap area overflows the maximum size
Screwed up and didn't attach my fixed test log to the second version. See below.

On Sun, 2013-07-07 at 15:31 -0400, Rik van Riel wrote:
On 07/07/2013 03:13 PM, Raymond Jennings wrote:

Turned the comparison around for clarity of "bigger than"

No semantic changes, if it still compiles it should do the same thing so I've omitted the testing this time.

Will be happy to retest if required but I'm on an atom 330 and kernel rebuilds are a nightmare.

Added CC: Andrew Morton, since this should probably go into -mm :)

swap: warn when a swap area overflows the maximum size

It is possible to swapon a swap area that is too big for the pte width to handle.

Presently this failure happens silently.

Instead, emit a diagnostic to warn the user.

Signed-off-by: Raymond Jennings <shent...@gmail.com>
Acked-by: Valdis Kletnieks <valdis.kletni...@vt.edu>
Reviewed-by: Rik van Riel <r...@redhat.com>

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 36af6ee..5a4ce53 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1953,6 +1953,12 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
 	 */
 	maxpages = swp_offset(pte_to_swp_entry(
 			swp_entry_to_pte(swp_entry(0, ~0UL)))) + 1;
+	if (swap_header->info.last_page > maxpages) {
+		printk(KERN_WARNING
+		       "Truncating oversized swap area, only using %luk out of %luk \n",
+		       maxpages << (PAGE_SHIFT - 10),
+		       swap_header->info.last_page << (PAGE_SHIFT - 10));
+	}
 	if (maxpages > swap_header->info.last_page) {
 		maxpages = swap_header->info.last_page + 1;
 		/* p->max is an unsigned int: don't overflow it */

Testing results, root prompt commands and kernel log messages:

# lvresize /dev/system/swap --size 16G
# mkswap /dev/system/swap
# swapon /dev/system/swap
Jul 7 04:27:22 warfang kernel: Adding 16777212k swap on /dev/mapper/system-swap. Priority:-1 extents:1 across:16777212k

# lvresize /dev/system/swap --size 16G

On Sun, 2013-07-07 at 04:52 -0700, Raymond Jennings wrote:
# lvresize /dev/system/swap --size 16G

Typo in the second test. The first line should read:

# lvresize /dev/system/swap --size 64G

First ever serious patch, got excited and burned the copypasta.

# mkswap /dev/system/swap
# swapon /dev/system/swap
# mkswap /dev/system/swap
# swapon /dev/system/swap
Jul 7 04:27:22 warfang kernel: Truncating oversized swap area, only using 33554432k out of 67108860k
Jul 7 04:27:22 warfang kernel: Adding 33554428k swap on /dev/mapper/system-swap. Priority:-1 extents:1 across:33554428k
Re: [problem?] swapon: swap partition/volume size capped at 32G?
On Wed, 2013-07-03 at 14:21 -0700, Raymond Jennings wrote:
> Ok, so I just upgraded to 3.10.0 (gentoo system) and made a nice big
> 64GiB swap volume on lvm as usual.
>
> Suddenly, swapon doesn't recognize more than 32GiB, as top lists only
> that much swap space.
>
> swapon using *two* separate 32GiB partitions works fine, but for some
> reason a swap partition bigger than 32GiB isn't fully recognized.
>
> Previous kernel versions IIRC recognized the entire swap partition.
>
> Is something wrong or is this new behavior standard?
>

Hold on a minute, I just found out something ate my kernel config and turned off PAE when I upgraded.

This in turn shrunk my pte's from 64 bits to 32 bits and is probably what killed >32G swap extents.
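To make the pte-width explanation concrete, here is a small sketch of how the number of bits left for the swap offset bounds the size of one swap area. It is a userspace illustration only. The figure of 23 offset bits is an assumption inferred from the observed 32 GiB cap with 4 KiB pages (32 GiB / 4 KiB = 2^23 pages), not a value read from the kernel headers, and max_swap_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest swap area, in bytes, that can be addressed when a swap entry
 * stored in a pte has offset_bits bits available for the page offset.
 */
uint64_t max_swap_bytes(unsigned int offset_bits, unsigned int page_shift)
{
	/* Keep the result representable in 64 bits. */
	assert(offset_bits + page_shift < 64);
	return (UINT64_C(1) << offset_bits) << page_shift;
}
```

Under these assumptions, max_swap_bytes(23, 12) is 34359738368 bytes (32 GiB), while a 64 GiB area would need at least 24 offset bits. That is consistent with two separate 32 GiB areas working where a single 64 GiB one was truncated, since the limit applies per swap area.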
[problem?] swapon: swap partition/volume size capped at 32G?
Ok, so I just upgraded to 3.10.0 (gentoo system) and made a nice big 64GiB swap volume on lvm as usual.

Suddenly, swapon doesn't recognize more than 32GiB, as top lists only that much swap space.

swapon using *two* separate 32GiB partitions works fine, but for some reason a swap partition bigger than 32GiB isn't fully recognized.

Previous kernel versions IIRC recognized the entire swap partition.

Is something wrong or is this new behavior standard?
Re: [PATCH RFC ticketlock] Auto-queued ticketlock
On Wed, 2013-06-12 at 13:26 -0700, Linus Torvalds wrote:
> On Wed, Jun 12, 2013 at 1:03 PM, Davidlohr Bueso wrote:
> >
> > According to him:
> >
> > "the short workload calls security functions like getpwnam(),
> > getpwuid(), getgrgid() a couple of times. These functions open
> > the /etc/passwd or /etc/group files, read their content and close the
> > files.
>
> Ahh, ok. So yeah, it's multiple threads all hitting the same file

If that's the case and it's a bunch of reads, shouldn't they act concurrently anyway? I mean it's not like dentries are being changed or added or removed in this case.

> I guess that /etc/passwd case is historically interesting, but I'm not
> sure we really want to care too deeply..
>
> > I did a quick attempt at this (patch attached).
>
> Yeah, that's wrong, although it probably approximates the dget() case
> (but incorrectly).
>
> One of the points behind using an atomic d_count is that then dput() should do
>
>    if (!atomic_dec_and_lock(&dentry->d_count, &dentry->d_count))
>       return;
>
> at the very top of the function. It can avoid taking the lock entirely
> if the count doesn't go down to zero, which would be a common case if
> you have lots of users opening the same file. While still protecting
> d_count from ever going to zero while the lock is held.
>
> Your
>
> +       if (atomic_read(&dentry->d_count) > 1) {
> +               atomic_dec(&dentry->d_count);
> +               return;
> +       }
> +       spin_lock(&dentry->d_lock);
>
> pattern is fundamentally racy, but it's what "atomic_dec_and_lock()"
> should do race-free.
>
> For similar reasons, I think you need to still maintain the d_lock in
> d_prune_aliases etc. That's a slow-path, so the fact that we add an
> atomic sequence there doesn't much matter.
>
> However, one optimization missing from your patch is obvious in the
> profile. "dget_parent()" also needs to be optimized - you still have
> that as 99% of the spin-lock case. I think we could do something like
>
>    rcu_read_lock();
>    parent = ACCESS_ONCE(dentry->d_parent);
>    if (atomic_inc_nonzero(&parent->d_count))
>       return parent;
>    .. get d_lock and do it the slow way ...
>    rcu_read_unlock();
>
> to locklessly get the parent pointer. We know "parent" isn't going
> away (dentries are rcu-free'd and we hold the rcu read lock), and I
> think that we can optimistically take *any* parent dentry that
> happened to be valid at one point. As long as the refcount didn't go
> down to zero. Al?
>
> With dput and dget_parent() both being lockless for the common case,
> you might get rid of the d_lock contention entirely for that load. I
> dunno. And I should really think more about that dget_parent() thing a
> bit more, but I cannot imagine how it could not be right (because even
> with the current d_lock model, the lock is gotten *within*
> dget_parent(), so the caller can never know if it gets a new or an old
> parent, so there is no higher-level serialization going on - and we
> might as well return *either* the new or the old as such).
>
> I really want Al to double-check me if we decide to try going down
> this hole. But the above two fixes to your patch should at least
> approximate the d_lock changes, even if I'd have to look more closely
> at the other details of your patch..
>
> Linus
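The two lockless fast paths discussed in the quoted text can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: struct refobj, the toy spinlock built on atomic_flag, and all function names are hypothetical stand-ins for the dentry, d_lock and the kernel helpers. Note that the kernel's atomic_dec_and_lock() takes an atomic counter and a spinlock, so the second argument in the quoted snippet was presumably meant to be the d_lock, and the kernel helper corresponding to "atomic_inc_nonzero" is spelled atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct refobj {
	atomic_int count;	/* stands in for d_count */
	atomic_flag lock;	/* toy spinlock, stands in for d_lock */
};

void refobj_init(struct refobj *o, int count)
{
	assert(count >= 0);
	atomic_init(&o->count, count);
	atomic_flag_clear(&o->lock);
}

void refobj_lock(struct refobj *o)
{
	while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

void refobj_unlock(struct refobj *o)
{
	atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* Add delta to *v unless *v == unless; returns true if the add happened. */
bool add_unless(atomic_int *v, int delta, int unless)
{
	int old = atomic_load(v);

	while (old != unless) {
		/* On failure the CAS reloads old, so the check is repeated. */
		if (atomic_compare_exchange_weak(v, &old, old + delta))
			return true;
	}
	return false;
}

/*
 * dput()-style drop. Fast path: decrement without the lock as long as
 * the count would stay above zero. Slow path: take the lock first, so
 * the count can only reach zero while the lock is held.
 * Returns true, with the lock held, only if the count reached zero.
 */
bool dec_and_lock(struct refobj *o)
{
	if (add_unless(&o->count, -1, 1))
		return false;
	refobj_lock(o);
	if (atomic_fetch_sub(&o->count, 1) == 1)
		return true;
	refobj_unlock(o);
	return false;
}

/* dget_parent()-style grab: take a reference only if the object is live. */
bool inc_not_zero(struct refobj *o)
{
	return add_unless(&o->count, 1, 0);
}
```

The difference from the racy pattern in the quoted patch is that the check and the decrement happen in one compare-and-swap. With a separate atomic_read() followed by atomic_dec(), two threads can both observe a count of 2 and both decrement, taking the count to zero without either of them holding the lock.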
Re: Yet another pipe related oops.
On Wed, Mar 27, 2013 at 9:33 AM, Linus Torvalds wrote:
> On Wed, Mar 27, 2013 at 8:20 AM, Al Viro wrote:
>>
>> Actually, that's my fault - check lost in patch reordering. My apologies ;-/
>> Eventually, we want that in fs/splice.c side of things (no point repeating it
>> for every buffer, after all), but for now this is the obvious minimal fix.
>
> Applied.
>
> Do we actually have files with NULL f_ops pointers? Should we? What
> could we possibly do with a file descriptor that doesn't have any
> fops?

For the sake of the curious including myself: How would such a NULL f_ops file get created in the first place?

> Also, perhaps we should do something more akin to what we do for
> dentry functions where we validate them on registration, and we could
> fix up or validate read/write pointers, with semantics something like
>
>    if (!fop->write)
>        fop->write = fop->aio_write ? do_sync_write : EINVAL_write;
>    if (!fop->read)
>        fop->read = fop->aio_read ? do_sync_read : EINVAL_read;
>
> kind of things?
>
> Not a big deal, perhaps.
>
> Linus
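The "validate on registration" idea in the quoted text can be sketched outside the kernel as follows. Everything here is hypothetical and only mirrors the shape of the suggestion: struct toy_fops, struct toy_file and the helper names are invented for the illustration and are not the kernel's struct file_operations or its functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct toy_file;

/* Hypothetical operations table with optional entries. */
struct toy_fops {
	ssize_t (*read)(struct toy_file *f, char *buf, size_t len);
	ssize_t (*write)(struct toy_file *f, const char *buf, size_t len);
	ssize_t (*aio_read)(struct toy_file *f, char *buf, size_t len);
	ssize_t (*aio_write)(struct toy_file *f, const char *buf, size_t len);
};

struct toy_file {
	struct toy_fops *fops;
};

/* Defaults for tables that provide no way to read or write at all. */
ssize_t einval_read(struct toy_file *f, char *buf, size_t len)
{
	(void)f; (void)buf; (void)len;
	return -EINVAL;
}

ssize_t einval_write(struct toy_file *f, const char *buf, size_t len)
{
	(void)f; (void)buf; (void)len;
	return -EINVAL;
}

/* Synchronous wrappers that forward to the asynchronous entries. */
ssize_t sync_read(struct toy_file *f, char *buf, size_t len)
{
	return f->fops->aio_read(f, buf, len);
}

ssize_t sync_write(struct toy_file *f, const char *buf, size_t len)
{
	return f->fops->aio_write(f, buf, len);
}

/* Fill in missing read/write entries once, at registration time. */
void toy_fops_fixup(struct toy_fops *fop)
{
	assert(fop != NULL);
	if (!fop->write)
		fop->write = fop->aio_write ? sync_write : einval_write;
	if (!fop->read)
		fop->read = fop->aio_read ? sync_read : einval_read;
}
```

The design point is that after the fix-up every table has non-NULL read and write entries, so callers no longer need a NULL check on each call and a missing operation fails with -EINVAL instead of an oops.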
Re: [GIT PULL] late arch/metag fixes for v3.9-rc1
On Sat, Mar 2, 2013 at 10:10 AM, Borislav Petkov wrote:
> On Sat, Mar 02, 2013 at 08:28:56AM -0800, Linus Torvalds wrote:
>> On Sat, Mar 2, 2013 at 2:22 AM, James Hogan wrote:
>> >
>> > Okay, I've rebased the arch/metag tree onto mainline to make all the
>> > back-merges unnecessary and applied those simple fixes into "Build
>> > infrastructure" and "Various other headers" commits (additionally
>> > trivially removing ARCH_NO_VIRT_TO_BUS which is also now unnecessary).
>>
>> No, this is *exactly* the wrong thing to do.
>
> <snip good practices and musings about maintainer trees>
>
> Hmm, so this comes up almost everytime new maintainers send stuff (and
> when seasoned maintainers forget :)), maybe we should hold it down
> somewhere in Documentation/ for future reference?

Hear hear!

Come to think of it, given how often Linus has bitched about rebasing and back merging, I'm surprised it's not already mentioned.

> --
> Regards/Gruss,
> Boris.
>
> Sent from a fat crate under my desk. Formatting is fine.
> --
Re: [GIT PULL] Load keys from signed PE binaries
My two cents on this subject btw is that anything to do with Microsoft's intentions or plans is an issue of policy that belongs entirely in userspace.

"mechanism, not policy"

Besides, what do modules have to do with this if we're talking about UEFI? Doesn't the kernel have to be loaded before modules are even an issue?

Pardon me for being lost, just trying to follow this.
Re: [RFC] SIGKILL vs. SIGSEGV on late execve() failures
On Fri, Feb 15, 2013 at 6:20 PM, Al Viro wrote:
> Arrgh... OK, I'm a blind idiot. These places in binfmt_elf.c currently use
> force_sig(), not send_sig_info(). Currently == since 2006 when somebody
> noticed the problem. Their counterparts in binfmt_elf_fdpic.c were *not*
> noticed. Anyway, that definitely means we want to do it in a single commit;
> the only remaining question is whether we have any problems with somebody
> ptracing such execve() and then poking the sucker with ptrace();

Personally if I was ptracing another process, I'd be flummoxed if I saw it get nailed with a fatal segfault that I somehow wasn't allowed to intercept.

An even bigger question might be why an execve is allowed to get into an unrecoverable state to begin with. Assuming that one builds the new mm_struct and whatnot BEFORE discarding old state, why would execve be in a position for a fatal error in the first place?

> that _can_
> happen with the current mainline for ELF binaries, so this is not something
> new. I'm low on coffee and about to crash, so I might be missing some
> horrible problem with it, but in this case I'm fairly sure that such a problem
> would be present in current mainline.
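As background for the SIGKILL versus SIGSEGV distinction in this thread, the sketch below shows the ordinary userspace difference between the two signals: POSIX lets a process install a handler for SIGSEGV, while sigaction() on SIGKILL fails with EINVAL. It says nothing about how force_sig() behaves inside the kernel or what a ptrace-based tracer observes; try_install_handler() is a hypothetical helper written for this illustration.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

void noop_handler(int sig)
{
	(void)sig;
}

/*
 * Try to install a handler for sig, then restore the old disposition.
 * Returns 0 if the handler could be installed, otherwise the errno
 * reported by sigaction().
 */
int try_install_handler(int sig)
{
	struct sigaction sa, old;

	assert(sig > 0);
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = noop_handler;
	sigemptyset(&sa.sa_mask);

	if (sigaction(sig, &sa, &old) == -1)
		return errno;

	/* Put back whatever was there before, so the probe has no effect. */
	sigaction(sig, &old, NULL);
	return 0;
}
```

Calling try_install_handler(SIGSEGV) returns 0, while try_install_handler(SIGKILL) returns EINVAL, which is why the choice of signal matters to anything that expects a chance to intercept the failure.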
Re: Drop support for x86-32
Some useless troll said:

> nouveau is useless garbage as most open source graphics drivers.

Coming to an open source mailing list like LKML just to bitch about open source being garbage? Come on...at least entertain us with better subtlety.

I'm ready to ignore this guy, how about everyone else?

*plonk*

Ah, much better.
Re: Drop support for x86-32
On Thu, 2012-08-23 at 12:41 +0200, wbrana wrote:
> Microsoft will drop support for x86-32 in Windows 9.
> Linux could do same.
> http://www.networkworld.com/community/blog/windows-9-details-are-already-emerging

I use an x86-32 system myself. So do many other people.

Besides, it's not really your call to decide if x86-32 is obsolete. If it's anyone's call, it's for companies like AMD and Intel that actually make the chips. Microsoft doesn't make x86 chips, so their opinion on x86-32's viability is none of our concern.

Similarly, if I were a marketing director for Pepsi, I wouldn't listen to anything that Coca-Cola has to say about what flavors of soda to make. A problem with the liquid CO2 company I buy my fizz from, however, WOULD get my attention.
Re: [PATCH] fs: Introducing Lanyard Filesystem
On Sun, 2012-08-19 at 20:47 -0400, Theodore Ts'o wrote:
> On Mon, Aug 20, 2012 at 01:06:20AM +0200, Carlos Alberto Lopez Perez wrote:
> >
> > > I also seriously question the niche of people who want to use a thumb
> > > drive to transfer > 4GB files. Try it sometime and see what a painful
> > > user experience it is
> >
> > Think for example on consumer devices, for example on most modern TVs
> > you can plug a USB memory disk with videos and play them.
>
> More and more consumer devices, including TV's, are network-enabled.
> I'm not at all convinced the USB memory disk model is the one which
> makes sense --- you can make a much better user experience work if you
> can rely on networking. That way you don't have to move USB storage
> devices around, and USB storage devices are *slow* when the most
> common types are HDD's and crappy flash devices. How many people are
> going to drop several hundred dollars for a USB-attached SSD, when
> using a networking transfer mechanism is much more convenient?
>
> > And I doubt that the majority of these consumer devices are able to read
> > anything more than FAT32 file-systems, so the 4GB limit is a big problem.
> > And here is where Microsoft is pushing their exFAT FS since it allows
> > working with 4GB+ files without the NTFS overhead.
>
> We'll see how popular a heavily IP-encumbered file system will be,
> especially given that its main use case is for devices which are so
> constrained that they can't afford to use a "real file system" (like
> ntfs or ext4 or some other more sophisticated file system), but which
> nevertheless needs to be able to handle 4GB+ files.

My two cents: After seeing Microsoft's attack on TomTom over the vfat patents, I honestly would consider it a good move to have an alternative free format available.

> I'm sure there will be some use cases that might fit that niche, but
> it seems pretty tiny. And this is completely ignoring what might
> happen if in the future people take 1gig fiber connections to the home
> (such as what many people in Kansas City will be enjoying very
> shortly) for granted
>
> > As a side note, it would be possible to write a driver for exFAT and get
> > it merged upstream on the Linux Kernel without "breaking any law"?
> > Googling I found an attempt to write such driver but seems that never
> > got merged: https://lkml.org/lkml/2009/2/8/24
>
> You'll need to talk to a lawyer about that, since that's fundamentally
> a legal question.
>
> Regards,
>
> - Ted