RE: [PATCH] 2.6.25-rc2-mm1 - fix mcount GPL bogosity.
> "David Schwartz" <[EMAIL PROTECTED]> writes:
>
> > I don't know who told you that or why, but it's obvious nonsense,
>
> Correct.
>
> > Exports should be marked GPL if and only if they cannot be used
> > except in a derivative work. If it is possible to use them without
> > taking sufficient protectable expression, they should not be marked
> > GPL.
>
> This isn't very obvious to me.

It may not be obvious, but it is the design and purpose of marking exports GPL.

> The licence doesn't talk about GPL or non-GPL exports. It doesn't
> restrict the use, only distribution of the software. One is free to
> remove _GPL from the code and distribute it anyway (except perhaps for
> some DMCA nonsense).

That's true. The DMCA doesn't prevent it, since marking symbols is *not* a license enforcement mechanism.

> If code is a derivative work it has to be distributed (use is not
> restricted) under GPL, EXPORT_GPL or not _GPL.

Of course.

> One may say _GPL is a strong indication that all users are
> automatically derivative works, but it's only that - an indication. It
> doesn't mean they are really derivative works and it doesn't mean a
> module not using any _GPL exports isn't a derivative.

Of course. (The only people who argue otherwise are the 'linking makes a derivative work' idiots.)

> I think introducing these _GPL symbols was a mistake in the first place.

Perhaps, since people seem to be trying to refight the same battles again. The agreement made when the feature was added was that EXPORT_GPL was not a license enforcement mechanism but an indication that someone believed that any use of the symbol could only occur in a derivative work that would need to be distributed under the GPL.

> Actually I think the _GPL exports are really harmful - somebody
> distributing a binary module may claim he/she doesn't violate the GPL
> because the module uses only non-GPL exports.

Anyone can argue anything. That would be an obviously stupid argument. Perhaps clearer documentation might be helpful, but the GPL speaks for itself.

> OTOH GPL symbols give _us_ exactly nothing.

They serve as a warning and, as a practical matter, may make it a bit more difficult to violate the license.

DS

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
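For context, the mechanism being debated is the kernel's pair of export macros. A minimal sketch of how a module author marks symbols (the function and identifier names here are invented for illustration; this is a non-loadable fragment, not code from the thread):

```c
/*
 * Illustration of the export marking discussed above. The names are
 * hypothetical. EXPORT_SYMBOL_GPL makes the symbol resolvable only by
 * modules whose MODULE_LICENSE() declares a GPL-compatible licence;
 * as the post stresses, it is a marker of belief, not a licence
 * enforcement mechanism.
 */
#include <linux/module.h>

int example_anyone_may_call(void)
{
	return 0;
}
EXPORT_SYMBOL(example_anyone_may_call);	/* visible to all modules */

int example_gpl_only(void)
{
	return 0;
}
EXPORT_SYMBOL_GPL(example_gpl_only);	/* GPL-declared modules only */

MODULE_LICENSE("GPL");
```

Removing the `_GPL` suffix is a one-line source change, which is why the thread treats the marking as an expression of the exporter's belief about derivative works rather than as a technical barrier.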
RE: [PATCH] 2.6.25-rc2-mm1 - fix mcount GPL bogosity.
> The reason I added GPL is not because of some idea that this is all
> "chummy" with the kernel. But because I derived the mcount code from
> glibc's version of mcount. Now you may argue that glibc is under LGPL
> and a non-GPL export is fine. But I've been advised that if I ever take
> code from someone else, to always export it with GPL.
>
> -- Steve

I don't know who told you that or why, but it's obvious nonsense, as this issue shows.

Exports should be marked GPL if and only if they cannot be used except in a derivative work. If it is possible to use them without taking sufficient protectable expression, they should not be marked GPL. This was what everyone agreed to when GPL exports were created.

DS
RE: [PATCH] USB: mark USB drivers as being GPL only
Marcel Holtmann wrote:

> Lets phrase this in better words as Valdis pointed out: You can't
> distribute an application (binary or source form) under anything else
> than GPL if it uses a GPL library.

This simply cannot be correct. The only way it could be true is if the work were a derivative work of a GPL'd work. There is no other way it could become subject to the GPL. So this argument reduces to: any work that uses a library is a derivative work of that library.

But this is clearly wrong. For work X to be a derivative work of work Y, it must contain substantial protected expression from work Y, but an application need not contain any expression from the libraries it uses.

> It makes no difference if you
> distribute the GPL library with it or not.

If you do not distribute the GPL library, the library is simply being used in the intended, ordinary way. You do not need to agree to, nor can you violate, the GPL simply by using a work in its ordinary intended way.

If the application contains insufficient copyrightable expression from the library to be considered a derivative work (and purely functional things do not count), then it cannot be a derivative work. The library is not being copied or distributed, so how can its copyright be infringed?

> But hey (again), feel free to disagree with me here.

This argument has no basis in law or common sense. It's completely off-the-wall.

And to Pekka Enberg:

> It doesn't matter how "hard" it was to write that code. What matters
> is whether your code requires enough copyrighted aspects of the
> original work to constitute a derived work. There's a huge difference
> between using kmalloc and spin_lock and writing a driver that is built
> on top of the full USB stack of the Linux kernel, for example.

The legal standard is not whether the code "requires" copyrighted aspects but whether it *contains* them. The driver does not contain the USB stack. The aspects of the USB stack that the driver must contain are purely functional -- its API.

You simply can't have it both ways. If the driver must contain X in order to do its job, then X is functional and cannot make the driver a derivative work. You cannot protect, by copyright, every way to accomplish a particular function. Copyright only protects creative choices among millions of (at least arguably) equally good choices.

DS
RE: [PATCH] USB: mark USB drivers as being GPL only
> Don't ignore, "mere aggregation of another work not based on the Program
> with the Program (or with a work based on the Program) on a volume of a
> storage or distribution medium does not bring the other work under the
> scope of this License." Static linking certainly makes something part
> of the whole; dynamic linking doesn't.

Actually, static linking does not, since the whole is not a "work". Under copyright law, a "work" can only be made by creative effort. Static linking is not creative effort, so it cannot create a work. If it could, the linker would be entitled to copyright on the new work, which makes no sense at all.

An exception might exist if there were a large number of equally good ways to perform the link and the person who linked it had to creatively choose a method. But normally, anything purely dominated by functional considerations (which static linking almost always is) is not considered sufficiently creative.

If you statically link work "X" to work "Y", the result is *not* work "Z", derivative of "X" and "Y". It is parts of work "X" and parts of work "Y" mechanically combined. A group of combined works follows the license for each of the individual works from which sufficient protectable expression has been taken. A "derivative work" is a new work, and can only be formed by creative effort not present in the works it is claimed to be derivative of.

And to Alan Cox, who wrote:

> First mistake: The GPL is not a contract it is a license.

A license is a form of contract in which part of the compensation one party receives is rights to the intellectual property of the other party.

> If the GPL was a contract it could most certainly impose conditions upon
> original works. Contract law permits you to write things like "If you buy
> the source for this package you agree not to write a competing product
> for three years even if an original work".

Sure, and those things would apply to anyone who has accepted the contract. Why do you think the GPL couldn't say those things and enforce them against anyone who had agreed to the GPL? How is agreeing to release source code any different from agreeing not to write a competing product? (Except that a court may be more likely to enforce the latter than the former, of course.)

And to Marcel:

> so how do you build this module that is not linked without using the
> Linux kernel. Hence derivative work. Hence dynamic linking at runtime of
> binary only code is violating the GPL.

When there is only one way to do it, you cannot copyright that one way. You need a patent for that. So, no, it's not a derivative work, because what was taken is the one way to do it, and "one way to do it" is not protectable expression. A derivative work exists only when protectable expression is taken.

DS
RE: [PATCH] USB: mark USB drivers as being GPL only
Don't ignore, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. Static linking certainly makes something part of the whole; dynamic linking doesn't. Actually, static linking does not, since the whole is not a work. Under copyright law, a work can only be made by creative effort. Static linking is not creative effort, so it cannot create a work. If it were, the linker would be entitled to copyright on the new work, which makes no sense at all. An exception might exist if there were a large number of equally good ways to perform the link and the person who lined it had to creatively chose a method. But normally, anything purely dominated by functional considerations (which statically linking almost always is) is not considered sufficiently creative. If you statically link work X to work Y, the result is *not* work Z, derivative from X and Y. It is parts of work X and parts of work Y mechanically combined. A group of combined works follows the license for each of the individual works from which sufficient protectable expression has been taken. A derivative work is a new work, and can only be formed by creative effort not in the works it is claimed to be derivative of. And to Alan Cox, who write: First mistake: The GPL is not a contract it is a license. A license is a form of contract in which part of the compensation one party receives is rights to the intellectual property of the other party. If the GPL was a contract it could most certainly impose conditions upon original works. Contract law permits to write things like If you buy the source for this package you agree not to write a competing product for three years even if an origina work. Sure, and those things would apply to anyone who has accepted the contract. 
Why do you think the GPL couldn't say those things and enforce them against anyone who had agreed to the GPL? How is agreeing to release source code any different from agreeing not to write a competing product? (Except that a court may be more likely to enforce the latter than the former, of course.) And to Marcel: so how do you build this module that is not linked without using the Linux kernel. Hence derivative work. Hence dynamic linking at runtime of binary only code is violating the GPL. When there is only one way to do it, you cannot copyright that one way. You need a patent for that. So, no, it's not a derivative work because what was taken is the one way to do it, and one way to do it is not protectable expression. A derivative work only applies when protectable expression is taken. DS -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH] USB: mark USB drivers as being GPL only
> IANAL, but when looking at the "But when you distribute the same
> sections as part of a whole which is a work based on the Program, the
> distribution of the whole must be on the terms of this License" of the
> GPLv2 I would still consult a lawyer before e.g. selling a laptop with a
> closed-source driver loaded through ndiswrapper.

If that were true, you couldn't legally install more than one program on a computer without specific license permission from all the copyright holders. A "work based on the Program" is the same as a derivative work. A laptop with an assortment of different programs on it is not a work, it is a collection of works.

DS
RE: ndiswrapper and GPL-only symbols redux
Adrian Bunk wrote:

> The Linux kernel is licenced under the GPLv2.
>
> Ndiswrapper loads and executes code with not GPLv2 compatible licences
> in the kernel, in a way that might be considered similar to a GPLv2'ed
> userspace program using dlopen() on a dynamic library file with a not
> GPLv2 compatible licence.
>
> IANAL, but I do think there might be real copyright issues with
> ndiswrapper.

Neither the kernel+ndiswrapper nor the non-free driver was developed with knowledge of the other, so there is simply no way one could be a derivative work of the other. Since no creative effort is required to link them together, and the linked result is not fixed in a permanent medium, a derivative work cannot be created by the linking process itself.

In any event, even if it were, this is the normal use of ndiswrapper, and normal use cannot be encumbered by copyright. Otherwise, it would be unwise to color in a coloring book.

So if there is a possible copyright issue, I for one can't imagine what it could be. There simply *cannot* be a copyright issue when one merely uses a work in the normal, intended and expected way.

DS
RE: ndiswrapper and GPL-only symbols redux
Combined responses to many fragmented comments in this thread. No two consecutive excerpts are from the same person.

> Interesting... I never heard about this `transferring ownership of a
> single copy not involving GPL'.
>
> Note that some lawyers claim that at trade shows, you should not hand over
> a demo device running GPLed code to any interested party, as it would be
> distribution...

In the United States, 17 USC 109 specifically permits this: "Notwithstanding the provisions of section 106 (3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord."

> IANAL, and I don't know about the laws in other countries, but at least
> in Germany modifications of a copyrighted work require the permission of
> the copyright holder.

Ah, so coloring books are illegal in Germany? Or it's just illegal to color them in? Or you need a special license to do so?

> IANAL, but I have serious doubts whether putting some glue layer between
> the GPL'ed code and the code with a not GPL compatible licence is really
> a legally effective way of circumventing the GPL.

The GPL has no power to control works that are neither GPL nor derived from GPL works. There is no need to circumvent situations the GPL has no business applying to. This is a use of the GPL'd code. It's not a distribution and it's not a creative combination. It is, and should be, outside the GPL's scope.

> Read the paragraph starting with "These requirements apply to the
> modified work as a whole." of the GPLv2.

There is no "modified work as a whole" in this case. A mechanical combination of two or more works produces those two or more works, not a new work. Otherwise, the linker itself would be entitled to copyright on the new work, which is nonsense. For copyright purposes, a work can only be created by creative effort. There is no creative effort in linking the kernel, ndiswrapper, and a Windows driver, so no "modified work as a whole" is created.

A linker cannot create a work because it is incapable of creative effort. If it cannot create a work, it cannot create a derivative work. There is no "modified work as a whole". Section 2 of the GPL is about creative modifications that form a "work based on the Program". Only a human can do that.

GPL section 2 actually makes that fairly clear: "These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it."

Note that it is only when you distribute the "same sections" as part of a "whole which is a work based on the Program". So these requirements only apply when someone creates a single work.

DS
RE: ndiswrapper and GPL-only symbols redux
> I wouldn't quite say that. I wasn't going to comment, but... personally,
> I actually disagree with the assertions that ndiswrapper isn't causing
> proprietary code to link against GPL functions in the kernel (how is
> an NDIS implementation any different than a shim layer provided to
> load a graphics driver?), but I wasn't trying to make that point.

By that logic, the kernel should always be tainted, since it could potentially always be linked to non-GPL code. The ndiswrapper code is just like the kernel: it is GPL, but it could be linked to non-free code.

Any reason why ndiswrapper should be tainted would argue equally well that any kernel with module-loading capability should be tainted. Somebody might load a non-free module.

DS
RE: Why is the kfree() argument const?
> On Thu, 17 Jan 2008, David Schwartz wrote: > > Nonsense. The 'kfree' function *destroys* the object pointer to by the > > pointer. How can you describe that as not doing anything to the object? > > Here's an idea. Think it through. > > Why don't we need write permissions to a file to unlink it? You cannot unlink a file. Given a file, if you were to attempt to unlink it, what directory would you remove it from? Unlinking a file is an operation on the directory the file is in, not on the directory itself. We do need write permissions to the directory. If you had only a const pointer to the data in the file, you should definitely not be able to use that pointer to find a directory the file is in and remove it without clearly indicating you know *exactly* what you're doing. Given just that 'const' pointer, you're not supposed to be modifying the data and certainly using that pointer to modify anything logically above it. > Here's a hint: because unlinking doesn't *write* to it. In fact, it > doesn't read from it either. It doesn't do any access at all to that > object, it just *removes* it. Right. It's an operation on the directory the file is in that might have consequences for the file. > Is the file gone after you unlink it? Yes (modulo refcounting for > aliasing > "pointers" aka filenames, but that's the same for any memory manager - > malloc/free just doesn't have any, so you could think of it as a > non-hardlinking filesystem). The file is gone if and only if the directory was the only thing that needed the file to exist. A file that is only on one directory "belongs to" that directory. So write permission to the directory is all that is needed. What you are arguing is essentially that you should be able to remove a file from any directory it is in just because you have write access to the file's data. > So you're the one who are speaking nonsense. Making something "not exist" > is not at all the same thing as accessing it for a write (or a read). 
> It is a metadata operation that doesn't conceptually change the data in any
> way, shape or form - it just makes it go away.

Making something "not exist" is a modification operation on that thing.

> And btw, exactly as with kfree(), a unlink() may well do something like
> "disk scrubbing" for security purposes, or cancel pending writes to the
> backing store. But even though it may write (or, by undoing a pending
> write, effectively "change the state") to the disk sectors that used to
> contain the file data, ONLY AN IDIOT would call it "writing to the file".
> Because "the file" is gone. Writing to the place where the file used to be
> is a different thing.

I agree with you about that part. I can't understand why you keep thinking this is where our disagreement lies when I've stated at least three times that I agree about this. The issue has nothing to do with whether or not 'kfree' modifies the particular bytes pointed to. It has to do with whether or not 'kfree' is the kind of operation one would normally want to allow on a 'const' object.

> So give it up. You're wrong. Freeing a memory area is not "writing to it"
> or accessing it in *any* manner, it's an operation on another level
> entirely.

Nevertheless, it's a modification operation on an object that's not supposed to be modified.

By the way, I did think of one argument that supports your position: Suppose you have a reference-counted object. You have a 'release reference and free if zero' function. Should its argument be 'const'? If not, how can a 'lookup and reference for read' function return a const pointer to the object?

However, on balance, I think a 'release reference and free if zero' function that operates on a const pointer is sufficiently unusual that a cast to show you know what you're doing is not a bad thing. In this case, you know it's safe to destroy the object through a const pointer because you *know* nobody gave you the 'const' pointer trusting you not to destroy the object.
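The reference-counted case might look like this. A minimal sketch with illustrative names (not any real kernel API):

```c
#include <stdlib.h>

struct buf {
	int refs;
	char data[64];
};

static struct buf *buf_alloc(void)
{
	struct buf *b = malloc(sizeof *b);
	b->refs = 1;
	return b;
}

/* 'lookup and reference for read': readers get only a const view. */
static const struct buf *buf_get(struct buf *b)
{
	b->refs++;
	return b;
}

/* 'release reference and free if zero'.  It accepts the const view,
 * and the cast inside is the "I know what I'm doing" marker: we know
 * nobody handed us this const pointer trusting us not to destroy the
 * object.  Returns 1 if the object was destroyed, 0 otherwise. */
static int buf_put(const struct buf *cb)
{
	struct buf *b = (struct buf *)cb;
	if (--b->refs == 0) {
		free(b);
		return 1;
	}
	return 0;
}
```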
A cast to show you have that special knowledge is, IMO, reasonable.

DS
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel"
in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: Why is the kfree() argument const?
> On Thu, 17 Jan 2008, David Schwartz wrote:
> > > "const" has nothing to do with "logical state". It has one meaning, and
> > > one meaning only: the compiler should complain if that particular type
> > > is used to do a write access.
> >
> > Right, exactly.
>
> So why do you complain?
>
> kfree() literally doesn't write to the object.

Because the object ceases to exist. However, any modification requires write access, whether or not that modification is a write.

> > You are the only one who has suggested it has anything to do with changes
> > through other pointers or in other ways. So you are arguing against only
> > yourself here.
>
> No, I'm saying that "const" has absolutely *zero* meaning on writes to an
> object through _other_ pointers (or direct access) to the object.

Nobody disagrees with that.

> And you're seemingly not understanding that *lack* of meaning.

No, I understand that.

> kfree() doesn't do *squat* to the object pointed to by the pointer it is
> passed. It only uses it to look up its own data structures, of which the
> pointer is but a small detail.
>
> And those other data structures aren't constant.

Nonsense. The 'kfree' function *destroys* the object pointed to by the pointer. How can you describe that as not doing anything to the object?

> > Nobody has said it has anything to do with anything but operations
> > through that pointer.
>
> .. and I'm telling you: kfree() does *nothing* conceptually through that
> pointer. No writes, and not even any reads! Which is exactly why it's
> const.

It destroys the object the pointer points to. Destroying an object requires write access to it.

> The only thing kfree does through that pointer is to update its own
> concept of what memory it has free.

That is not what it does; that is how it does it. What it does is destroy the object.

> Now, what it does to its own free memory is just an implementation detail,
> and has nothing what-so-ever to do with the pointer you passed it.
I agree, except that it destroys the object the pointer points to.

> See?

I now have a much better understanding of what you're saying, but I still think it's nonsense.

1) An operation that modifies the logical state of an object should not normally be done through a 'const' pointer. The reason you make a pointer 'const' is to indicate that this pointer should not be used to change the logical state of the object pointed to.

2) The 'kfree' operation changes the logical state of the object pointed to, as the object goes from existent to non-existent.

3) It is most useful for 'kfree' to be non-const because destroying an object through a const pointer can easily be done in error. One of the reasons you provide a const pointer is because you need the function you pass the pointer to not to modify the object. Since this is an unusual operation that could be an error, it is logical to force the person doing it to clearly indicate that he knows the pointer is const and that he knows it is right anyway.

I'm curious to hear how other people feel about this. You are the first competent coder I have *ever* heard make this argument.

By the way, I disagree with your metadata-versus-data argument. I would agree that a function that changes only an object's metadata could be done through a const pointer without needing a cast. A good example would be a function that updates a "last time this object was read" variable. However, *destroying* an object is not a metadata operation -- it destroys the data as well.

This is kind of a philosophical point, but an object does not have a "does this object exist" piece of metadata. If an object does not exist, it has no data. So destroying an object destroys the data and is thus a write/modification operation on the data.
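Point (3) can be sketched in a few lines of C. The names here are illustrative, not a real kernel API:

```c
#include <stdlib.h>
#include <string.h>

/* A table handing out read-only views of its entries. */
static char *table[1];

static const char *lookup(void)
{
	return table[0];	/* const: callers may read, not modify */
}

/* Non-const on purpose: destruction changes the object's logical
 * state, so a caller holding only the const view from lookup() must
 * write destroy((char *)lookup()), and the compiler complains if the
 * cast is forgotten.  That complaint is the whole point. */
static void destroy(char *p)
{
	free(p);
	table[0] = NULL;
}
```

With this shape, `destroy(lookup())` draws a discarded-qualifier diagnostic, while `destroy((char *)lookup())` compiles and makes the intent visible at the call site.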
DS
RE: Why is the kfree() argument const?
> On Thu, 17 Jan 2008, David Schwartz wrote:
> > No, that's not what it means. It has nothing to do with memory. It has
> > to do with logical state.
>
> Blah. That's just your own made-up explanation of what you think "const"
> should mean. It has no logical background or any basis in the C language.

To some extent, I agree. You can use "const" for pretty much any reason. It's just a way to say that you have a pointer and you would like an error if certain things are done with it. You could use it to mean anything you want it to mean. The most common use, and the one intended, is to indicate that an object's logical state will not be changed through that pointer.

> "const" has nothing to do with "logical state". It has one meaning, and
> one meaning only: the compiler should complain if that particular type is
> used to do a write access.

Right, exactly.

> It says nothing at all about the "logical state of the object". It cannot,
> since a single object can - and does - have multiple pointers to it.

You are the only one who has suggested it has anything to do with changes through other pointers or in other ways. So you are arguing against only yourself here. Nobody has said it has anything to do with anything but operations through that pointer.

> So your standpoint not only has no relevant background to it, it's also
> not even logically consistent.

Actually, that is true of your position. On the one hand, you defend it because kfree does not change the data. On the other hand, you claim that it has nothing to do with whether or not the data is changed.

The normal use of "const" is to indicate that the logical state of the object should not be changed through that pointer. The 'kfree' function changes the logical state of the object. So, logically, 'kfree' should not take a const pointer. The usefulness of "const" is that you get an error if you unexpectedly modify something you weren't supposed to modify.
If you are 'kfree'ing an object that is supposed to be logically immutable, you should be made to indicate that you are aware the object is logically immutable. Simply put, you have to cast in any case where you mean to do something that you want to get an error on if you do not cast. I would like to get an error if I call 'kfree' through a const pointer, because that often is an error. I may have a const pointer because my caller still plans to use the object.

Honestly, I find your position bizarre.

DS
RE: Why is the kfree() argument const?
> On Thu, 17 Jan 2008, David Schwartz wrote:
> > Which does change the thing the pointer points to. It goes from being a
> > valid object to not existing.
>
> No. It's the *pointer* that is no longer valid.

The pointer is no longer valid precisely because the object it pointed to no longer exists. Yes, the pointer is invalid too, but that is not the end of the story.

> There's definitely a difference between "exists and is changed" and
> "doesn't exist any more".
>
> > How is ceasing to exist not a change?
>
> It's not a change to the data behind it, it's a change to the *metadata*.
> Which is something that "const" doesn't talk about at all.

It doesn't matter what has changed. All that matters is whether this is something we normally want to do through a const pointer, or whether doing it through a const pointer is abnormal.

> > > Why? Because we want the types to be as tight as possible, and normal
> > > code should need as few casts as possible.
> >
> > Right, and that's why you are wrong.
>
> No, it's why I'm right.
>
> "kmalloc/kfree" (or any memory manager) by definition has to play games
> with pointers and do things like cast them. But the users shouldn't need
> to, not for something like this.

If you don't like having to cast, don't use 'const'. But if you use 'const', you have to cast when you mean to do something that you would like to be warned about if you did it by accident.

> > No, it's both correct and useful. This code is the exception to a rule.
> > The rule is that the object remain unchanged and this violates that rule.
>
> No.
>
> You are continuing to make the mistake that you think that "const" means
> that the memory behind the pointer is not going to change.

No, that's not what it means. It has nothing to do with memory. It has to do with logical state.

> Why do you make that mistake, when it is PROVABLY NOT TRUE!

I don't. You do, because you argue 'kfree' can take a const pointer because it doesn't change the memory.
The change in the memory is meaningless; the change in the logical state of the object is what matters.

> Try this trivial program:
>
>	int main(int argc, char **argv)
>	{
>		int i;
>		const int *c;
>
>		i = 5;
>		c = &i;
>		i = 10;
>		return *c;
>	}
>
> and realize that according to the C rules, if it returns anything but 10,
> the compiler is *buggy*.
>
> The fact is, that in spite of us having a "const int *", the data behind
> that pointer may change.

I don't know what you think this example proves. Nobody is arguing that so long as one const pointer to an object exists, no code anywhere should ever be able to change it. All I'm saying is that changing the logical state of an object *through* a const pointer is unusual. You should need a cast to do this because that's the only way to get a warning if you do it by mistake.

> So it doesn't matter ONE WHIT if you pass in a "const *" to "kfree()": it
> does not guarantee that the data doesn't change, because the object you
> point to has other pointers pointing to it.

Right; nobody said this was about guaranteeing that data doesn't change.

> This isn't worth discussing. It's really simple: a conforming program
> CANNOT POSSIBLY TELL whether "kfree()" modified the data or not.

But that's exactly what doesn't matter. As you've said at least twice now, it has nothing to do with changing the data. It has to do with changing the logical state of the object. That's what you're not supposed to do through a 'const' pointer.

> As such, AS FAR AS THE PROGRAM IS CONCERNED, kfree() takes a const
> pointer, and the rule that "if it can be considered const, it should be
> marked const" comes and says that kfree() should take a const pointer.

That's crazy.

> In other words - anything that could ever disagree with "const *" is BY
> DEFINITION buggy.
>
> It really is that simple.

I think you may be the only person in the world who thinks so.
DS
RE: Trailing periods in kernel messages
Jan Engelhardt wrote:
> On Dec 21 2007 17:56, Herbert Xu wrote:
> >>
> >> I do not believe "opinions" are relevant here. Relevant would be cites
> >> from respected style guides (Fowlers, Oxford Guide To Style et al.) to
> >> show they do not need a full stop.
> >>
> >> I've not found one, but I am open to references.
> >
> > Well from where I come from, full stops are only used for complete
> > sentences.
> > [...]
> > As to what is a complete sentence, that is debatable. However,
> > typically it would include a subject and a predicate. By this
> > rule the following line is not a complete sentence:
> >
> >	[XFS] Initialise current offset in xfs_file_readdir correctly
> >
> > The reason is that it lacks a subject.
>
> "current offset" is your subject.

I hate to have to point this out, but "current offset" is the object, not the subject. If the sentence were "I have initialized the current offset in xfs_file_readdir correctly.", it would be quite clear that "I" is the subject and "the current offset" is the object.

The log entry has an implied subject of "I" or, if you prefer, "the kernel". It is not a complete sentence both because it implies the subject in a context where English does not permit that and because it lacks words required by grammar (such as the "the" before "current offset"). It also lacks a helping verb, since it should be "have initialized" (or perhaps just "initialized").

Sometimes you can imply the subject, as in "Go home!". This is not one of those cases. You cannot say "Am sleepy" to mean "I am sleepy"; even though it would seem perfectly reasonable to allow an implied subject there, English doesn't.

There is no reason log entries should be complete sentences. If you look at a typical log, the complete sentences generally look worse than the fragments. For example:

	CPU: L1 I cache: 16K, L1 D cache: 16K
	CPU: L2 cache: 256K
	CPU serial number disabled.

and

	EXT3 FS on hdc7, internal journal
	EXT3-fs: mounted filesystem with ordered data mode.
And why the inconsistency at the beginning of both these examples?

Personally, I think a mix of sentences and statements is just fine. Sentences should end with a period when it looks worse not to. The following extracts from my log look perfect to me:

	Switched to high resolution mode on CPU 0
	lp: driver loaded but no devices found
	Real Time Clock Driver v1.12ac
	Linux agpgart interface v0.102
	agpgart: Detected VIA Apollo Pro 133 chipset
	agpgart: AGP aperture is 4M @ 0xfe00

Entries that look imperfect to me include:

	ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl edge)
	ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
	Detected 1004.544 MHz processor.
	ENABLING IO-APIC IRQs
	EXT3-fs: INFO: recovery required on readonly filesystem.
	Time: tsc clocksource has been installed.

The last one just looks wrong, even though it is a complete sentence. Perhaps changing 'tsc' to 'TSC' would help, or just saying "using TSC" or "TSC enabled".

Inconsistencies include:

	PCI: VIA PCI bridge detected. Disabling DAC.
	PCI: Enabling Via external APIC routing
	pci :00:04.2: uhci_check_and_reset_hc: legsup = 0x2000
	pci :00:04.2: Performing full reset

and

	TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
	TCP: Hash tables configured (established 131072 bind 65536)
	TCP reno registered

and

	PCI: Bridge: :00:01.0
	IO window: disabled.
	MEM window: f800-fddf

More important than any hard and fast rules is just how it looks. Also important is how it looks in context. For example, with the upper-case and lower-case 'pci', either way is fine, but some of each doesn't look good. Same for 'TCP'. Why does one message have a colon and not the others?

DS
RE: /dev/urandom uses uninit bytes, leaks user data
> Has anyone *proven* that using uninitialized data this way is safe?

You can probably find dozens of things in the Linux kernel that have not
been proven to be safe. That means nothing.

> As a *user* of this stuff, I'm *very* hesitant to trust Linux's RNG
> when I hear things like this. (Hint: most protocols are considered
> insecure until proven otherwise, not the other way around.)

There's no reason whatsoever to think this is unsafe. First, you can't
access the pool directly. Second, even if you could, it's mixed in
securely.

> Now imagine a security program. It runs some forward secret protocol
> and it's very safe not to leak data that would break forward secrecy
> (mlockall, memset when done with stuff, etc). It runs on a freshly
> booted machine (no DSA involved, so we're not automatically hosed), so
> an attacker knows the initial pool state. Conveniently, some *secret*
> (say an ephemeral key, or, worse, a password) gets mixed in to the
> pool. There are apparently at most three bytes of extra data mixed in,
> but suppose the attacker knows all the words that were supposed to get
> mixed in. Now the program clears all its state to "ensure" forward
> secrecy, and *then* the machine gets hacked. Now the attacker can
> learn (with at most 2^24 guesses worth of computation) 24 bits worth
> of a secret, which could quite easily reduce the work involved in
> breaking whatever forward secret protocol was involved from
> intractable to somewhat easy. Or it could leak three bytes of
> password. Or whatever.

This is no more precise than "imagine there's some vulnerability in the
RNG". Yes, if there's a vulnerability, then we're vulnerable.

An attacker can always (at least in principle) get the pool out of the
kernel. The RNG's design is premised on the notion that it is
computationally infeasible to get the input entropy out of the pool. If
an attacker can watch data going into the pool, he needn't get it out of
the pool.

> Sorry for the somewhat inflammatory email, but this is absurd.

I agree.

DS
RE: /dev/urandom uses uninit bytes, leaks user data
> The bottom line: At a cost of at most three unpredictable branches
> (whether to clear the bytes in the last word with indices congruent to
> 1, 2, or 3 modulo 4), then the code can reduce the risk from something
> small but positive, to zero. This is very inexpensive insurance.
>
> John Reiser, [EMAIL PROTECTED]

Even if you're right, the change isn't free. You've simply presented
evidence of one non-zero benefit of it. You've given no ability to
assess the size of this benefit and no way to figure if it exceeds the
cost. There is also a non-zero *security* cost to this change.

DS
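For readers wondering what "clearing the bytes in the last word" means in practice, here is a toy illustration. This is not the actual drivers/char/random.c patch; the function name and the little-endian byte-index convention are assumptions made for the sketch.

```c
#include <stdint.h>

/* Toy illustration (not the actual random.c patch): when only 'nbytes'
 * of the final 32-bit word are meaningful, zero the remaining tail
 * bytes so stale memory contents never reach the pool.  A little-endian
 * byte layout is assumed here. */
static uint32_t clear_tail(uint32_t last_word, unsigned nbytes)
{
    if (nbytes == 0)
        return 0;            /* no valid bytes at all */
    if (nbytes >= 4)
        return last_word;    /* word is fully valid: nothing to clear */
    /* keep the low 'nbytes' bytes, clear the stale high bytes */
    return last_word & ((1u << (8 * nbytes)) - 1);
}
```

The three non-trivial cases (nbytes congruent to 1, 2, or 3 modulo 4) are exactly the "three unpredictable branches" the quoted text prices out.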
RE: yield API
Kyle Moffett wrote:

> That is a *terrible* disgusting way to use yield. Better options:
> (1) inotify/dnotify

Sure, tie yourself to a Linux-specific mechanism that may or may not
work over things like NFS. That's much worse.

> (2) create a "foo.lock" file and put the mutex in that

Right, tie yourself to process-shared mutexes, which historically
weren't available on Linux. That's much better than an option that's
been stable for a decade.

> (3) just start with the check-file-and-sleep loop.

How is that better? There is literally no improvement, since the first
check will (almost) always fail.

> > Now is this the best way to handle this situation? No. Does it work
> > better than just doing the wait loop from the start? Yes.
>
> It works better than doing the wait-loop from the start? What evidence
> do you provide to support this assertion?

The evidence is that more than half the time, this avoids the sleep.
That means it has zero cost, since the yield is no heavier than a sleep
would be, and has a possible benefit, since the first sleep may be too
long.

> Specifically, in the first case you tell the kernel "I'm waiting for
> something but I don't know what it is or how long it will take"; while
> in the second case you tell the kernel "I'm waiting for something that
> will take exactly X milliseconds, even though I don't know what it
> is". If you really want something similar to the old behavior then
> just replace the "sched_yield()" call with a proper sleep for the
> estimated time it will take the program to create the file.

The problem is that if the estimate is too short, pre-emption will
result in a huge performance drop. If the estimate is too long, there
will be some wasted CPU. What was the claimed benefit of doing this
again?

> > Is this a good way to use sched_yield()? Maybe, maybe not. But it
> > *is* an actual use of the API in a real app.
>
> We weren't looking for "actual uses", especially not in binary-only
> apps. What we are looking for is optimal uses of sched_yield(); ones
> where that is the best alternative. This... certainly isn't.

Your standards for "optimal" are totally unrealistic. In his case, it
was optimal. Using platform-specific optimizations would have meant more
development and test time for minimal benefit. Sleeping first would have
had some performance cost and no benefit. In his case, sched_yield was
optimal. Really.

DS
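The pattern being defended can be sketched as follows. This is a hypothetical reconstruction, not the real application's code; the helper name, the file-existence check, and the 10 ms poll interval are all assumptions.

```c
#include <sched.h>
#include <unistd.h>

/* Hypothetical reconstruction of the pattern under discussion: check
 * once for the file another process is about to create, yield so that
 * process gets a chance to run, and only then fall back to a coarse
 * sleep loop.  More than half the time, the yield makes the first
 * sleep unnecessary. */
static void wait_for_file(const char *path)
{
    if (access(path, F_OK) == 0)
        return;                  /* already created: fast path */
    sched_yield();               /* give the producer one slot first */
    while (access(path, F_OK) != 0)
        usleep(10 * 1000);       /* coarse 10 ms poll as a last resort */
}
```

The yield costs no more than the first sleep iteration would, which is the "zero cost, possible benefit" argument made above.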
RE: Why does reading from /dev/urandom deplete entropy so much?
Phillip Susi wrote:

> What good does using multiple levels of RNG do? Why seed one RNG from
> another? Wouldn't it be better to have just one RNG that everybody
> uses? Doesn't the act of reading from the RNG add entropy to it, since
> no one reader has any idea how often and at what times other readers
> are stirring the pool?

No, unfortunately. The problem is that while that may be true in most
typical cases, the estimate of how much entropy we have has to be based
on the assumption that everything we've done up to that point has been
carefully orchestrated by the mortal enemy of whatever is currently
asking us for entropy.

While I don't have any easy solutions with obvious irrefutable technical
brilliance or that will make everyone happy, I do think that one of the
problems is that neither /dev/random nor /dev/urandom is guaranteed to
provide what most people want. In the most common use case, you want
cryptographically-strong randomness even under the assumption that all
previous activity was orchestrated by the enemy. Unfortunately,
/dev/urandom will happily give you randomness worse than this, while
/dev/random will block even when you have it.

DS
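The non-blocking half of that trade-off is easy to demonstrate. A minimal sketch (the helper name is mine): a read from /dev/urandom returns promptly whether or not the kernel believes it has entropy to spare, which is exactly the "happily give you randomness worse than this" behavior described above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Minimal sketch: /dev/urandom never blocks, so this returns promptly
 * even when the kernel's entropy estimate is low.  /dev/random (as of
 * this thread) would block in the same situation. */
static ssize_t read_urandom(unsigned char *buf, size_t n)
{
    int fd = open("/dev/urandom", O_RDONLY);
    ssize_t got;

    if (fd < 0)
        return -1;
    got = read(fd, buf, n);
    close(fd);
    return got;
}
```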
RE: Why does reading from /dev/urandom deplete entropy so much?
> heh, along those lines you could also do
>
>     dmesg > /dev/random
>
> *grin*
>
> dmesg often has machine-unique identifiers of all sorts (including the
> MAC address, if you have an ethernet driver loaded)
>
> Jeff

A good three-part solution would be:

1) Encourage distributions to do "dmesg > /dev/random" in their startup
scripts. This could even be added to the kernel (as a one-time dump of
the kernel message buffer just before init is started).

2) Encourage drivers to output any unique information to the kernel log.
I believe all/most Ethernet drivers already do this with MAC addresses.
Perhaps we can get the kernel to include CPU serial numbers, and we can
get the IDE/SATA drivers to include hard drive serial numbers. We can
also use the TSC, where available, in early bootup, which measures
exactly how long it took to get the kernel going, which should have some
entropy in it.

3) Add more entropy to the kernel's pool at early startup, even if the
quality of that entropy is low. Track it appropriately, of course.

This should be enough to get cryptographically-strong random numbers
that would hold up against anyone who didn't have access to the 'dmesg'
output.

DS
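Step 1 works because writing to /dev/random mixes the written bytes into the pool without crediting any entropy, so low-quality input can only help the pool, never hurt it. A minimal C equivalent of the shell redirect might look like this (the helper name and the sample string are mine):

```c
#include <fcntl.h>
#include <unistd.h>

/* Sketch of what "dmesg > /dev/random" does under the hood: writing to
 * /dev/random mixes the bytes into the pool but credits no entropy
 * (crediting requires the privileged RNDADDENTROPY ioctl). */
static ssize_t stir_pool(const void *data, size_t len)
{
    int fd = open("/dev/random", O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, data, len);
    close(fd);
    return n;
}
```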
RE: Is the PCI clock within the spec?
> > A scope probe will allow you to see if there is a clock signal.
> > That's all. You can't determine its quality. A 4-inch ground lead on
> > the scope probe will result in 10-20% overshoot and undershoot being
> > observed.
>
> I don't understand this 10-20% figure.
> (0V + 10-20% is still 0V.)

If you're jumping from a 900 foot marker to a 910 foot marker, does a
10% overshoot mean you jumped 1 foot too far or 90 feet too far? The
percentage is of the distance you were trying to go, not of where you
started or where you ended up.

> AFAIU, the nominal peak-to-peak voltage is 3.3V. The observed
> peak-to-peak voltage is 6.08V (3.3V + 84%).

So a 10% undershoot would mean that rather than going from 3.3V to 0V,
you overshot 0V by 10% of the distance you travelled. The voltages could
just as well be 100V and 103.3V; the transitions would still be the
same. What you call zero is, at least in principle, arbitrary.

DS
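The arithmetic can be spelled out in a few lines (the function name is mine, a sketch of the convention described above): overshoot is measured against the intended swing, so any fixed DC offset drops out.

```c
/* Overshoot/undershoot is quoted as a percentage of the intended
 * swing, not of either endpoint.  'from' and 'to' are the nominal rail
 * voltages and 'observed' is where the edge actually landed; a fixed
 * DC offset on all three cancels out. */
static double overshoot_pct(double from, double to, double observed)
{
    double swing = from - to;      /* the distance you were trying to go */
    double past  = to - observed;  /* how far beyond the target you went */
    return 100.0 * past / swing;
}
```

A falling edge from 3.3V to 0V that lands at -0.33V is a 10% overshoot, and so is an edge from 103.3V to 100V landing at 99.67V: same transition, shifted zero.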
RE: sched_yield: delete sysctl_sched_compat_yield
> * Mark Lord <[EMAIL PROTECTED]> wrote:
> > Ack. And what of the suggestion to try to ensure that a yielding
> > task simply not end up as the very next one chosen to run? Maybe by
> > swapping it with another (adjacent?) task in the tree if it comes
> > out on top again?
>
> we did that too for quite some time in CFS - it was found to be "not
> aggressive enough" by some folks and "too aggressive" by others. Then
> when people started bickering over this we added these two simple
> corner cases - switchable via a flag. (minimum aggression and maximum
> aggression)

They are both correct. It is not aggressive enough if there are tasks
other than those two that are at the same static priority level and
ready to run. It is too aggressive if the task it is swapped with is at
a lower static priority level.

Perhaps it might be possible to scan for the task at the same static
priority level that is ready-to-run but last in line among other
ready-to-run tasks, and put the yielding task after that task? I think
that's about as close as we can get to the POSIX-specified behavior.

> > Thanks Ingo -- I *really* like this scheduler!

Just in case this isn't clear, I like CFS too, and I sincerely
appreciate the work Ingo, Con, and others have done on it.

DS
RE: sched_yield: delete sysctl_sched_compat_yield
Chris Friesen wrote:

> David Schwartz wrote:
> > I've asked versions of this question at least three times and never
> > gotten anything approaching a straight answer:
> >
> > 1) What is the current default 'sched_yield' behavior?
> >
> > 2) What is the current alternate 'sched_yield' behavior?
>
> I'm pretty sure I've seen responses from Ingo describing this multiple
> times in various threads. Google should have them.
>
> If I remember right, the default is to simply recalculate the task's
> position in the tree and reinsert it, and the alternate is to yield to
> everything currently runnable.

The meaning of the default behavior then depends upon where in the tree
it reinserts it.

> > 3) Are either of them sensible? Simply acting as if the current
> > thread's timeslice was up should be sufficient.
>
> The new scheduler doesn't really have a concept of "timeslice". This
> is one of the core problems with determining what to do on
> sched_yield().

Then it should probably just not support 'sched_yield' and return
ENOSYS. Applications should work around an ENOSYS reply (since some
versions of Solaris return this, among other reasons). Perhaps for
compatibility, it could also yield 'lightly' just in case applications
ignore the return value.

It could also handle it the way it handles the smallest sleep time that
it supports. This is sub-optimal if no other tasks are ready-to-run at
the same static priority level, and that might be an expensive check.

If CFS really can't support sched_yield's semantics, then it should just
not, and that's that. Return ENOSYS and admit that the behavior
sched_yield is documented to have simply can't be supported by the
scheduler.

> > The implication I keep getting is that neither the default behavior
> > nor the alternate behavior are sensible. What is so hard about
> > simply scheduling the next thread?
>
> The problem is where do we insert the task that is yielding? CFS is
> based around a tree structure ordered by time.

We put it exactly where we would have when its timeslice ran out. If we
can reward it a little bit, that's great. But if not, we can live with
that. Just imagine that the timer interrupt fired to indicate the end of
the thread's run time when the thread called 'sched_yield'.

> The old scheduler was priority-based, so you could essentially yield
> to everyone of the same niceness level.
>
> With the new scheduler, this would be possible, but would involve
> extra work tracking the position of the rightmost task at each
> priority level. This additional overhead is what Ingo is trying to
> avoid.

Then what does he do when the task runs out of run time? It's hard to
imagine we can't do that when the task calls sched_yield.

> > We don't need perfection, but it sounds like we have two
> > alternatives of which neither is sensible.
>
> sched_yield() isn't a great API.

I agree.

> It just says to delay the task, without specifying how long or what
> the task is waiting *for*.

That is not true. The task is waiting for something that will be done by
another thread that is ready-to-run and at the same priority level. The
task does not need to wait until the thing is guaranteed done but wishes
to wait until it is more likely to be done. This is an often-misused but
sometimes sensible thing to do.

I think the API gets blamed for two things that are not its fault:

1) It's often misunderstood and misused.

2) It was often chosen as a "best available" solution because no truly
good solutions were available.

> Other constructs are much more useful because they give the scheduler
> more information with which to make a decision.

Sure, if there is more information. But if all you really want to do is
wait until other threads at the same static priority level have had a
chance to run, then sched_yield is the right API.

DS
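From the application side, the ENOSYS workaround discussed above might look like this (a sketch; the helper name and the 1 ns stand-in for "the smallest sleep the system supports" are assumptions):

```c
#include <errno.h>
#include <sched.h>
#include <time.h>

/* Sketch of the suggested fallback: treat sched_yield() as
 * best-effort, and on an ENOSYS reply substitute the smallest sleep
 * request the system supports (1 ns stands in for that here; the
 * kernel rounds it up to its actual minimum). */
static void yield_or_nap(void)
{
    if (sched_yield() == 0)
        return;                         /* the normal case on Linux */
    if (errno == ENOSYS) {
        struct timespec ts = { 0, 1 };  /* minimal sleep request */
        nanosleep(&ts, NULL);
    }
}
```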
RE: sched_yield: delete sysctl_sched_compat_yield
I've asked versions of this question at least three times and never
gotten anything approaching a straight answer:

1) What is the current default 'sched_yield' behavior?

2) What is the current alternate 'sched_yield' behavior?

3) Are either of them sensible? Simply acting as if the current thread's
timeslice was up should be sufficient.

The implication I keep getting is that neither the default behavior nor
the alternate behavior are sensible. What is so hard about simply
scheduling the next thread? We don't need perfection, but it sounds like
we have two alternatives of which neither is sensible.

DS
RE: sched_yield: delete sysctl_sched_compat_yield
* Mark Lord [EMAIL PROTECTED] wrote:

> > Ack. And what of the suggestion to try to ensure that a yielding task
> > simply not end up as the very next one chosen to run? Maybe by swapping
> > it with another (adjacent?) task in the tree if it comes out on top
> > again?
>
> we did that too for quite some time in CFS - it was found to be not
> aggressive enough by some folks and too aggressive by others. Then when
> people started bickering over this we added these two simple corner
> cases - switchable via a flag. (minimum aggression and maximum
> aggression)

They are both correct. It is not aggressive enough if there are tasks other than those two that are at the same static priority level and ready to run. It is too aggressive if the task it is swapped with is at a lower static priority level.

Perhaps it might be possible to scan for the task at the same static priority level that is ready-to-run but last in line among other ready-to-run tasks and put it after that task? I think that's about as close as we can get to the POSIX-specified behavior.

> Thanks Ingo -- I *really* like this scheduler!

Just in case this isn't clear, I like CFS too and sincerely appreciate the work Ingo, Con, and others have done on it.

DS
RE: namespace support requires network modules to say "GPL"
> > Then init_net needs to be not GPL limited. Sorry, we need to allow
> > non GPL network drivers. There is a fine line between keeping the
>
> Why - they aren't exactly likely to be permissible by law

Really? What law and/or what clause in the GPL says that derivative works have to be licensed under the GPL? Or does the kernel have some new technique to determine whether or not code has been distributed?

As I read the GPL, it only requires you to release something under the GPL if you distribute it. The kernel has no idea whether or not code has been distributed. So if it's enforcing the GPL, it cannot prohibit anything non-distributed code can lawfully do. (Ergo, it's *NOT* *ENFORCING* the GPL.)

> > binary seething masses from accessing random kernel functions, and
> > allowing reasonable (but still non GPL) things like ndiswrapper to use
> > network device interface.
>
> Its up to the ndiswrapper authors how they licence their code, but they
> should respect how we licence ours.

You license yours under the GPL, so they should respect the GPL.

It sounds like we're back to where we were years ago. Didn't we already agree that EXPORT_SYMBOL_GPL was *NOT* a GPL-enforcement mechanism and had nothing to do with respecting the GPL? After all, if it's a GPL-enforcement mechanism, why is it not a "further restriction" which is prohibited by the GPL? (The GPL contains no restrictions on what code can use what symbols if that code is not distributed, but EXPORT_SYMBOL_GPL does.)

Are you now claiming that EXPORT_SYMBOL_GPL is intended to enforce the GPL?

DS
RE: Question regarding mutex locking
> Thanks for the help. Someday, I hope to understand this stuff.
>
> Larry

Any code either deals with an object or it doesn't. If it doesn't deal with that object, it should not be acquiring locks on that object. If it does deal with that object, it must know the internal details of that object, including when and whether locks are held, or it cannot deal with that object sanely.

So your question starts out broken; it says, "I need to lock an object, but I have no clue what's going on with that very same object." If you don't know what's going on with the object, you don't know enough about the object to lock it. If you do, you should know whether you hold the lock or not.

Either architect it so this function doesn't deal with that object and so doesn't need to lock it, or architect it so that this function knows what's going on with that object and so knows whether it holds the lock or not.

If you don't follow this rule, a lot of things can go horribly wrong. The two biggest issues are:

1) You don't know the semantic effect of locking and unlocking the mutex. So any code placed before the mutex is acquired or after it's released may not do what's expected. For example, you cannot unlock the mutex and yield, because you might not actually wind up unlocking the mutex.

2) A function that acquires a lock normally expects the object it locks to be in a consistent state when it acquires the lock. However, since your code may or may not acquire the mutex, it is not assured that its lock gets the object in a consistent state. Requiring the caller to know this and call the function with the object in a consistent state creates brokenness of varying kinds. (If the object may change, why not just release the lock before calling? If the object may not change, why is the sub-function releasing the lock?)

DS
RE: [RFC/PATCH] SO_NO_CHECK for IPv6
> David Schwartz <[EMAIL PROTECTED]> wrote:
> > > Regardless of whatever verifications your application is doing
> > > on the data, it is not checksumming the ports and that's what
> > > the pseudo-header is helping with.
> >
> > So what? We are in the case where the data has already gotten to him.
> > If it got to him in error, he'll reject it anyway. The receive checksum
> > check will only reject packets that he would reject anyway. That makes
> > it needless.
>
> What if it goes to the wrong recipient who doesn't have the upper-
> level checksums?

Since that's not him, he has no control over its policy and thus no ability to harm it or help it.

> This is the whole point, IPv6 unlike IPv4 does not have IP header
> checksums so the high-level needs to protect it by checksumming
> the pseudo-header.

Exactly. But *he* doesn't need to check that checksum, given that he already got the packet, since he has an upper-level checksum. He is not saying that his reasoning applies to everyone, just that it applies to him. He is not talking about disabling the send checksum, but the receive checksum. He knows that he does not need it.

DS
RE: [RFC/PATCH] SO_NO_CHECK for IPv6
> Regardless of whatever verifications your application is doing
> on the data, it is not checksumming the ports and that's what
> the pseudo-header is helping with.

So what? We are in the case where the data has already gotten to him. If it got to him in error, he'll reject it anyway. The receive checksum check will only reject packets that he would reject anyway. That makes it needless.

Of course, if the check is nearly free, there's no potential win, so no point in bothering.

DS
RE: [PATCH] Time-based RFC 4122 UUID generator
> > I use libuuid and I assume libuuid uses some uuid generator support
> > from the kernel.
>
> No, it does not. It's pure userspace and may produce double UUIDs.
>
> > libuuid comes from a package that Ted maintains IIRC.
> >
> > I (my company) use uuid to uniquely identify objects in a distributed
> > database. [Proprietary closed source stuff].
>
> Same here.
>
> Helge

Any UUID generator that can produce duplicate UUIDs with probability significantly greater than purely random UUIDs is so badly broken that it should not ever be used. Anyone who finds such a UUID generator should immediately either fix it or throw it on the junk heap. Anyone who knowingly uses such a UUID generator should be publicly shamed.

Rather than (or at the very least, in addition to) adding a new UUID generator, let's fix the one(s) we have.

DS
RE: Policy on dual licensing?
> What I suppose is that people porting BSD code to Linux don't mean
> closing the doors for back-porting changes. They are simply unaware
> or forget about the possibility of dual licensing. Obviously, each
> submitter should read Documentation/SubmittingDrivers, where it is
> explicitly stated. Yet humans are prone to forgetting, so this may
> seem not enough.
>
> What I propose is implementing a policy on accepting such code.
> According to it, every time a maintainer is considering a driver that
> is derived from BSD and licensed GPL-only, he should request dual
> licensing before accepting the patch. If the submitter is reluctant to
> do so - what can we do, it's better to have this inside this way than
> not at all. However, this should minimize such cases and, hopefully,
> satisfy the claims about Linux maintainers not doing all that they
> could to make the world a better place.
>
> Best regards,
> Remigiusz Modrzejewski

This will result in more code in the Linux tree that has a license other than the project default. This will impose a greater and greater burden on developers who have to carefully check the license of files every time they cut and paste code from one file into another. It creates a serious risk of incorrect license notices (because someone cuts/pastes a substantial chunk of GPL-only code into a dual-licensed file without changing the license notice) and accidental copyright violations (because someone else took the cut/pasted part into a BSD-licensed project) if intimately-connected files are under different licenses. Every effort should be made to avoid this.

Having a clear policy would be a good idea. I think the general policy should be that any dual-licensed file should contain a clear notice that the Linux kernel is GPL (that is the only license 'guaranteed' to cover the entire distribution) and that development may result in the file being "contaminated" by code that is not dual-licensed.

Just a notice referring to a 'dual license FAQ' in Documentation would be fine, of course. That file should advise developers that they should remove the dual license if they cause the file to be no longer dual-licensable due to code they've added, cut/pasted, or modified. Gratuitous removal of dual licensing should be discouraged, but removing it should be encouraged where it's a genuine impediment to development.

The example I always use is a filesystem with a different license. Imagine a new function is added to the filesystem interface. It is offered in a 'generic' version, with the expectation that filesystems will override it to provide a better-performing version. Imagine the generic version is GPLv2-only and a filesystem in-tree is dual-licensed. A developer probably cannot cut/paste the generic version as a base without breaking the dual license. If they want to keep the dual license, they have to re-implement the function. This creates an increased risk of bugs or incompatibilities. Worse, it creates a maintenance headache in that this function will need to be understood separately from other filesystems' implementations of the same function. A little imagination will suggest many ways this can cause problems.

The only good way this can end is if they change the license on that file to GPL only. Possible bad ways include accidentally contaminating the apparently dual-licensed file with code that was offered by its author only under the GPL.

DS
RE: Is gcc thread-unsafe?
> Another conclusion from the cited text is that in contrast with what
> was stated before on the gcc mailing list, it is not required to
> declare thread-shared variables volatile if that thread-shared data is
> consistently protected by calls to locking functions.
>
> Bart Van Assche.

It all depends upon what threading standard you are using. If GCC is going to support POSIX threading, it cannot require that thread-shared data be marked 'volatile' since POSIX does not require this. It can offer semantic guarantees for volatile-qualified data if it wants to. But POSIX provides a set of guarantees that do not require marking data as 'volatile', and if GCC is going to support POSIX threading, it has to support providing those guarantees.

As far as I know, no threading standard either requires 'volatile' or states that it is sufficient for any particular purpose. So there seems to be no reason to declare thread-shared variables as volatile except as some kind of platform-specific optimization. POSIX mutexes are sufficient. They are necessary if there is no other way to get the guarantees you need.

Nothing prevents GCC from providing any guarantees it wants for 'volatile'-qualified data. But POSIX mutexes must work as POSIX specifies or GCC cannot support POSIX threading. This is the nightmare scenario (thanks to Hans-J. Boehm):

    int x;
    bool need_to_lock;
    pthread_mutex_t mutex;

    for (int i = 0; i < 50; i++) {
        if (unlikely(need_to_lock))
            pthread_mutex_lock(&mutex);
        x++;
        if (unlikely(need_to_lock))
            pthread_mutex_unlock(&mutex);
    }

Now suppose the compiler optimizes this as follows:

    register = x;
    for (int i = 0; i < 50; i++) {
        if (need_to_lock) {
            x = register;
            pthread_mutex_lock(&mutex);
            register = x;
        }
        register++;
        if (need_to_lock) {
            x = register;
            pthread_mutex_unlock(&mutex);
            register = x;
        }
    }
    x = register;

This is a perfectly legal optimization for single-threaded code. It may in fact be an actual optimization. Clearly, it totally destroys threaded code: the shared variable is cached in a register across iterations and written back even when the lock is not held.

This shows that, unfortunately, the normal assumption that not knowing anything about the pthread functions ensures that optimizations won't break them is incorrect.

DS
RE: epoll design problems with common fork/exec patterns
Eric Dumazet wrote:

> Events are not necessarily reported "by descriptors". epoll uses an
> opaque field provided by the user.
>
> It's up to the user to properly choose a tag that will make sense if the
> user app is playing dup()/close() games, for example.

Great. So the only issue then is that the documentation is confusing. It frequently uses the term "fd" where it means file. For example, it says:

    Q1  What happens if you add the same fd to an epoll_set twice?

    A1  You will probably get EEXIST. However, it is possible that two
        threads may add the same fd twice. This is a harmless condition.

This gives no reason to think there's anything wrong with adding the same file twice so long as you do so through different descriptors. (One can imagine an application that does this to segregate read and write operations to avoid a race where the descriptor is closed from under a writer due to handling a fatal read error.) Obviously, that won't work.

And this part:

    Q6  Will the close of an fd cause it to be removed from all epoll sets
        automatically?

    A6  Yes.

This is incorrect. Closing an fd will not cause it to be removed from all epoll sets automatically. Only closing a file will. This is what caused the OP's confusion, and it is at best imprecise and, at worst, flat out wrong.

DS

PS: It is customary to trim individuals off of CC lists when replying to a list when the subject matter of the post is squarely inside the subject of the list. If the person CC'd was interested in the list's subject, he or she would presumably subscribe to the list. Not everyone wants two copies of every post. Not everyone wants a personal copy of every sub-thread that results from a post they make. In the past few years, I've received approximately an equal number of complaints about trimming CCs on posts to LKML and not trimming CCs on such posts.
RE: epoll design problems with common fork/exec patterns
Eric Dumazet wrote: Events are not necessarly reported by descriptors. epoll uses an opaque field provided by the user. It's up to the user to properly chose a tag that will makes sense if the user app is playing dup()/close() games for example. Great. So the only issue then is that the documentation is confusing. It frequently uses the term fd where it means file. For example, it says: Q1 What happens if you add the same fd to an epoll_set twice? A1 You will probably get EEXIST. However, it is possible that two threads may add the same fd twice. This is a harmless condition. This gives no reason to think there's anything wrong with adding the same file twice so long as you do so through different descriptors. (One can imagine an application that does this to segregate read and write operations to avoid a race where the descriptor is closed from under a writer due to handling a fatal read error.) Obviously, that won't work. And this part: Q6 Will the close of an fd cause it to be removed from all epoll sets automatically? A6 Yes. This is incorrect. Closing an fd will not cause it to be removed from all epoll sets automatically. Only closing a file will. This is what caused the OP's confusion, and it is at best imprecise and, at worst, flat out wrong. DS PS: It is customary to trim individuals off of CC lists when replying to a list when the subject matter of the post is squarely inside the subject of the list. If the person CC'd was interested in the list's subject, he or she would presumably subscribe to the list. Not everyone wants two copies of every post. Not everyone wants a personal copy of every sub-thread that results from a post they make. In the past few years, I've received approximately an equal number of complaints about trimming CC's on posts to LKML and not trimming CC's on such posts. 
RE: epoll design problems with common fork/exec patterns
> 6) Epoll removes the file from the set, when the *kernel* object gets
> closed (internal use-count goes to zero)
>
> With that in mind, how can the code snippet above trigger a removal from
> the epoll set? I don't see how that can be.

Suppose I add fd 8 to an epoll set. Suppose fd 5 is a dup of fd 8. Now I close fd 8. How can fd 8 remain in my epoll set, since there no longer is an fd 8? Events on files registered for epoll notification are reported by descriptor, so the set membership has to be associated (as reflected into userspace) with the descriptor, not the file.

For example, consider:

1) Process creates an epoll set; the set gets fd 4.
2) Process creates a socket; it gets fd 5.
3) The process adds fd 5 to set 4.
4) The process forks.
5) The child inherits the epoll set but not the socket.

Here the kernel cannot quite do the right thing. Ideally, the parent would still have fd 5 in its version of the epoll set. After all, it has not closed fd 5. However, the child *cannot* see fd 5 in its version of the epoll set, since it has no fd 5. An event reported for fd 5 would be nonsense. So it seems the kernel either has to break one of these "would/cannot" requirements, or it has to split the epoll set in two. However, splitting the set into two sets is clearly wrong, since the processes should share it.

  Q6 Will the close of an fd cause it to be removed from all epoll
     sets automatically?
  A6 Yes.

Note that this talks of the close of an "fd", not a file. The 'close' function in fact closes an fd, as that fd is then reusable. So it sounds like the problem above is solved by removing the fd from the set, but in practice this doesn't happen. I have programs that call 'close' between 'fork' and 'exec' and do not see the socket removed from the poll set.
DS
RE: Is gcc thread-unsafe?
> Well, yeah. I know what you mean. However, at this moment, some gcc
> developers are trying really hard not to be total d*ckheads about this
> issue, but get gcc fixed. Give us a chance.
>
> Andrew.

Can we get some kind of consensus that 'optimizations' that add writes to any object whose address the programmer might have taken are invalid on any platform that supports memory protection?

That seems like obvious common sense to me. And it has the advantage that it can't be language-lawyered: there is no document that states the rational requirements of a compiler that's going to support a memory protection model, so they can be anything rational people think they should be.

DS
RE: Is gcc thread-unsafe?
I asked a collection of knowledgeable people I know about the issue. The consensus is that the optimization is not permitted in POSIX code but that it is permitted in pure C code. The basic argument goes like this:

To make POSIX-compliant code even possible, optimizations that add writes to variables must be prohibited. That is, if POSIX prohibits writing to a variable in certain cases that only the programmer can detect, then a POSIX-compliant compiler cannot write to a variable except where explicitly told to do so. Any optimization that *adds* a write to a variable that would not otherwise occur *must* be prohibited. Otherwise, it is literally impossible to comply with the POSIX requirement that concurrent modifications and reads of shared variables take place only while holding a mutex.

The simplest solution is simply to ditch the optimization. If it really isn't even an optimization, then that's an easy way out.

DS
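To make the argument concrete, here is a sketch (mine, not from the thread) of the kind of transformation at issue: the source stores only under a condition, while the "optimized" form reads and writes unconditionally.

```c
#include <assert.h>

int shared;   /* imagine this is protected by a mutex the compiler cannot see */

/* What the programmer wrote: no access to 'shared' at all when flag is 0. */
void cond_store(int flag)
{
    if (flag)
        shared = 1;
}

/* What a compiler must NOT emit for POSIX code: the branch replaced by an
 * unconditional load + store. When flag is 0, this touches 'shared' even
 * though the source does not -- racing with any thread that legitimately
 * holds the mutex and updates 'shared' in the meantime. */
void cond_store_broken(int flag)
{
    int tmp = shared;            /* invented read  */
    shared = flag ? 1 : tmp;     /* invented write */
}
```

The two functions compute the same result in a single-threaded world, which is exactly why the pure-C rules permit the rewrite while the POSIX memory rules cannot.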
RE: Is gcc thread-unsafe?
> Well that's exactly right. For threaded programs (and maybe even
> real-world non-threaded ones in general), you don't want to be
> even _reading_ global variables if you don't need to. Cache misses
> and cacheline bouncing could easily cause performance to completely
> tank in some cases while only gaining a cycle or two in
> microbenchmarks for doing these funny x86 predication things.

For some CPUs, replacing a conditional branch with a conditional move is a *huge* win because it cannot be mispredicted. In general, compilers should optimize for unshared data, since that's much more common in typical code. Even for shared data, the usual case is that you are going to access the data a few times, so pulling the cache line to the CPU is essentially free, since it will happen eventually anyway.

Heuristics may show that the vast majority of such constructs write anyway, so the optimization may also be valid on that basis.

A better question is whether it's legal for a compiler that claims to support POSIX threads. I'm going to post on comp.programming.threads, where the threading experts hang out. A very interesting case, to be sure.

DS
RE: [rfc][patch 3/3] x86: optimise barriers
> From: Intel(R) 64 and IA-32 Architectures Software Developer's Manual
> Volume 3A:
>
> "7.2.2 Memory Ordering in P6 and More Recent Processor Families
> ...
> 1. Reads can be carried out speculatively and in any order.
> ..."
>
> So, it looks to me like almost the 1-st Commandment. Some people (like
> me) did believe this, others tried to check, and it was respected for
> years notwithstanding nobody had ever seen such an event.

When Intel first added speculative loads to the x86 family, they pegged the speculative load to the cache line: if the cache line is invalidated, so is the speculative load. As a result, out-of-order reads of normal memory are invisible to software. If a write to the same memory location on another CPU would make the fetched value invalid, it will make the cache line invalid, which invalidates the fetch.

I think it's extremely unlikely that any x86 CPU will do this any differently. It's hard to imagine Intel and AMD going to all this trouble for so long just to stop so late in the line's lifetime.

DS
RE: Aggregation in embedded context, is kernel GPL2 prejudiceagainst embedded systems?
Adrian Bunk wrote:

> even for dynamically linking including non-GPL code is not white but
> already dark grey.

IANAL, but personally, I think it's perfectly black and white. No mechanical combination (that means compressing, linking, tarring, compiling, or whatever) can create a work for copyright purposes. It can only convert the original work into a new form or aggregate works.

There are a few exceptions to this by statute. For example, translation (by explicit law) can create a derivative work. Presumably this is because nobody ever imagined an automated process that could translate a work; it was assumed such a process must always be creative.

To create a 'derivative work', you must create a new *work*, and a compiler and linker can't do that. Under copyright law, the creation of a work requires creative input, and compilers and linkers are not creative. If you link two works together, the result is an aggregate of those two works (and possibly the linker). This must be the case because there is no creative combination, and without creativity, a new work (for copyright purposes) cannot be formed. No amount of mechanical, automated combination of works can create a new work for copyright purposes. If you feed A and B into a linker, all you can get out is A, B, and perhaps the linker.

This doesn't mean that the result isn't a derivative work of one of the inputs. But that can happen only if one of the input works was a derivative to begin with. "Mere aggregation" must mean as opposed to creative combination.

Think about a tar/gzip: bits of each work are mixed into the other, as the later work has the elements it shares with the earlier work compressed out. This is just as much mixing as a linker does, perhaps arguably more. The key is that no creativity is used, and thus no *new* work (and a derivative work is a new work) is created.
DS
RE: SLUB performance regression vs SLAB
> On 10/04/2007 07:39 PM, David Schwartz wrote:
> > But this is just a preposterous position to put him in. If there's no
> > reproducible test case, then why should he care that one program he
> > can't even see works badly? If you care, you fix it.
>
> People have been trying for years to make reproducible test cases
> for huge and complex workloads. It doesn't work. The tests that do
> work take weeks to run and need to be carefully validated before
> they can be officially released. The open source community can and
> should be working on similar tests, but they will never be simple.

That's true, but irrelevant. Either the test can identify a problem that applies generally, or it's doing nothing but measuring how good the system is at doing the test. If the former, it should be possible to create a simple test case once you know from the complex test where the problem is. If the latter, who cares about a supposed regression?

It should be possible to identify exactly what portion of the test shows the regression the most and exactly what the system is doing during that moment. The test may be great at finding regressions, but once it finds them, they should be forever *found*.

Did you follow the recent incident when iperf found what seemed to be a significant CFS networking regression? The only way to identify that it was a quirk in what iperf was doing was by looking at exactly what iperf was doing. The only efficient way was to look at iperf's source and see that iperf's weird yielding meant it didn't replicate typical use cases the way it was supposed to.

DS
RE: SLUB performance regression vs SLAB
David Miller wrote:

> Using an unpublishable benchmark, whose results even cannot be
> published, really stretches the limits of "reasonable" don't you
> think?
>
> This "SLUB isn't ready yet" bullshit is just a shamans dance which
> distracts attention away from the real problem, which is that a
> reproducible, publishable test case, is not being provided to the
> developer so he can work on fixing the problem.
>
> I can tell you this thing would be fixed overnight if a proper test
> case had been provided by now.

I would just like to echo what you said, just a bit angrier. This is the same as someone asking him to fix a bug that they can only see with a binary-only kernel module. I think he's perfectly justified in simply responding "the bug is as likely to be in your code as mine."

Now, just because he's justified in doing that doesn't mean he should. I presume he has an honest desire to improve his own code, and if they've found a real problem, I'm sure he'd love to fix it. But this is just a preposterous position to put him in. If there's no reproducible test case, then why should he care that one program he can't even see works badly? If you care, you fix it.

Matthew Wilcox wrote:

> Yet here we stand. Christoph is aggressively trying to get slab removed
> from the tree. There is a testcase which shows slub performing worse
> than slab. It's not my fault I can't publish it. And just because I
> can't publish it doesn't mean it doesn't exist.

It means it may or may not exist. All we have is your word that slub is the problem. If I said I found a bug in the Linux kernel that caused it to panic, but I could only reproduce it with the nVidia driver, I'd be laughed at. It may even be that slub is better and your benchmark simply interprets this as worse. Without the details of your benchmark, we can't know.
For example, I've seen benchmarks that (usually unintentionally) do a *variable* amount of work; details of the implementation may cause the benchmark to do *more* work, so taking longer does not mean it ran slower.

DS
RE: get amount of "entropy" in /dev/random ?
> From the userlevel, can I get an estimate of "amount of entropy"
> in /dev/random, that is, the estimate of the number of bytes
> readable until it blocks? Of course multiple processes
> can read bytes and this would not be exact ... but still ... as an
> upper-boundary estimate?
>
> Thanks
> Yakov

Yes. Look in drivers/char/random.c at the random_ioctl handler. You will see RNDGETENTCNT.

DS
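From userspace it looks like this (my sketch, not from the original mail). Note that RNDGETENTCNT reports the estimate in bits, so divide by 8 for the rough byte count the questioner wanted:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/random.h>   /* RNDGETENTCNT */

/* Returns the kernel's current entropy estimate in bits, or -1 on error. */
int entropy_estimate_bits(void)
{
    int fd = open("/dev/random", O_RDONLY);
    if (fd < 0)
        return -1;

    int bits = -1;
    if (ioctl(fd, RNDGETENTCNT, &bits) < 0)
        bits = -1;

    close(fd);
    return bits;
}
```

As the questioner notes, the value is only an upper-bound estimate: other readers can drain the pool between the ioctl and any subsequent read.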
RE: Network slowdown due to CFS
This is a combined response to Arjan's:

> that's also what trylock is for... as well as spinaphores...
> (you can argue that futexes should be more intelligent and do
> spinaphore stuff etc... and I can buy that, lets improve them in the
> kernel by any means. But userspace yield() isn't the answer. A
> yield_to() would have been a ton better (which would return immediately
> if the thing you want to yield to is running already somewhere), a
> blind "yield" isn't, since it doesn't say what you want to yield to.

And Ingo's:

> but i'll attempt to weave the chain of argument one step forward (in the
> hope of not distorting your point in any way): _if_ the sched_yield()
> call in that memory allocator is done because it uses a locking
> primitive that is unfair (hence the memory pool lock can be starved),
> then the "guaranteed large latency" is caused by "guaranteed
> unfairness". The solution is not to insert a random latency (via a
> sched_yield() call) that also has a side-effect of fairness to other
> tasks, because this random latency introduces guaranteed unfairness for
> this particular task. The correct solution IMO is to make the locking
> primitive more fair _without_ random delays, and there are a number of
> good techniques for that. (they mostly center around the use of futexes)

So now I not only have to come up with an example where sched_yield is the best practical choice, I have to come up with one where sched_yield is the best conceivable choice? Didn't we start out by agreeing these are very rare cases? Why are we designing new APIs for them (Arjan), and why do we care about their performance (Ingo)? These are *rare* cases. It is a waste of time to optimize them.

In this case, nobody cares about fairness to the service thread. It is a cleanup task that probably runs every few minutes. It could be delayed for minutes and nobody would care. What they do care about is the impact of the service thread on the threads doing real work.
You two challenged me to present any legitimate use case for sched_yield. I see now that was not a legitimate challenge, and that you two were determined to shoot down any response, no matter how reasonable, on the grounds that there is some way to do it better, no matter how complex, impractical, or unjustified by the real-world problem.

I think if a pthread_mutex had a 'yield to others blocking on this mutex' kind of 'go to the back of the line' option, that would cover the majority of cases where sched_yield is currently your best choice. Unfortunately, POSIX gave us yield.

Note that I think we all agree that any program whose performance relies on quirks of sched_yield (such as the examples that have been cited as CFS 'regressions') is broken horribly. None of the cases I am suggesting use sched_yield as anything more than a minor optimization.

DS
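As a concrete sketch of the pattern being defended (my own code, with a hypothetical cleanup callback, not anything from the thread): the service thread uses trylock and, on contention, yields rather than blocking, since it can make progress either way.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* One pass of a low-priority service thread. If the pool lock is free,
 * do the cleanup; if a worker holds it, step aside with sched_yield()
 * instead of blocking -- the cleanup can simply run on a later pass.
 * Returns 1 if the cleanup ran, 0 if we yielded. */
int service_pass(pthread_mutex_t *pool_lock, void (*cleanup)(void))
{
    if (pthread_mutex_trylock(pool_lock) == 0) {
        cleanup();                       /* got the pool; do the work */
        pthread_mutex_unlock(pool_lock);
        return 1;
    }
    sched_yield();   /* defer to the threads doing real work */
    return 0;
}
```

The sched_yield here is only the minor optimization described above: correctness does not depend on who runs next, and the worst case of an unhelpful yield is that the cleanup waits for a later pass.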
RE: Network slowdown due to CFS
This is a combined response to Arjan's: that's also what trylock is for... as well as spinaphores... (you can argue that futexes should be more intelligent and do spinaphore stuff etc... and I can buy that, lets improve them in the kernel by any means. But userspace yield() isn't the answer. A yield_to() would have been a ton better (which would return immediately if the thing you want to yield to is running already somethere), a blind yield isn't, since it doesn't say what you want to yield to. And Ingo's: but i'll attempt to weave the chain of argument one step forward (in the hope of not distorting your point in any way): _if_ the sched_yield() call in that memory allocator is done because it uses a locking primitive that is unfair (hence the memory pool lock can be starved), then the guaranteed large latency is caused by guaranteed unfairness. The solution is not to insert a random latency (via a sched_yield() call) that also has a side-effect of fairness to other tasks, because this random latency introduces guaranteed unfairness for this particular task. The correct solution IMO is to make the locking primitive more fair _without_ random delays, and there are a number of good techniques for that. (they mostly center around the use of futexes) So now I not only have to come up with an example where sched_yield is the best practical choice, I have to come up with one where sched_yield is the best conceivable choice? Didn't we start out by agreeing these are very rare cases? Why are we designing new APIs for them (Arjan) and why do we care about their performance (Ingo)? These are *rare* cases. It is a waste of time to optimize them. In this case, nobody cares about fairness to the service thread. It is a cleanup task that probably runs every few minutes. It could be delayed for minutes and nobody would care. What they do care about is the impact of the service thread on the threads doing real work. 
You two challenged me to present any legitimate use case for sched_yield. I see now that was not a legitimate challenge, and you two were determined to shoot down any response, no matter how reasonable, on the grounds that there is some way to do it better, no matter how complex, impractical, or unjustified by the real-world problem.

I think if pthread_mutex had a 'yield to others blocking on this mutex' kind of 'go to the back of the line' option, that would cover the majority of cases where sched_yield is currently your best choice. Unfortunately, POSIX gave us yield.

Note that I think we all agree that any program whose performance relies on quirks of sched_yield (such as the examples that have been cited as CFS 'regressions') is horribly broken. None of the cases I am suggesting use sched_yield as anything more than a minor optimization.

DS
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
RE: get amount of entropy in /dev/random ?
> From userlevel, can I get an estimate of the amount of entropy in /dev/random, that is, an estimate of the number of bytes readable until it blocks? Of course multiple processes can read bytes and this would not be exact... but still... as an upper-bound estimate? Thanks, Yakov

Yes. Look in drivers/char/random.c at the random_ioctl handler. You will see RNDGETENTCNT.

DS
RE: Network slowdown due to CFS
> yielding IS blocking. Just with indeterminate fuzzyness added to it

Yielding is sort of blocking, but the difference is that yielding will not idle the CPU while blocking might. Yielding is sometimes preferable to blocking in a case where the thread knows it can make forward progress even if it doesn't get the resource. (As in the examples I explained.)

DS
RE: Network slowdown due to CFS
Arjan van de Ven wrote:

> > It can occasionally be an optimization. You may have a case where you can do something very efficiently if a lock is not held, but you cannot afford to wait for the lock to be released. So you check the lock, if it's held, you yield and then check again. If that fails, you do it the less optimal way (for example, dispatching it to a thread that *can* afford to wait).
>
> at this point it's "use a futex" instead; once you're doing system calls you might as well use the right one for what you're trying to achieve.

There are two answers to this. One is that you sometimes are writing POSIX code, and Linux-specific optimizations don't change the fact that you still need a portable implementation.

The other answer is that futexes don't change anything in this case. In fact, the last time I hit this, the lock was a futex on Linux. Nevertheless, that doesn't change the basic issue. The lock is locked, you cannot afford to wait for it, but not getting the lock is expensive. The solution is to yield and check the lock again. If it's still held, you dispatch to another thread, but many times, yielding can avoid that.

A futex doesn't change the fact that sometimes you can't afford to block on a lock but nevertheless would save significant effort if you were able to acquire it. Odds are the thread that holds it is about to release it anyway. That is, you need something in-between "non-blocking trylock, fail easily" and "blocking lock, do not fail", but you'd rather make forward progress without the lock than actually block/sleep.

DS
RE: Network slowdown due to CFS
> These are generic statements, but i'm _really_ interested in the specifics. Real, specific code that i can look at. The typical Linux distro consists of in execess of 500 millions of lines of code, in tens of thousands of apps, so there really must be some good, valid and "right" use of sched_yield() somewhere in there, in some mainstream app, right? (because, as you might have guessed it, in the past decade of sched_yield() existence i _have_ seen my share of sched_yield() utilizing user-space code, and at the moment i'm not really impressed by those examples.)

Maybe, maybe not. Even if so, it would be very difficult to find. Simply grepping for sched_yield is not going to help, because determining whether a given use of sched_yield is smart is not going to be easy.

> (user-space spinlocks are broken beyond words for anything but perhaps SCHED_FIFO tasks.)

User-space spinlocks are broken, so spinlocks can only be implemented in kernel-space? Even if you use the kernel to schedule/unschedule the tasks, you still have to spin in user-space.

> > One example I know of is a defragmenter for a multi-threaded memory allocator, and it has to lock whole pools. When it releases these locks, it calls yield before re-acquiring them to go back to work. The idea is to "go to the back of the line" if any threads are blocking on those mutexes.
>
> at a quick glance this seems broken too - but if you show the specific code i might be able to point out the breakage in detail. (One underlying problem here appears to be fairness: a quick unlock/lock sequence may starve out other threads. yield wont solve that fundamental problem either, and it will introduce random latencies into apps using this memory allocator.)

You are assuming that random latencies are necessarily bad. Random latencies may be significantly better than predictable high latency.
> > Can you explain what the current sched_yield behavior *is* for CFS and what the tunable does to change it?
>
> sure. (and i described that flag on lkml before) The sched_yield flag does two things:
>
> - if 0 ("opportunistic mode"), then the task will reschedule to any other task that is in "bigger need for CPU time" than the currently running task, as indicated by CFS's ->wait_runtime metric. (or as indicated by the similar ->vruntime metric in sched-devel.git)
>
> - if 1 ("agressive mode"), then the task will be one-time requeued to the right end of the CFS rbtree. This means that for one instance, all other tasks will run before this task will run again - after that this task's natural ordering within the rbtree is restored.

Thank you. Unfortunately, neither of these does what sched_yield is really supposed to do. Opportunistic mode does too little and aggressive mode does too much.

> > The desired behavior is for the current thread to not be rescheduled until every thread at the same static priority as this thread has had a chance to be scheduled.
>
> do you realize that this "desired behavior" you just described is not achieved by the old scheduler, and that this random behavior _is_ the main problem here? If yield was well-specified then we could implement it in a well-specified way - even if the API was poor. But fact is that it is _not_ well-specified, and apps grew upon a random scheduler implementation details in random ways. (in the lkml discussion about this topic, Linus offered a pretty sane theoretical definition for yield but it's not simple to implement [and no scheduler implements it at the moment] - nor will it map to the old scheduler's yield behavior so we'll end up breaking more apps.)

I don't have a problem with failing to emulate the old scheduler's behavior if we can show that the new behavior has saner semantics. Unfortunately, in this case, I think CFS' semantics are pretty bad.
Neither of these is what sched_yield is supposed to do.

Note that I'm not saying this is a particularly big deal. And I'm not calling CFS' behavior a regression, since it's not really better or worse than the old behavior, simply different.

I'm not familiar enough with CFS' internals to help much on the implementation, but there may be some simple compromise yield that might work well enough. How about simply acting as if the task used up its timeslice and scheduling the next one? (Possibly with a slight reduction in penalty or reward for not really using all the time, if possible?)

DS
RE: Network slowdown due to CFS
> * Jarek Poplawski <[EMAIL PROTECTED]> wrote:
>
> > BTW, it looks like risky to criticise sched_yield too much: some people can misinterpret such discussions and stop using this at all, even where it's right.
>
> Really, i have never seen a _single_ mainstream app where the use of sched_yield() was the right choice.

It can occasionally be an optimization. You may have a case where you can do something very efficiently if a lock is not held, but you cannot afford to wait for the lock to be released. So you check the lock, and if it's held, you yield and then check again. If that fails, you do it the less optimal way (for example, dispatching it to a thread that *can* afford to wait).

It is also sometimes used in the implementation of spinlock-type primitives. After spinning fails, yielding is tried.

I think it's also sometimes appropriate when a thread may monopolize a mutex. For example, consider a rarely-run task that cleans up some expensive structures. It may need to hold locks that are only held during this complex clean-up. One example I know of is a defragmenter for a multi-threaded memory allocator, and it has to lock whole pools. When it releases these locks, it calls yield before re-acquiring them to go back to work. The idea is to "go to the back of the line" if any threads are blocking on those mutexes.

There are certainly other ways to do these things, but I have seen cases where, IMO, yielding was the best solution. Doing nothing would have been okay too.

> Fortunately, the sched_yield() API is already one of the most rarely used scheduler functionalities, so it does not really matter. [ In my experience a Linux scheduler is stabilizing pretty well when the discussion shifts to yield behavior, because that shows that everything else is pretty much fine ;-) ]

Can you explain what the current sched_yield behavior *is* for CFS and what the tunable does to change it?
The desired behavior is for the current thread to not be rescheduled until every thread at the same static priority as this thread has had a chance to be scheduled. Of course, it's not clear exactly what a "chance" is. The semantics with respect to threads at other static priority levels is not clear. Ditto for SMP issues. It's also not clear whether threads that yield should be rewarded or punished for doing so.

DS
RE: Network slowdown due to CFS
> > I think the real fix would be for iperf to use blocking network IO though, or maybe to use a POSIX mutex or POSIX semaphores.
>
> So it's definitely not a bug in the kernel, only in iperf?

Martin:

Actually, in this case I think iperf is doing the right thing (though not the best thing) and the kernel is doing the wrong thing. It's calling 'sched_yield' to ensure that every other thread gets a chance to run before the current thread runs again. CFS is not doing that, allowing the yielding thread to hog the CPU to the exclusion of the other threads. (It can allow the yielding thread to hog the CPU, of course, just not to the exclusion of other threads.)

It's still better to use some kind of rational synchronization primitive (like a mutex/semaphore) so that the other threads can tell you when there's something for you to do. It's still better to use blocking network IO, so the kernel will let you know exactly when to try I/O and your dynamic priority can rise.

Ingo:

Can you clarify what CFS' current default sched_yield implementation is and what setting sched_compat_yield to 1 does? Which way do we get the right semantics (all threads of equal static priority are scheduled, with some possible SMP fuzziness, before this thread is scheduled again)?

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
Ack, sorry, I'm wrong. Please ignore me, if you weren't already. I'm glad to hear this will be fixed. The task should be moved last for its priority level.

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
Chris Friesen wrote:

> > The yielding task has given up the cpu. The other task should get to run for a timeslice (or whatever the equivalent is in CFS) until the yielding task again "becomes head of the thread list".
>
> Are you sure this isn't happening? I'll run some tests on my SMP system running CFS. But I'll bet the context switch rate is quite rapid.

Yep, that's exactly what's happening. The tasks are alternating. They are both always ready-to-run. The yielding task is put at the end of the queue for its priority level. There is no reason the yielding task should get less CPU, since they're both always ready-to-run.

The only downside here is that a yielding task results in very small timeslices, which causes cache inefficiencies. A sane lower bound on the timeslice might be a good idea. But there is no semantic problem.

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
> David Schwartz wrote:
>
> > Nonsense. The task is always ready-to-run. There is no reason its CPU should be low. This bug report is based on a misunderstanding of what yielding means.
>
> The yielding task has given up the cpu. The other task should get to run for a timeslice (or whatever the equivalent is in CFS) until the yielding task again "becomes head of the thread list".

Are you sure this isn't happening? I'll run some tests on my SMP system running CFS. But I'll bet the context switch rate is quite rapid.

Honestly, I can't imagine what else could be happening here. Does CFS spin in a loop doing nothing when you call sched_yield even though another task is ready-to-run? That seems kind of bizarre. Is sched_yield acting as a no-op?

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
> The CFS scheduler does not seem to implement sched_yield correctly. If one program loops with a sched_yield and another program prints out timing information in a loop. You will see that if both are taskset to the same core that the timing stats will be twice as long as when they are on different cores. This problem was not in 2.6.21-1.3194 but showed up in 2.6.22.4-65 and continues in the newest released kernel 2.6.22.5-76.

I disagree with the bug report.

> You will see that both tasks use 50% of the CPU. Then kill task2 and run: "taskset -c 1 ./task2"

This seems right. They're both always ready to run. They're at the same priority. Neither ever blocks. There is no reason one should get more CPU than the other.

> Now task2 will run twice as fast verifying that it is not some anomaly with the way top calculates CPU usage with sched_yield.
>
> Actual results: Tasks with sched_yield do not yield like they are suppose to.

Umm, how does he get that? It's yielding at blinding speed.

> Expected results: The sched_yield task's CPU usage should go to near 0% when another task is on the same CPU.

Nonsense. The task is always ready-to-run. There is no reason its CPU should be low. This bug report is based on a misunderstanding of what yielding means.

The Linux man page says: "A process can relinquish the processor voluntarily without blocking by calling sched_yield(). The process will then be moved to the end of the queue for its static priority and a new process gets to run." Notice the "without blocking" part?

POSIX says: "The sched_yield() function forces the running thread to relinquish the processor until it again becomes the head of its thread list. It takes no arguments."

CFS is perfectly complying with both of these. This bug report is a great example of how sched_yield can be misunderstood and misused.
You can even argue that the sched_yield process should get even more CPU, since it's voluntarily relinquishing (which should be rewarded) rather than infinitely spinning (which should be punished). (Not that I agree with this argument, I'm just using it to counter-balance the other argument.)

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
The CFS scheduler does not seem to implement sched_yield correctly. If one program loops with a sched_yield and another program prints out timing information in a loop. You will see that if both are taskset to the same core that the timing stats will be twice as long as when they are on different cores. This problem was not in 2.6.21-1.3194 but showed up in 2.6.22.4-65 and continues in the newest released kernel 2.6.22.5-76. I disagree with the bug report. You will see that both tasks use 50% of the CPU. Then kill task2 and run: taskset -c 1 ./task2 This seems right. They're both always ready to run. They're at the same priority. Neither ever blocks. There is no reason one should get more CPU than the other. Now task2 will run twice as fast verifying that it is not some anomaly with the way top calculates CPU usage with sched_yield. Actual results: Tasks with sched_yield do not yield like they are suppose to. Umm, how does he get that? It's yielding at blinding speed. Expected results: The sched_yield task's CPU usage should go to near 0% when another task is on the same CPU. Nonsense. The task is always ready-to-run. There is no reason its CPU should be low. This bug report is based on a misunderstanding of what yielding means. The Linux page says: A process can relinquish the processor voluntarily without blocking by calling sched_yield(). The process will then be moved to the end of the queue for its static priority and a new process gets to run. Notice the without blocking part? POSIX says: The sched_yield() function forces the running thread to relinquish the processor until it again becomes the head of its thread list. It takes no arguments. CFS is perfectly complying with both of these. This bug report is a great example of how sched_yield can be misunderstood and misused. 
You can even argue that the sched_yield process should get even more CPU, since it's voluntarily relinquishing (which should be rewarded) rather than infinitely spinning (which should be punished). (Not that I agree with this argument, I'm just using it to counter-balance the other argument.) DS - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: CFS: some bad numbers with Java/database threading [FIXED]
David Schwartz wrote: Nonsense. The task is always ready-to-run. There is no reason its CPU should be low. This bug report is based on a misunderstanding of what yielding means. The yielding task has given up the cpu. The other task should get to run for a timeslice (or whatever the equivalent is in CFS) until the yielding task again becomes head of the thread list. Are you sure this isn't happening? I'll run some tests on my SMP system running CFS. But I'll bet the context switch rate is quite rapid. Honestly, I can't imagine what else could be happening here. Does CFS spin in a loop doing nothing when you call sched_yield even though another task is ready-to-run? That seems kind of bizarre. Is sched_yield acting as a no-op? DS - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: CFS: some bad numbers with Java/database threading [FIXED]
Chris Friesen wrote:

> The yielding task has given up the cpu. The other task should get to
> run for a timeslice (or whatever the equivalent is in CFS) until the
> yielding task again becomes head of the thread list. Are you sure this
> isn't happening? I'll run some tests on my SMP system running CFS. But
> I'll bet the context switch rate is quite rapid.

Yep, that's exactly what's happening. The tasks are alternating. They are both always ready-to-run. The yielding task is put at the end of the queue for its priority level. There is no reason the yielding task should get less CPU, since they're both always ready-to-run.

The only downside here is that a yielding task results in very small timeslices, which causes cache inefficiencies. A sane lower bound on the timeslice might be a good idea. But there is no semantic problem.

DS
RE: CFS: some bad numbers with Java/database threading [FIXED]
Ack, sorry, I'm wrong. Please ignore me, if you weren't already. I'm glad to hear this will be fixed. The task should be moved last for its priority level.

DS
RE: Wasting our Freedom
> "David Schwartz" <[EMAIL PROTECTED]> writes:
>> My point is that you *cannot* prevent a recipient of a derivative work
>> from receiving any rights under either the GPL or the BSD to any
>> protectable elements in that work.
>
> Of course you can.

No you can't.

> What rights do you have to BSD-licenced works, made available (under
> BSD) to MS exclusively? You only get the binary object...

You are equating what rights I have with my ability to exercise those rights. They are not the same thing. For example, I once bought the rights to publicly display the movie "Monty Python and the Holy Grail". To my surprise, the rights to public display did not include an actual copy of the film.

In any event, I never claimed that anyone has rights to a protectable element that they do not possess a lawful copy of. That's a completely separate issue, and one that has nothing to do with what's being discussed here, because these are all cases where you have the work.

> You know, this is quite common practice - instead of assigning
> copyright, you can grant a BSD-style licence (for some fee, something
> like "do what you want but I will do what I want with my code").

Sure, *you* can grant a BSD-style license to any protectable elements *you* authored. But unless your recipients can obtain a BSD-style license to all protectable elements in the work from their respective authors, they cannot modify or distribute it. *You* cannot grant any rights to protectable elements authored by someone else, unless you have a relicensing agreement. Neither the GPL nor the BSD license is one of those.

>>> If A sold a BSD licence to B only and this B sold a proprietary
>>> licence (for a derived work) to C, C (without that clause) wouldn't
>>> have a BSD licence to the original work. This is BTW common scenario.
>> C most certainly would have a BSD license, should he choose to comply
>> with its terms, to every protectable element that is in both the
>> original work and the work he received.
>
> But he may have received only binary program image - or the source
> under NDA. Sure, NDA doesn't cover public information, but BSD doesn't
> mean public. Now what?

What the hell does that have to do with anything? Are you just trying to be deliberately dense or waste time? Is it not totally obvious how the principles I explain apply to a case like that?

Only someone who signs an NDA must comply with it. If you signed an NDA, you must comply with it. An NDA can definitely subtract rights. It's a complex question whether an NDA can subtract GPL rights, but again, that has nothing to do with what we're talking about here. Sure, you can have the right from me to do X and still not be allowed to do X because you agreed with someone else not to do it. So what?

>> C has no right to license any protectable element he did not author to
>> anyone else. He cannot set the license terms for those elements.
>
> Sure, the licence covers the >>>entire work<<<, not some "elements".

This is a misleading statement. The phrase "entire work" has two senses. The license definitely does not cover the "entire work" in the sense of every protectable element in the work, unless each individual author of those elements chose to offer that element under that license. If by "entire work" you mean any compilation or derivative work copyright the "final" author has, then yes, that's available under whatever license the "final" author places it under. But that license does not actually permit you to distribute the work.

This is really complicated, and I wish I had a clearer way to explain it. Suppose I write a work and then you modify it. Assume your modification includes adding new protectable elements to that work.
When someone distributes that new derivative work, they are distributing protectable elements authored by both you and me. Absent a relicensing agreement, they must obtain some rights from you and some rights from me to do that. You cannot license the protectable elements that I authored that are still in the resulting derivative work.

>> Neither the BSD nor the GPL ever give you the right to change the
>> actual license a work is offered under by the original author.
>
> Of course, that's a very distant thing.

Exactly. Every protectable element in the final work is licensed by the original author to every recipient who takes advantage of the license offer.

>>> BTW: a work by multiple authors is a different thing than a work
>>> derived from another.
>>
>> In practice it doesn't matter.
>
> Of course it does. Only author of a (derived) work can licence it, in
> this case he/she could change the licence
RE: Wasting our Freedom
Krzysztof Halasa writes:

> "David Schwartz" <[EMAIL PROTECTED]> writes:
>> Theodore Tso writes:

I apologize for the error in attribution.

>> Of course you don't need a license to *use* the derived work. You
>> never need a license to use a work. (In the United States. Some
>> countries word this a bit differently but get the same effect.)
>
> Really? I thought you need a licence to use, say, MS Windows. Even to
> possess a copy. But I don't know about USA, I'm told there are strange
> things happening there :-)

No, you do not need a license to use MS Windows. Microsoft may choose to compel you to agree to a license in exchange for allowing you to install a copy, but that is not quite the same thing. If you read United States copyright law, you will see that *use* is not one of the rights reserved to the copyright holder. Every lawful possessor of a work may use it in the ordinary way, assuming they did not *agree* to some kind of restriction.

>> If, however, you wanted to get the right to modify or distribute a
>> derivative work, you would need to obtain the rights to every
>> protectable element in that work.
>
> Of course.
>
>> Read GPL section 6, particularly this part: "Each time you
>> redistribute the Program (or any work based on the Program), the
>> recipient automatically receives a license from the original licensor
>> to copy, distribute or modify the Program subject to these terms and
>> conditions."
>
> Seems fine, your point?

My point is that you *cannot* prevent a recipient of a derivative work from receiving any rights under either the GPL or the BSD to any protectable elements in that work.

> In addition to the rights from you (to the whole derived work), the
> recipient receives rights to the original work, from the original
> author. It makes perfect sense, making sure the original author can't
> sue you like in the SCO case.
> If A sold a BSD licence to B only and this B sold a proprietary
> licence (for a derived work) to C, C (without that clause) wouldn't
> have a BSD licence to the original work. This is BTW common scenario.

C most certainly would have a BSD license, should he choose to comply with its terms, to every protectable element that is in both the original work and the work he received. C has no right to license any protectable element he did not author to anyone else. He cannot set the license terms for those elements. Again, read GPL section 6. (And this is true for the BSD license as well, at least in the United States, because it's the only way such a license could work.)

Neither the BSD nor the GPL ever gives you the right to change the actual license a work is offered under by the original author. In fact, they could not give you this right under US copyright law. Modifying the license *text* is not the same thing as modifying the license.

>> To distribute a derivative work that contains protectable elements
>> from multiple authors, you are distributing all of those elements and
>> need the rights to all of them. You need a license to each element
>> and, in the absence of any relicensing arrangements (which the GPL and
>> BSD license are not), only the original author can grant that to you.
>
> Of course.
>
> BTW: a work by multiple authors is a different thing than a work
> derived from another.

In practice it doesn't matter. All that matters is that you have a single fixed form of expression that contains creative elements contributed by different people, potentially under different licenses. The issues of whether it's a derivative work or a combined work, and whether the distributor has contributed sufficient protectable elements to assert their own copyright, really have no effect on any of the issues that matter here.

>> It is a common confusion that just because the final author has
>> copyright in the derivative work, that means he can license the work.
> Of course he (and only he) can. It doesn't mean the end users can't
> receive additional rights.

No, he can't. He can only license those protectable elements that he authored. There is no way you can license protectable elements authored by another absent a relicensing agreement. The GPL is explicitly not a relicensing agreement; see section 6. The BSD license is implicitly not a relicensing agreement.

> Come on, licence = promise not to sue. Why would the copyright holder
> be unable to promise not to sue? It just doesn't make sense.

A license is not just a promise not to sue, it's an *enforceable* *commitment* not to sue. It's an explicit grant of permission against legal rights. Would you argue that I can license Disney's "The Lion King" movie to you if I promise not t