chmod.c undefined behavior
I noticed that chmod.c have uninitialized variable char *ep that was used. This diff clarify what I mean. Index: chmod.c === RCS file: /OpenBSD/src/bin/chmod/chmod.c,v retrieving revision 1.30 diff -u -p -u -p -r1.30 chmod.c --- chmod.c 21 May 2014 06:23:01 - 1.30 +++ chmod.c 24 Sep 2014 08:47:58 - @@ -65,7 +65,7 @@ main(int argc, char *argv[]) uid_t uid; gid_t gid; u_int32_t fclear, fset; - char *ep, *mode, *cp, *flags; + char *mode, *cp, *flags; setlocale(LC_ALL, ); @@ -148,13 +148,11 @@ done: flags = *argv; if (*flags = '0' *flags = '7') { errno = 0; - val = strtoul(flags, ep, 8); + val = strtoul(flags, (char *)NULL, 8); if (val UINT_MAX) errno = ERANGE; if (errno) err(1, invalid flags: %s, flags); - if (*ep) - errx(1, invalid flags: %s, flags); fset = val; oct = 1; } else { @@ -167,13 +165,11 @@ done: mode = *argv; if (*mode = '0' *mode = '7') { errno = 0; - val = strtol(mode, ep, 8); + val = strtol(mode, (char *)NULL, 8); if (val INT_MAX || val 0) errno = ERANGE; if (errno) err(1, invalid file mode: %s, mode); - if (*ep) - errx(1, invalid file mode: %s, mode); omode = val; oct = 1; } else {
Re: chmod.c undefined behavior
I noticed that chmod.c have uninitialized variable char *ep that was used. This diff clarify what I mean. It might be a good idea to take a careful look at the man page of strtoul(3). Pay attention to what it does with errno and endptr. Also, take a look at the example.
Re: chmod.c undefined behavior
Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 12:10:47PM +0300: I noticed that chmod.c have uninitialized variable char *ep that was used. Your analysis is wrong. The variable ep is explicitly initialized by strtoul() before being used in main(). This diff clarify what I mean. Your diff is wrong. It breaks error detection. Try $ chmod 01x testfile without and with your diff. Yours, Ingo Index: chmod.c === RCS file: /OpenBSD/src/bin/chmod/chmod.c,v retrieving revision 1.30 diff -u -p -u -p -r1.30 chmod.c --- chmod.c 21 May 2014 06:23:01 - 1.30 +++ chmod.c 24 Sep 2014 08:47:58 - @@ -65,7 +65,7 @@ main(int argc, char *argv[]) uid_t uid; gid_t gid; u_int32_t fclear, fset; - char *ep, *mode, *cp, *flags; + char *mode, *cp, *flags; setlocale(LC_ALL, ); @@ -148,13 +148,11 @@ done: flags = *argv; if (*flags = '0' *flags = '7') { errno = 0; - val = strtoul(flags, ep, 8); + val = strtoul(flags, (char *)NULL, 8); if (val UINT_MAX) errno = ERANGE; if (errno) err(1, invalid flags: %s, flags); - if (*ep) - errx(1, invalid flags: %s, flags); fset = val; oct = 1; } else { @@ -167,13 +165,11 @@ done: mode = *argv; if (*mode = '0' *mode = '7') { errno = 0; - val = strtol(mode, ep, 8); + val = strtol(mode, (char *)NULL, 8); if (val INT_MAX || val 0) errno = ERANGE; if (errno) err(1, invalid file mode: %s, mode); - if (*ep) - errx(1, invalid file mode: %s, mode); omode = val; oct = 1; } else {
Re: chmod.c undefined behavior
Now I see it, in the last sentence. I was wrong, it was actually used. I got confused for that monster main function. That is definitely overly complex and that *ep looked like a forgotten. Hmm.. I counted cyclomatic complexity.. and that is 65 in main function :|
Re: chmod.c undefined behavior
I mght be wrong, but why not just initialize to NULL? On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com wrote: I noticed that chmod.c have uninitialized variable char *ep that was used. This diff clarify what I mean. Index: chmod.c === RCS file: /OpenBSD/src/bin/chmod/chmod.c,v retrieving revision 1.30 diff -u -p -u -p -r1.30 chmod.c --- chmod.c 21 May 2014 06:23:01 - 1.30 +++ chmod.c 24 Sep 2014 08:47:58 - @@ -65,7 +65,7 @@ main(int argc, char *argv[]) uid_t uid; gid_t gid; u_int32_t fclear, fset; - char *ep, *mode, *cp, *flags; + char *mode, *cp, *flags; setlocale(LC_ALL, ); @@ -148,13 +148,11 @@ done: flags = *argv; if (*flags = '0' *flags = '7') { errno = 0; - val = strtoul(flags, ep, 8); + val = strtoul(flags, (char *)NULL, 8); if (val UINT_MAX) errno = ERANGE; if (errno) err(1, invalid flags: %s, flags); - if (*ep) - errx(1, invalid flags: %s, flags); fset = val; oct = 1; } else { @@ -167,13 +165,11 @@ done: mode = *argv; if (*mode = '0' *mode = '7') { errno = 0; - val = strtol(mode, ep, 8); + val = strtol(mode, (char *)NULL, 8); if (val INT_MAX || val 0) errno = ERANGE; if (errno) err(1, invalid file mode: %s, mode); - if (*ep) - errx(1, invalid file mode: %s, mode); omode = val; oct = 1; } else {
Re: chmod.c undefined behavior
Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 12:55:14PM +0300: I got confused for that monster main function. That is definitely overly complex I strongly disagree. That main function is good, standard style. Actually, it's good enough to be shown off as an example. A small utility should be written just like that. It has one small getopt(3) loop, decrements argc/argv, has one small codeblock to handle arguments for each variant of the utility, and one concise fts(3) main loop to handle files, and finally a one-liner for error handling. The function is nearly linear, it's very easy to read. Complication definitely looks *way* different. and that *ep looked like a forgotten. Hmm.. I counted cyclomatic complexity.. and that is 65 in main function :| So what's the point? You can't *calculate* whether something is readable for a *human*, and that's what matters. Please forget about such purely theoretical calculations right now. Yours, Ingo
Re: chmod.c undefined behavior
Hi David, David Carlier wrote on Wed, Sep 24, 2014 at 10:19:57AM +0100: On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com wrote: I noticed that chmod.c have uninitialized variable char *ep that was used. This diff clarify what I mean. I might be wrong, You are. but why not just initialize to NULL? That is exactly how one must *not*, under any circumstances, ever react to the findings of an audit. It is very important to not react in such a way. If an audit - no matter whether a manual audit (like here) or tool-driven static analysis - reveals a potential bug or vulnerability, *never* proceed to merely sweep it under the carpet without caring what the code actually does. For example, if your auditing tool reports potential use of uninitialized variable, do *not* just add foo = NULL to the top of the function. Look at what the function does and decide whether it makes the codes safer or more readable to initialize. Adding code to merely shut up an audit is harmful in more than one way. First, it may hide an actually problem. Second, it may create a new problem if you happen to pick the wrong value for initialization. Third, it might in somes cases shift a problem elsewehere where it might be more harmful or harder to detect. I think you have earned yourself a flame, but i'm sorry to say i'm not in the mood right now. :-) In the case at hand, the code is obviously correct as it stands. Just search for ep from the top. The first line you find is the declaration, the second line is the strtoul() explicitly initializing it. Adding a second initialization before would be confusing. Any future auditor would wonder what the point of that obviously useless statement is and would probably waste some time trying to figure out whether it hides some logic oversight of the author of the code, probably ending up *removing* the ep = NULL to improve clarity for future audits. Yours, Ingo [ incorrect patch snipped ]
Re: chmod.c undefined behavior
I mght be wrong, but why not just initialize to NULL? I am watching this thread to spot people who should never be OpenBSD developers...
Re: chmod.c undefined behavior
You make the point indeed I had overreacted, just tried to participate :-) I mght be wrong, but why not just initialize to NULL? I am watching this thread to spot people who should never be OpenBSD developers.. Eheh you re probably right. On 24 September 2014 13:22, Ingo Schwarze schwa...@usta.de wrote: Hi David, David Carlier wrote on Wed, Sep 24, 2014 at 10:19:57AM +0100: On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com wrote: I noticed that chmod.c have uninitialized variable char *ep that was used. This diff clarify what I mean. I might be wrong, You are. but why not just initialize to NULL? That is exactly how one must *not*, under any circumstances, ever react to the findings of an audit. It is very important to not react in such a way. If an audit - no matter whether a manual audit (like here) or tool-driven static analysis - reveals a potential bug or vulnerability, *never* proceed to merely sweep it under the carpet without caring what the code actually does. For example, if your auditing tool reports potential use of uninitialized variable, do *not* just add foo = NULL to the top of the function. Look at what the function does and decide whether it makes the codes safer or more readable to initialize. Adding code to merely shut up an audit is harmful in more than one way. First, it may hide an actually problem. Second, it may create a new problem if you happen to pick the wrong value for initialization. Third, it might in somes cases shift a problem elsewehere where it might be more harmful or harder to detect. I think you have earned yourself a flame, but i'm sorry to say i'm not in the mood right now. :-) In the case at hand, the code is obviously correct as it stands. Just search for ep from the top. The first line you find is the declaration, the second line is the strtoul() explicitly initializing it. Adding a second initialization before would be confusing. Any future auditor would wonder what the point of that obviously useless statement is and would probably waste some time trying to figure out whether it hides some logic oversight of the author of the code, probably ending up *removing* the ep = NULL to improve clarity for future audits. Yours, Ingo [ incorrect patch snipped ]
Re: chmod.c undefined behavior
That main function is good, standard style. Unnecessary goto, variables defined far away from where they are used, monster function, variables are not commented what they do, not all functions are commented what they do.. To me, it looks like that there is no intention to optimize readability and testability. Instead it looks like there is put effort to minimize amount of functions or something. It has one small getopt(3) loop, decrements argc/argv, has one small codeblock to handle arguments for each variant of the utility, and one concise fts(3) main loop to handle files, and finally a one-liner for error handling. It would be better if they all are put to separate functions. That is the whole point in functions, to put task to single program fragment. So what's the point? You can't *calculate* whether something is readable for a *human*, and that's what matters. There are measured human factors, example size of working memory. If there is measured magic constants, it is possible to calculate what is somewhat better or worse. Something to aim for: -Less linearly independent paths in module -High cohesion -Low coupling These are basics. It also matters a lot for testability. Less independent paths means less test cases.
Re: chmod.c undefined behavior
That main function is good, standard style. Unnecessary goto, There is only one goto in chmod.c. If you consider it unnecessary, I'd advise you to read the code again, and pay attention to the comment explaining that particular chunk. (Hint: you can't break from more than one control structure at once in C). variables defined far away from where they are used, That program was written before C99 allowed for variable declarations in the middle of functions. There is currently no interest to move this declarations for the sake of it. monster function, variables are not commented what they do, not all functions are commented what they do.. Does every simple base command need more comments than actual lines of code? To me, it looks like that there is no intention to optimize readability and testability. Instead it looks like there is put effort to minimize amount of functions or something. There is indeed no such attempt. But the code is simple enough to be understandable to anyone with correct knowledge of the C language and brain cells in working condition. It would be better if they all are put to separate functions. That is the whole point in functions, to put task to single program fragment. It would arguably be better, due to the linear structure of this particular command. Also, when this command was written, BSD was running on some architectures where function calls have a significant overhead. Miod
Re: chmod.c undefined behavior
Unnecessary goto Or the most straightforward and obvious way to break out of a switch in a loop. variables defined far away from where they are used, Variables defined predictably at the start of the function, as the convention is in BSD code. Yes, they can be a little far if the function is long. You also always know exactly where to find them. monster function Two hundred lines is hardly long, and size matters less than readability. If a long function reads neatly from top to bottom, you can split it into a few dozen procedures, requiring the reader to jump back and forth to figure out what the code *actually* does. You then also need to shovel data back and forth between these functions. This is how you turn simple and straightforward into a clusterfuck. variables are not commented what they do, variables named well enough If you're familiar with the fts api and know what command line flags are, then half of these variables are immediately obvious. For the rest, the name is a good hint and if you spent a few seconds looking at where and how each variable is used, these are perfectly understandable too. Short names don't get in the way too much. How would you improve these names? It sounds like you're just trying to take snippets of code out of context and declare them bad because you're not reading the rest of the code? Of course you could comment these, but the code is simple enough that such excess verbiage would likely just get in the way. Much in the same manner that splitting the function into a few dozen would only make it harder to see the code for what it is. Mind you, the code isn't written for a first-year comp sci course where the students need annotated twenty-line snippets to help them come to grips with the basic syntax and structure of a computer program. not all functions are commented what they do.. Again trying to please some quality metric? Usage does not need to be documented. And main() is essentially the whole program, which is extensively documented in a man page. It's not called by another function; there is no internal API to document. What comment would you add there? What value would that comment add? To me, it looks like that there is no intention to optimize readability and testability. Instead it looks like there is put effort to minimize amount of functions or something. It looks like you're not actually reading the code, you're just looking at snippets of it, taken out of context. And then you're applying some cargo- cult pseudo-science to make statements about these snippets. Goto is evil! Something to aim for: -Less linearly independent paths in module -High cohesion -Low coupling Did you intend to achieve better cohesion and lower coupling by breaking the function into a dozen small functions that appease whatever metric you're using? Of course, I'm a nobody to say how this code should look, but it looks fine to me; and so far as I can tell, it's not broken. Don't fix it if it's not broken. But if you think it can be improved, I'm sure someone here might take a look at the diff once you send it. But if your comments about the current code are any indication of what your ideal version would look like, I'm not sure there are many who would be willing to commit it. But don't let me discourage you; I'm a nobody here. -Henri
Re: chmod.c undefined behavior
Hi Miod, Miod Vallat wrote on Wed, Sep 24, 2014 at 04:27:26PM +0200: There is only one goto in chmod.c. If you consider it unnecessary, I'd advise you to read the code again, and pay attention to the comment explaining that particular chunk. Heh. Read the code is almost always good advice (except when abused to dismiss reports about defective manuals, maybe :). So let's turn this thread into something useful and tech@-worthy and fix an actual bug. The --optind happens in too few cases. It does not only need to happen when the mode argument stands alone, but also when it merely stands last in a group, in which case getopt() has also moved past it. Right now, in that case, the invalid syntax is silently ignored, and if any valid mode arguments follow, they take effect. For example, right now: $ touch tf $ chmod 755 tf chmod -Pr -w tf ; ls -l tf # pun intended -r-xr-xr-x 1 ischwarze wsrc 0 Sep 24 18:14 tf Oops, -Pr is bad and the -r must not be silently ignored. With my patch: $ chmod 755 tf ./obj/chmod -Pr -w tf ; ls -l tf chmod: invalid file mode: -Pr -rwxr-xr-x 1 ischwarze wsrc 0 Sep 24 18:14 tf OK? Ingo P.S. Regarding the code quality of chmod.c, Miod and Henri have brought up various good points, so i'll refrain from more comments on style for now. Index: chmod.c === RCS file: /cvs/src/bin/chmod/chmod.c,v retrieving revision 1.30 diff -u -p -r1.30 chmod.c --- chmod.c 21 May 2014 06:23:01 - 1.30 +++ chmod.c 24 Sep 2014 16:40:19 - @@ -107,19 +107,23 @@ main(int argc, char *argv[]) hflag = 1; break; /* -* XXX -* -[rwx] are valid mode commands. If they are the entire -* argument, getopt has moved past them, so decrement optind. -* Regardless, we're done argument processing. +* If this is a symbolic mode argument rather than +* an option, we are done with option processing. */ case 'g': case 'o': case 'r': case 's': case 't': case 'u': case 'w': case 'X': case 'x': if (!ischmod) usage(); - if (argv[optind - 1][0] == '-' - argv[optind - 1][1] == ch - argv[optind - 1][2] == '\0') - --optind; + /* +* If getopt() moved past the argument, back up. +* If the argument contains option letters before +* mode letters, setmode() will catch them. +*/ + if (optind 1) { + cp = argv[optind - 1]; + if (cp[strlen(cp) - 1] == ch) + --optind; + } goto done; default: usage();
Re: chmod.c undefined behavior
Or the most straightforward and obvious way to break out of a switch in a loop. True. It is also good way to model exeptions. I take my word back, this may be most elegant way to exit that loop in this case. Variables defined predictably at the start of the function, as the convention is in BSD code. That is good convention. But while the function is so big, it is harder to read. You then also need to shovel data back and forth between these functions. Not necessarily. When the program is this tiny, single file source, it may not so bad thing to use global scope for that shared data. Two hundred lines is hardly long I've used to write less than twenty lines functions. How would you improve these names? Naming is ok me, but I may add comments what they are used for. It sounds like you're just trying to take snippets of code out of context and declare them bad because you're not reading the rest of the code? I don't say that this code is bad. I think it is good gode, but there is plenty of room to improve readability. Much in the same manner that splitting the function into a few dozen would only make it harder to see the code for what it is. And when the lower level details are hidden, it is easier to see what the code does. This is idea of abstraction. Mind you, the code isn't written for a first-year comp sci course where the students need annotated twenty-line snippets to help them come to grips with the basic syntax and structure of a computer program. I like to write self documenting code. My practice is to comment ~all functions with block comments, all parameters (with ranges), return values, exceptions, external effects, all variable declarations (with ranges). And, I prefer to write tests first. Again trying to please some quality metric? Yes, cyclomatic complexity. If that is high, it is usually good sign that code needs refactoring. Did you intend to achieve better cohesion and lower coupling by breaking the function into a dozen small functions that appease whatever metric you're using? No, lower CC index per function instead. This kind of refactoring often reveals opportunities to increase cohesion in more complex programs and helps to reduce the amount of code. Of course, I'm a nobody to say how this code should look, but it looks fine to me; and so far as I can tell, it's not broken. Don't fix it if it's not broken. Yes, it is considerably harder to refactor because there is no tests for that. I think that should be starting point. But if your comments about the current code are any indication of what your ideal version would look like, I'm not sure there are many who would be willing to commit it. I think that it is not defined enough unambiguously, how ideal code looks like. It reduces motivation to improve code better, if it is not defined, what is better. style(9) is good, but it would be better if there is reference component that is optimized to perfection, including tests, naming conventions, file hierarchy etc. Then ALL code changes and new code can be compared to that reference, and that reference code can be used to build templates automated testing.
Re: chmod.c undefined behavior
Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 08:14:27PM +0300: I think that it is not defined enough unambiguously, how ideal code looks like. It reduces motivation to improve code better, if it is not defined, what is better. style(9) is good, but it would be better if there is reference component that is optimized to perfection, including tests, naming conventions, file hierarchy etc. Then ALL code changes and new code can be compared to that reference, and that reference code can be used to build templates automated testing. It's not as easy as that. Not only is the code constantly improved, our standards which idioms are considered good and which are considered dangerous constantly evolve as well, slowly turning new, high quality code into old code of lower quality - even if the code remains completely unchanged. Actually, the quality of code diminishes in time *in particular* when it remains unchanged. As a recent example, imagine that you have an array allocated with calloc(3) and want to change its size. Until quite recently, the following was considered reasonable code: elemsz = ...; nelem = ...; if (SIZE_MAX / nelem elemsz) handle_overflow(); newp = realloc(pointer, nelem * elemsz); if (newp == NULL) handle_oom(); pointer = newp; Now, that is considered a dangerous idiom, to be rewritten as follows: elemsz = ...; nelem = ...; newp = reallocarray(pointer, nelem, elemsz); if (newp == NULL) handle_oom(); pointer = newp; Besides, as i already said at an earlier occasion, it is highly non-trivial to write down a comprehensive set of rules to be followed, it will end up being incomplete, too dogmatic, and outdated all at the same time. The best way to learn is to read code, understand it, find weaknesses, suggest improvements, and when we agree on the improvements, systematically audit the tree for similar problems. Actually, that's not just the way for newbies, it's how the most experienced developers do it, too. And even they, now and then, have to delete suggestions for improvments from their trees that, when reviewed, turn out to have downsides. Yours, Ingo
Re: chmod.c undefined behavior
Bringing that up is the only way to find out what OpenBSD developers do consider poor quality, what they don't, and what is merely considered a matter of style here Thanks for your patience. I feel like I'm querying preferred coding style by this way. I also think that better way to do that is that it would be good thing if there is some reference component in source tree, and rest of the code should be aimed to same quality. In testsuite, style and implementation. I don't think that the goto is problem, even there may be possibility to make clean, readable exit using while comparison. That monster size main function is the issue and reduces readability. mostly a matter of style, not of code quality; I think these are somewhat bonded. Especially in C, because the language is kind of unsecured gun like some Javascript. While the language lacks certain things like classes, exceptions, coroutines etc. this gap should be compensate by improving the coding conventions and checking of those can be automated too. Obviously, you are honestly hunting for bugs. I have about million lines of build and other errors under review where I have found anomalies. Some of them seems to be errors in OpenBSD, most of them are errors in code. It just not work if I point out every single anomaly, bug, missing parameter etc. Example, I think it would be best if I write proper document where I collect ALL known issues related to POSIX conformance that can be reviewed and made to public knowledge.
Re: chmod.c undefined behavior
It's not as easy as that. Not only is the code constantly improved, our standards which idioms are considered good and which are considered dangerous constantly evolve as well, slowly turning new, high quality code into old code of lower quality - even if the code remains completely unchanged. Actually, the quality of code diminishes in time *in particular* when it remains unchanged. I understand. And, the code itself is ultimately the specification. And that reason, why not select single component to be as a reference? And the reference component: This is finest peace of code at the moment, Make rest of the code looks like this, Everything that is diffent to reference is wrong way to do, If some convention just doesn't work, fixed it to reference first The best way to learn is to read code, understand it, find weaknesses, suggest improvements, and when we agree on the improvements, systematically audit the tree for similar problems. Some of the improvements may require enormous amount of effort to apply whole tree, and requires also that when applying changes, have to continuously learn/keep in own knowledge different source components. I don't expect that everyone have interest to all of the components, so it doesn't motivate to hack all components. That must be taken into account Like you say, quality standards are rising all the time, I think it is good to have reference component and accept, that rest of the source tree may have little behind in quality. I have used this practice in my own code. Every time I learn new tool/language/framework etc. I make first component to reference and then apply all practices rest of the components. I've never tried this on larger scale, in open source projects.
Re: chmod.c undefined behavior
Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:14:45PM +0300: Thanks for your patience. I feel like I'm querying preferred coding style by this way. That's not the *goal*, but it's an unavoidable side effect, and the more it succeeds, the easier everuthing gets for both sides. [...] I have about million lines of build and other errors under review where I have found anomalies. Some of them seems to be errors in OpenBSD, most of them are errors in code. It is likely that a sizeable fraction is just false positives, too; but i don't doubt some things hide in there that merit improvement. It just not work if I point out every single anomaly, bug, missing parameter etc. Example, I think it would be best if I write proper document where I collect ALL known issues related to POSIX conformance that can be reviewed and made to public knowledge. Not really. Usually, the main effort when approaching a list of hundreds or thousands of potential issues is - to figure out what is most relevant - to figure out whether it is really an issue - to figure out how exactly it can be improved If you try to write down a long list, that is not ideal. You will waste a lot of time, and others may not even find the time to read the whole list. So really try to pick - out of your huge pile of potential issues - some of those that you consider among the most serious and at the same time the most easy to fix and start with those. Once you found something that is confirmed to be an actual issue and a fix was committed, sweep the tree for it, then move on to the next class of issues. There is no way anyone can fix the world. Everybody has to pick something. Picking something relevant one is able to fix is part of the art - actually, a very important part of the art. Yours, Ingo
Re: chmod.c undefined behavior
If you try to write down a long list, that is not ideal. You will waste a lot of time, and others may not even find the time to read the whole list. True. As OpenBSD aims for POSIX compatibility, I'm thinking about document of missing features, known anomalies, known bugs and reasons for those. Example, if some certain standard feature is simply crap, or there is legacy reasons why things works differently or there is known bugs or some features are not just implemented, these can be checked from single document, there are possible workarounds, OpenBSD specific #ifdef / #endif etc. After I fully review my self all those millions error messages, end result is less than 50 relevant issues. And after that data is reviewed together, @tech there may be left less than 20. And if issues are fixed, new features done, list can be cleaned up, or if new problems are found these can be added. 2014-09-24 21:54 GMT+03:00 Ingo Schwarze schwa...@usta.de: Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:14:45PM +0300: Thanks for your patience. I feel like I'm querying preferred coding style by this way. That's not the *goal*, but it's an unavoidable side effect, and the more it succeeds, the easier everuthing gets for both sides. [...] I have about million lines of build and other errors under review where I have found anomalies. Some of them seems to be errors in OpenBSD, most of them are errors in code. It is likely that a sizeable fraction is just false positives, too; but i don't doubt some things hide in there that merit improvement. It just not work if I point out every single anomaly, bug, missing parameter etc. Example, I think it would be best if I write proper document where I collect ALL known issues related to POSIX conformance that can be reviewed and made to public knowledge. Not really. Usually, the main effort when approaching a list of hundreds or thousands of potential issues is - to figure out what is most relevant - to figure out whether it is really an issue - to figure out how exactly it can be improved If you try to write down a long list, that is not ideal. You will waste a lot of time, and others may not even find the time to read the whole list. So really try to pick - out of your huge pile of potential issues - some of those that you consider among the most serious and at the same time the most easy to fix and start with those. Once you found something that is confirmed to be an actual issue and a fix was committed, sweep the tree for it, then move on to the next class of issues. There is no way anyone can fix the world. Everybody has to pick something. Picking something relevant one is able to fix is part of the art - actually, a very important part of the art. Yours, Ingo
Re: chmod.c undefined behavior
Hi Matti, Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:44:52PM +0300: And that reason, why not select single component to be as a reference? It's not really needed. If you want good examples, just look for new programs or library functions recently written from scratch by experienced developers, for example signify(1), identd(8), cu(1), mfii(4), tftp-proxy(8), ldomctl(8). There are no doubt many more. The release announcements tell you what is new. And the reference component: This is finest peace of code at the moment, That's in part a matter of taste, and it doesn't matter that you get *the* finest (whatever that is), you just need very good ones. Make rest of the code looks like this, Everything that is diffent to reference is wrong way to do, Not quite. Even in very good code, there is stil considerable room for personal style, and i don't doubt you will see that when looking at the above examples (though i admit i didn't check). Some of the improvements may require enormous amount of effort to apply whole tree, and requires also that when applying changes, have to continuously learn/keep in own knowledge different source components. I don't expect that everyone have interest to all of the components, so it doesn't motivate to hack all components. That must be taken into account Yes. If something is found that is really important, others will help. And if anyone gets bored, they move on to something else - though, as a rule, if something is really important, someone will hopeful show the required persistence. It need not necessarily be the developer who originally found the issue. Yours, Ingo
Re: chmod.c undefined behavior
You were probably trained in the Object-oriented (and likely a bit of post-OO) tradition. Most likely you learned Java, and then possibly either C# or Ruby. I started C-64 Basic, then I learned assembler, then C. After C, I have learned bunch object oriented and functional languages. Lately I have programmed in Typescript. I actually switch language to another all the time. Language is a tool and similar patterns can be implemented everywhere. OO is not the only valid way to program. I agree. Usually, I'm thinking functional approach first, and if that doesn't cut, I look problem from dataflow perspective and after that I try OO. That usually solve the problem if everything else fails but that it is not of course only way. I don't always agree with the OpenBSD developers (and sometimes I vehemently disagree with their take on things), but as a user I'm much happier with the quality, consistency and predictability that OpenBSD provides, than with some project that's always chasing the moving target of the perfect programming paradigm. I also like how directory structure is done in that scale and that consistency that ~everything is C and sh scripts. and so perhaps more familiar to many modern programmers, but not intrinsically or automatically more readable. I don't afraid of long functions. I also write long functions if I have fully implemented DbC or there is just lot to do. However, I do avoid complex functions. and it's highly unlikely that any random group of programmers will be willing to follow the style you were trained to think is best. best is not best if it is not possible to measure it any practical way. I can measure speed, I can measure memory usage, I can even measure how much implementation uses energy (which is actually good optimum for speed/memory usage) but I agree that coding style is often a matter of opinion. However, there is some factors that actually can be measured and those tells something about quality. Example, -Code coverage in tests (complex code is not easy to test!) -Cyclomatic complexity -Coupling Relating to coding style, it can't practically measure what is better name for function. Despite for that, I consider naming convention extemely important, because that is part of the process that is not so straightforward but I don't ever start argue how to shorten names or is camelCase better. I don't care as long as the style doesn't change. OpenBSD defines (AFAIK) results to include safety of code, and/or freedom from unanticipated side-effects (i.e. bugs). I'm fascinated many ways the clean source tree in that scale and I have personal itch to implement methods to find bugs. Instead of pointing out single bug, I'm thinking about implementing methods to make much as possible bugs in code visible.
Re: chmod.c undefined behavior
Can you two please continue this spam off-list, or at least in misc? On 24 September 2014 23:52, Matti Karnaattu mkarnaa...@gmail.com wrote: You were probably trained in the Object-oriented (and likely a bit of post-OO) tradition. Most likely you learned Java, and then possibly either C# or Ruby. I started C-64 Basic, then I learned assembler, then C. After C, I have learned bunch object oriented and functional languages. Lately I have programmed in Typescript. I actually switch language to another all the time. Language is a tool and similar patterns can be implemented everywhere. OO is not the only valid way to program. I agree. Usually, I'm thinking functional approach first, and if that doesn't cut, I look problem from dataflow perspective and after that I try OO. That usually solve the problem if everything else fails but that it is not of course only way. I don't always agree with the OpenBSD developers (and sometimes I vehemently disagree with their take on things), but as a user I'm much happier with the quality, consistency and predictability that OpenBSD provides, than with some project that's always chasing the moving target of the perfect programming paradigm. I also like how directory structure is done in that scale and that consistency that ~everything is C and sh scripts. and so perhaps more familiar to many modern programmers, but not intrinsically or automatically more readable. I don't afraid of long functions. I also write long functions if I have fully implemented DbC or there is just lot to do. However, I do avoid complex functions. and it's highly unlikely that any random group of programmers will be willing to follow the style you were trained to think is best. best is not best if it is not possible to measure it any practical way. I can measure speed, I can measure memory usage, I can even measure how much implementation uses energy (which is actually good optimum for speed/memory usage) but I agree that coding style is often a matter of opinion. However, there is some factors that actually can be measured and those tells something about quality. Example, -Code coverage in tests (complex code is not easy to test!) -Cyclomatic complexity -Coupling Relating to coding style, it can't practically measure what is better name for function. Despite for that, I consider naming convention extemely important, because that is part of the process that is not so straightforward but I don't ever start argue how to shorten names or is camelCase better. I don't care as long as the style doesn't change. OpenBSD defines (AFAIK) results to include safety of code, and/or freedom from unanticipated side-effects (i.e. bugs). I'm fascinated many ways the clean source tree in that scale and I have personal itch to implement methods to find bugs. Instead of pointing out single bug, I'm thinking about implementing methods to make much as possible bugs in code visible. -- The best the little guy can do is what the little guy does right
Re: chmod.c undefined behavior
That main function is good, standard style. Unnecessary goto, variables defined far away from where they are used, monster function, variables are not commented what they do, not all functions are commented what they do.. To me, it looks like that there is no intention to optimize readability and testability. Instead it looks like there is put effort to minimize amount of functions or something. It has one small getopt(3) loop, decrements argc/argv, has one small codeblock to handle arguments for each variant of the utility, and one concise fts(3) main loop to handle files, and finally a one-liner for error handling. It would be better if they all are put to separate functions. That is the whole point in functions, to put task to single program fragment. So what's the point? You can't *calculate* whether something is readable for a *human*, and that's what matters. There are measured human factors, example size of working memory. If there is measured magic constants, it is possible to calculate what is somewhat better or worse. Something to aim for: -Less linearly independent paths in module -High cohesion -Low coupling These are basics. It also matters a lot for testability. Less independent paths means less test cases. It really sounds like you could be more useful elsewhere. Like maybe helping to improve GNU echo.
Re: chmod.c undefined behavior
I have about million lines of build and other errors under review where I have found anomalies. Some of them seems to be errors in OpenBSD, most of them are errors in code. Yet somehow out of those million you decided to send in a patch which introduced a bug. I believe I know your type.
Re: chmod.c undefined behavior
That is good convention. But while the function is so big, it is harder to read. You then also need to shovel data back and forth between these functions. Not necessarily. When the program is this tiny, single file source, it may not so bad thing to use global scope for that shared data. Don't you see the irony in claiming that main is too big and that the variable definitions are too far away, while at the same time saying the program is tiny and proposing to move some variables into file scope? That should say something about paying too much attention to per-function metrics... I don't say that this code is bad. I think it is good gode, but there is plenty of room to improve readability. I could bikeshed about it too, plenty. It's harder to make up changes with real substenance into a program that is tiny and already good code. Much in the same manner that splitting the function into a few dozen would only make it harder to see the code for what it is. And when the lower level details are hidden, it is easier to see what the code does. This is idea of abstraction. And you can go too far. You could break things into smaller and smaller pieces until you have a set of functions that resemble the instruction set of some virtual CPU. Yes, absraction is a core element of programming, and a critical one at that. However, all abstraction comes at a cost, and sometimes that cost may be greater than the value. A program is the sum of its parts, and by splitting parts of it into smaller pieces the program doesn't always grow simpler or more understandable. On the contrary; all but the best and most generic of abstractions tend to be leaky and flawed, and the programmer needs to keep many details about them in mind in order to underdstand what the code really does. Also, hiding low-level details can be the exact opposite of what you want in order to make it easy to see what the code does. I've seen this pattern with aspiring hackers who learn from contrived textbook examples with excessively annotated commented code snippets. They learn to give every little thing a name, and call it an abstraction. What they don't learn is idioms known to experienced programmers. So they hand that code to an experienced programmer and after removing layers upon layers of unneeded abstraction (or obfuscation), including all the extra machinery needed to communicate program state between those layers, the experienced programmer finally sees the code for all its bugs. Alternatively, he can try to keep all the made-up abstractions in his head, but that can be quite a burden. He'll end up constantly jumping back and forth between them and re-checking details. Cohesion? You wanted local variables to be nearby so that they are easy to find. That's one detail. Functions hide other details which you might also want to keep nearby. Typedefs and macros make yet another way to hide details that really matter. Try looking for subtle arithmetic overflows when you don't see the operators and don't know the *real* types of the operands. If you have the time, take a look at what kind of layers were removed in LibreSSL. By reading lots and lots of code, you'll start to pick up patterns and idioms. Code becomes easier to read, and you'll spend less time trying to understand specific parts. And because you do not need to explain these parts to yourself, you also won't need to single them out and give them a name to remind yourself of what they do. The result is that you'll find larger functions perfectly readable and understandable. And perhaps you'll start to see why extra layers would actually make it harder to read. That's not to say that small functions are bad, or that larger functions are always better, or that chmod shouldn't ever be split into smaller parts. But the ability to say which abstractions are good and which are useless (or outright harmful) is something that grows with experience. And not all functions are alike; a long function can really read like a list of instructions where you can forget the past steps as soon as they are behind. A small thirty-line function with some twisted logic (add recursion?) could be a real brain teaser. Mind you, the code isn't written for a first-year comp sci course where the students need annotated twenty-line snippets to help them come to grips with the basic syntax and structure of a computer program. I like to write self documenting code. My practice is to comment ~all functions with block comments, all parameters (with ranges), return values, exceptions, external effects, all variable declarations (with ranges). That sounds like dogma. It's not a terrible attitude to start with, but as you pick up idioms and conventions, you'll find that some things are so obvious that comments add absolutely nothing. And useless comments do not make the code prettier or easier to read. On the other hand, you'll learn to add comments in seemingly
Re: chmod.c undefined behavior
But you didn't write any tests before this diff, and as a result you broke chmod. Retrofitting test suite requires some effort and didn't do that now because... My apologies for two things: 1. I do, forgot to put question mark. My intention was to _ask_ for that code: WTF this? 2. I put that message to wrong mailing list. Questions with diff should definitely go to misc@ And now this thread sprawled long way to off course in wrong list... So, I don't answer in this thread any longer, and I hope we all can now shut down this spamming. Even I could easily refactor code to more sane in terms of function complexity, I don't want do that without test suite (there is probably available) because I don't want to take risk of breaking it. And, I don't do that even if I have that test suite, because I have doubt no one will not commit refactored version because there is opinions that monster function can be more readable. Nothing more to see here.