chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
I noticed that chmod.c have uninitialized variable char *ep that was
used. This diff clarify what I mean.


 Index: chmod.c
===
RCS file: /OpenBSD/src/bin/chmod/chmod.c,v
retrieving revision 1.30
diff -u -p -u -p -r1.30 chmod.c
--- chmod.c 21 May 2014 06:23:01 -  1.30
+++ chmod.c 24 Sep 2014 08:47:58 -
@@ -65,7 +65,7 @@ main(int argc, char *argv[])
uid_t uid;
gid_t gid;
u_int32_t fclear, fset;
-   char *ep, *mode, *cp, *flags;
+   char *mode, *cp, *flags;
setlocale(LC_ALL, );
@@ -148,13 +148,11 @@ done:
flags = *argv;
if (*flags = '0'  *flags = '7') {
errno = 0;
-   val = strtoul(flags, ep, 8);
+   val = strtoul(flags, (char *)NULL, 8);
if (val  UINT_MAX)
errno = ERANGE;
if (errno)
err(1, invalid flags: %s, flags);
-   if (*ep)
-   errx(1, invalid flags: %s, flags);
fset = val;
oct = 1;
} else {
@@ -167,13 +165,11 @@ done:
mode = *argv;
if (*mode = '0'  *mode = '7') {
errno = 0;
-   val = strtol(mode, ep, 8);
+   val = strtol(mode, (char *)NULL, 8);
if (val  INT_MAX || val  0)
errno = ERANGE;
if (errno)
err(1, invalid file mode: %s, mode);
-   if (*ep)
-   errx(1, invalid file mode: %s, mode);
omode = val;
oct = 1;
} else {



Re: chmod.c undefined behavior

2014-09-24 Thread Henri Kemppainen
 I noticed that chmod.c have uninitialized variable char *ep that was
 used. This diff clarify what I mean.

It might be a good idea to take a careful look at the man page of
strtoul(3).  Pay attention to what it does with errno and endptr.
Also, take a look at the example.



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Matti,

Matti Karnaattu wrote on Wed, Sep 24, 2014 at 12:10:47PM +0300:

 I noticed that chmod.c have uninitialized variable char *ep
 that was used.

Your analysis is wrong.  The variable ep is explicitly initialized
by strtoul() before being used in main().

 This diff clarify what I mean.

Your diff is wrong.  It breaks error detection.  Try

  $ chmod 01x testfile

without and with your diff.

Yours,
  Ingo


  Index: chmod.c
 ===
 RCS file: /OpenBSD/src/bin/chmod/chmod.c,v
 retrieving revision 1.30
 diff -u -p -u -p -r1.30 chmod.c
 --- chmod.c 21 May 2014 06:23:01 -  1.30
 +++ chmod.c 24 Sep 2014 08:47:58 -
 @@ -65,7 +65,7 @@ main(int argc, char *argv[])
 uid_t uid;
 gid_t gid;
 u_int32_t fclear, fset;
 -   char *ep, *mode, *cp, *flags;
 +   char *mode, *cp, *flags;
 setlocale(LC_ALL, );
 @@ -148,13 +148,11 @@ done:
 flags = *argv;
 if (*flags = '0'  *flags = '7') {
 errno = 0;
 -   val = strtoul(flags, ep, 8);
 +   val = strtoul(flags, (char *)NULL, 8);
 if (val  UINT_MAX)
 errno = ERANGE;
 if (errno)
 err(1, invalid flags: %s, flags);
 -   if (*ep)
 -   errx(1, invalid flags: %s, flags);
 fset = val;
 oct = 1;
 } else {
 @@ -167,13 +165,11 @@ done:
 mode = *argv;
 if (*mode = '0'  *mode = '7') {
 errno = 0;
 -   val = strtol(mode, ep, 8);
 +   val = strtol(mode, (char *)NULL, 8);
 if (val  INT_MAX || val  0)
 errno = ERANGE;
 if (errno)
 err(1, invalid file mode: %s, mode);
 -   if (*ep)
 -   errx(1, invalid file mode: %s, mode);
 omode = val;
 oct = 1;
 } else {



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
Now I see it, in the last sentence. I was wrong, it was actually used.

I got confused for that monster main function. That is definitely
overly complex and that *ep looked like a forgotten.

Hmm.. I counted cyclomatic complexity.. and that is 65 in main function :|



Re: chmod.c undefined behavior

2014-09-24 Thread David Carlier
I mght be wrong, but why not just initialize to NULL?

On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com wrote:

 I noticed that chmod.c have uninitialized variable char *ep that was
 used. This diff clarify what I mean.


  Index: chmod.c
 ===
 RCS file: /OpenBSD/src/bin/chmod/chmod.c,v
 retrieving revision 1.30
 diff -u -p -u -p -r1.30 chmod.c
 --- chmod.c 21 May 2014 06:23:01 -  1.30
 +++ chmod.c 24 Sep 2014 08:47:58 -
 @@ -65,7 +65,7 @@ main(int argc, char *argv[])
 uid_t uid;
 gid_t gid;
 u_int32_t fclear, fset;
 -   char *ep, *mode, *cp, *flags;
 +   char *mode, *cp, *flags;
 setlocale(LC_ALL, );
 @@ -148,13 +148,11 @@ done:
 flags = *argv;
 if (*flags = '0'  *flags = '7') {
 errno = 0;
 -   val = strtoul(flags, ep, 8);
 +   val = strtoul(flags, (char *)NULL, 8);
 if (val  UINT_MAX)
 errno = ERANGE;
 if (errno)
 err(1, invalid flags: %s, flags);
 -   if (*ep)
 -   errx(1, invalid flags: %s, flags);
 fset = val;
 oct = 1;
 } else {
 @@ -167,13 +165,11 @@ done:
 mode = *argv;
 if (*mode = '0'  *mode = '7') {
 errno = 0;
 -   val = strtol(mode, ep, 8);
 +   val = strtol(mode, (char *)NULL, 8);
 if (val  INT_MAX || val  0)
 errno = ERANGE;
 if (errno)
 err(1, invalid file mode: %s, mode);
 -   if (*ep)
 -   errx(1, invalid file mode: %s, mode);
 omode = val;
 oct = 1;
 } else {




Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Matti,

Matti Karnaattu wrote on Wed, Sep 24, 2014 at 12:55:14PM +0300:

 I got confused for that monster main function.
 That is definitely overly complex

I strongly disagree.

That main function is good, standard style.
Actually, it's good enough to be shown off as an example.
A small utility should be written just like that.

It has one small getopt(3) loop, decrements argc/argv, has one small
codeblock to handle arguments for each variant of the utility, and
one concise fts(3) main loop to handle files, and finally a one-liner
for error handling.

The function is nearly linear, it's very easy to read.
Complication definitely looks *way* different.

 and that *ep looked like a forgotten.
 
 Hmm.. I counted cyclomatic complexity..
 and that is 65 in main function :|

So what's the point?  You can't *calculate* whether something is
readable for a *human*, and that's what matters.

Please forget about such purely theoretical calculations right now.

Yours,
  Ingo



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi David,

David Carlier wrote on Wed, Sep 24, 2014 at 10:19:57AM +0100:
 On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com wrote:

 I noticed that chmod.c have uninitialized variable char *ep
 that was used.  This diff clarify what I mean.

 I might be wrong,

You are.

 but why not just initialize to NULL?

That is exactly how one must *not*, under any circumstances, ever
react to the findings of an audit.  It is very important to not
react in such a way.

If an audit - no matter whether a manual audit (like here) or
tool-driven static analysis - reveals a potential bug or vulnerability,
*never* proceed to merely sweep it under the carpet without caring
what the code actually does.  For example, if your auditing tool
reports potential use of uninitialized variable, do *not* just
add foo = NULL to the top of the function.  Look at what the
function does and decide whether it makes the codes safer or more
readable to initialize.

Adding code to merely shut up an audit is harmful in more than one
way.  First, it may hide an actually problem.  Second, it may create
a new problem if you happen to pick the wrong value for initialization.
Third, it might in somes cases shift a problem elsewehere where it
might be more harmful or harder to detect.

I think you have earned yourself a flame, but i'm sorry to say i'm
not in the mood right now.  :-)


In the case at hand, the code is obviously correct as it stands.
Just search for ep from the top.  The first line you find is the
declaration, the second line is the strtoul() explicitly initializing
it.  Adding a second initialization before would be confusing.  Any
future auditor would wonder what the point of that obviously useless
statement is and would probably waste some time trying to figure
out whether it hides some logic oversight of the author of the code,
probably ending up *removing* the ep = NULL to improve clarity
for future audits.

Yours,
  Ingo


[ incorrect patch snipped ]



Re: chmod.c undefined behavior

2014-09-24 Thread Theo de Raadt
I mght be wrong, but why not just initialize to NULL?

I am watching this thread to spot people who should never be OpenBSD
developers...



Re: chmod.c undefined behavior

2014-09-24 Thread David Carlier
You make the point indeed I had overreacted, just tried to participate :-)

I mght be wrong, but why not just initialize to NULL?

I am watching this thread to spot people who should never be OpenBSD
developers..

Eheh you re probably right.

On 24 September 2014 13:22, Ingo Schwarze schwa...@usta.de wrote:

 Hi David,

 David Carlier wrote on Wed, Sep 24, 2014 at 10:19:57AM +0100:
  On 24 September 2014 10:10, Matti Karnaattu mkarnaa...@gmail.com
 wrote:

  I noticed that chmod.c have uninitialized variable char *ep
  that was used.  This diff clarify what I mean.

  I might be wrong,

 You are.

  but why not just initialize to NULL?

 That is exactly how one must *not*, under any circumstances, ever
 react to the findings of an audit.  It is very important to not
 react in such a way.

 If an audit - no matter whether a manual audit (like here) or
 tool-driven static analysis - reveals a potential bug or vulnerability,
 *never* proceed to merely sweep it under the carpet without caring
 what the code actually does.  For example, if your auditing tool
 reports potential use of uninitialized variable, do *not* just
 add foo = NULL to the top of the function.  Look at what the
 function does and decide whether it makes the codes safer or more
 readable to initialize.

 Adding code to merely shut up an audit is harmful in more than one
 way.  First, it may hide an actually problem.  Second, it may create
 a new problem if you happen to pick the wrong value for initialization.
 Third, it might in somes cases shift a problem elsewehere where it
 might be more harmful or harder to detect.

 I think you have earned yourself a flame, but i'm sorry to say i'm
 not in the mood right now.  :-)


 In the case at hand, the code is obviously correct as it stands.
 Just search for ep from the top.  The first line you find is the
 declaration, the second line is the strtoul() explicitly initializing
 it.  Adding a second initialization before would be confusing.  Any
 future auditor would wonder what the point of that obviously useless
 statement is and would probably waste some time trying to figure
 out whether it hides some logic oversight of the author of the code,
 probably ending up *removing* the ep = NULL to improve clarity
 for future audits.

 Yours,
   Ingo


 [ incorrect patch snipped ]



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
That main function is good, standard style.

Unnecessary goto, variables defined far away from where they are used,
monster function, variables are not commented what they do, not all
functions are commented what they do..

To me, it looks like that there is no intention to optimize readability and
testability. Instead it looks like there is put effort to minimize amount of
functions or something.

It has one small getopt(3) loop, decrements argc/argv, has one small
codeblock to handle arguments for each variant of the utility, and
one concise fts(3) main loop to handle files, and finally a one-liner
for error handling.

It would be better if they all are put to separate functions. That is
the whole point in functions, to put task to single program fragment.

So what's the point?  You can't *calculate* whether something is
readable for a *human*, and that's what matters.

There are measured human factors, example size of working memory.
If there is measured magic constants, it is possible to calculate what
is somewhat better or worse.

Something to aim for:
-Less linearly independent paths in module
-High cohesion
-Low coupling

These are basics. It also matters a lot for testability. Less
independent paths means less test cases.



Re: chmod.c undefined behavior

2014-09-24 Thread Miod Vallat

That main function is good, standard style.


Unnecessary goto,


There is only one goto in chmod.c. If you consider it unnecessary, I'd
advise you to read the code again, and pay attention to the comment
explaining that particular chunk. (Hint: you can't break from more than
one control structure at once in C).


  variables defined far away from where they are used,


That program was written before C99 allowed for variable declarations
in the middle of functions. There is currently no interest to move this
declarations for the sake of it.


monster function, variables are not commented what they do, not all
functions are commented what they do..


Does every simple base command need more comments than actual lines of
code?


To me, it looks like that there is no intention to optimize readability and
testability. Instead it looks like there is put effort to minimize amount of
functions or something.


There is indeed no such attempt. But the code is simple enough to be
understandable to anyone with correct knowledge of the C language and
brain cells in working condition.


It would be better if they all are put to separate functions. That is
the whole point in functions, to put task to single program fragment.


It would arguably be better, due to the linear structure of this particular
command. Also, when this command was written, BSD was running on some  
architectures where function calls have a significant overhead.


Miod



Re: chmod.c undefined behavior

2014-09-24 Thread Henri Kemppainen
 Unnecessary goto

Or the most straightforward and obvious way to break out of a switch
in a loop.

 variables defined far away from where they are used,

Variables defined predictably at the start of the function, as the
convention is in BSD code.  Yes, they can be a little far if the
function is long.  You also always know exactly where to find them.

 monster function

Two hundred lines is hardly long, and size matters less than readability.
If a long function reads neatly from top to bottom, you can split it into
a few dozen procedures, requiring the reader to jump back and forth to
figure out what the code *actually* does.  You then also need to shovel
data back and forth between these functions.

This is how you turn simple and straightforward into a clusterfuck.

 variables are not commented what they do,
 variables named well enough 

If you're familiar with the fts api and know what command line flags are,
then half of these variables are immediately obvious.  For the rest, the
name is a good hint and if you spent a few seconds looking at where and
how each variable is used, these are perfectly understandable too.  Short
names don't get in the way too much.  How would you improve these names?

It sounds like you're just trying to take snippets of code out of context
and declare them bad because you're not reading the rest of the code?

Of course you could comment these, but the code is simple enough that such
excess verbiage would likely just get in the way.  Much in the same manner
that splitting the function into a few dozen would only make it harder to
see the code for what it is.

Mind you, the code isn't written for a first-year comp sci course where
the students need annotated twenty-line snippets to help them come to
grips with the basic syntax and structure of a computer program.

 not all functions are commented what they do..

Again trying to please some quality metric?  Usage does not need to be
documented.  And main() is essentially the whole program, which is
extensively documented in a man page.  It's not called by another function;
there is no internal API to document.  What comment would you add there?
What value would that comment add?

 To me, it looks like that there is no intention to optimize readability and
 testability. Instead it looks like there is put effort to minimize amount of
 functions or something.

It looks like you're not actually reading the code, you're just looking at
snippets of it, taken out of context.  And then you're applying some cargo-
cult pseudo-science to make statements about these snippets.  Goto is evil!

 Something to aim for:
 -Less linearly independent paths in module
 -High cohesion
 -Low coupling

Did you intend to achieve better cohesion and lower coupling by breaking the
function into a dozen small functions that appease whatever metric you're
using?

Of course, I'm a nobody to say how this code should look, but it looks fine
to me; and so far as I can tell, it's not broken.  Don't fix it if it's not
broken.  But if you think it can be improved, I'm sure someone here might
take a look at the diff once you send it.

But if your comments about the current code are any indication of what your
ideal version would look like, I'm not sure there are many who would be
willing to commit it.  But don't let me discourage you; I'm a nobody here.

-Henri



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Miod,

Miod Vallat wrote on Wed, Sep 24, 2014 at 04:27:26PM +0200:

 There is only one goto in chmod.c. If you consider it unnecessary, I'd
 advise you to read the code again, and pay attention to the comment
 explaining that particular chunk.

Heh.  Read the code is almost always good advice (except when
abused to dismiss reports about defective manuals, maybe :).

So let's turn this thread into something useful and tech@-worthy
and fix an actual bug.  The --optind happens in too few cases.
It does not only need to happen when the mode argument stands
alone, but also when it merely stands last in a group, in which
case getopt() has also moved past it.

Right now, in that case, the invalid syntax is silently ignored,
and if any valid mode arguments follow, they take effect.
For example, right now:

   $ touch tf
   $ chmod 755 tf  chmod -Pr -w tf ; ls -l tf  # pun intended
  -r-xr-xr-x  1 ischwarze  wsrc  0 Sep 24 18:14 tf

Oops, -Pr is bad and the -r must not be silently ignored.
With my patch:

   $ chmod 755 tf  ./obj/chmod -Pr -w tf ; ls -l tf 
  chmod: invalid file mode: -Pr
  -rwxr-xr-x  1 ischwarze  wsrc  0 Sep 24 18:14 tf

OK?
  Ingo

P.S.
Regarding the code quality of chmod.c, Miod and Henri have brought
up various good points, so i'll refrain from more comments on style
for now.


Index: chmod.c
===
RCS file: /cvs/src/bin/chmod/chmod.c,v
retrieving revision 1.30
diff -u -p -r1.30 chmod.c
--- chmod.c 21 May 2014 06:23:01 -  1.30
+++ chmod.c 24 Sep 2014 16:40:19 -
@@ -107,19 +107,23 @@ main(int argc, char *argv[])
hflag = 1;
break;
/*
-* XXX
-* -[rwx] are valid mode commands.  If they are the entire
-* argument, getopt has moved past them, so decrement optind.
-* Regardless, we're done argument processing.
+* If this is a symbolic mode argument rather than
+* an option, we are done with option processing.
 */
case 'g': case 'o': case 'r': case 's':
case 't': case 'u': case 'w': case 'X': case 'x':
if (!ischmod)
usage();
-   if (argv[optind - 1][0] == '-' 
-   argv[optind - 1][1] == ch 
-   argv[optind - 1][2] == '\0')
-   --optind;
+   /*
+* If getopt() moved past the argument, back up.
+* If the argument contains option letters before
+* mode letters, setmode() will catch them.
+*/
+   if (optind  1) {
+   cp = argv[optind - 1];
+   if (cp[strlen(cp) - 1] == ch)
+   --optind;
+   }
goto done;
default:
usage();



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu

Or the most straightforward and obvious way to break out of a switch
in a loop.

True. It is also good way to model exeptions. I take my word back,
this may be most elegant way to exit that loop in this case.

Variables defined predictably at the start of the function, as the
convention is in BSD code.

That is good convention. But while the function is so big, it is harder
to read.

You then also need to shovel
data back and forth between these functions.

Not necessarily. When the program is this tiny, single file source,
it may not so bad thing to use global scope for that shared data.

Two hundred lines is hardly long

I've used to write less than twenty lines functions.

How would you improve these names?

Naming is ok me, but I may add comments what they are used for.

 It sounds like you're just trying to take snippets of code out of
 context and declare them bad because you're not reading the rest of
 the code?

I don't say that this code is bad. I think it is good gode, but there is
plenty of room to improve readability.

 Much in the same manner that splitting the function into
 a few dozen would only make it harder to  see the code for what it is.

And when the lower level details are hidden, it is easier to see what
the code does. This is idea of abstraction.

 Mind you, the code isn't written for a first-year comp sci course
 where the students need annotated twenty-line snippets to help them
 come to grips with the basic syntax and structure of a computer
 program.

I like to write self documenting code. My practice is to comment ~all
functions with block comments, all parameters (with ranges), return
values, exceptions, external effects, all variable declarations (with
ranges).

And, I prefer to write tests first.

 Again trying to please some quality metric?

Yes, cyclomatic complexity. If that is high, it is usually good sign that
code needs refactoring.

 Did you intend to achieve better cohesion and lower coupling by
 breaking the function into a dozen small functions that appease
 whatever metric you're using?

No, lower CC index per function instead. This kind of refactoring
often reveals opportunities to increase cohesion in more complex
programs and helps to reduce the amount of code.

 Of course, I'm a nobody to say how this code should look, but it looks
 fine to me; and so far as I can tell, it's not broken.  Don't fix it
 if it's not broken.

Yes, it is considerably harder to refactor because there is no tests for
that. I think that should be starting point.

 But if your comments about the current code are any indication of what
 your ideal version would look like, I'm not sure there are many who
 would be willing to commit it.

I think that it is not defined enough unambiguously, how ideal code
looks like. It reduces motivation to improve code better, if it is not
defined, what is better.

style(9) is good, but it would be better if there is reference component
that is optimized to perfection, including tests, naming conventions,
file hierarchy etc.

Then ALL code changes and new code can be compared to that reference,
and that reference code can be used to build templates  automated
testing.



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Matti,

Matti Karnaattu wrote on Wed, Sep 24, 2014 at 08:14:27PM +0300:

 I think that it is not defined enough unambiguously, how ideal code
 looks like. It reduces motivation to improve code better, if it is not
 defined, what is better.
 
 style(9) is good, but it would be better if there is reference component
 that is optimized to perfection, including tests, naming conventions,
 file hierarchy etc.
 
 Then ALL code changes and new code can be compared to that reference,
 and that reference code can be used to build templates  automated
 testing.

It's not as easy as that.  Not only is the code constantly improved,
our standards which idioms are considered good and which are considered
dangerous constantly evolve as well, slowly turning new, high quality
code into old code of lower quality - even if the code remains
completely unchanged.  Actually, the quality of code diminishes in
time *in particular* when it remains unchanged.

As a recent example, imagine that you have an array allocated with
calloc(3) and want to change its size.  Until quite recently, the
following was considered reasonable code:

elemsz = ...;
nelem = ...;
if (SIZE_MAX / nelem  elemsz)
handle_overflow();
newp = realloc(pointer, nelem * elemsz);
if (newp == NULL)
handle_oom();
pointer = newp;

Now, that is considered a dangerous idiom, to be rewritten as
follows:

elemsz = ...;
nelem = ...;
newp = reallocarray(pointer, nelem, elemsz);
if (newp == NULL)
handle_oom();
pointer = newp;

Besides, as i already said at an earlier occasion, it is highly
non-trivial to write down a comprehensive set of rules to be
followed, it will end up being incomplete, too dogmatic, and
outdated all at the same time.

The best way to learn is to read code, understand it, find
weaknesses, suggest improvements, and when we agree on the
improvements, systematically audit the tree for similar
problems.

Actually, that's not just the way for newbies, it's how the most
experienced developers do it, too.  And even they, now and then,
have to delete suggestions for improvments from their trees that,
when reviewed, turn out to have downsides.

Yours,
  Ingo



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
Bringing that up is the only way to find out what OpenBSD developers
do consider poor quality, what they don't, and what is merely
considered a matter of style here

Thanks for your patience. I feel like I'm querying preferred
coding style by this way.

I also think that better way to do that is that it would be good thing
if there is some reference component in source tree, and rest of the
code should be aimed to same quality. In testsuite, style
and implementation.

I don't think that the goto is problem, even there may be possibility to
make clean, readable exit using while comparison.

That monster size main function is the issue and reduces
readability.

 mostly a matter of style, not of code quality;

I think these are somewhat bonded. Especially in C, because
the language is kind of unsecured gun like some Javascript.

While the language lacks certain things like classes, exceptions,
coroutines etc. this gap should be compensate by improving the
coding conventions and checking of those can be automated too.

 Obviously, you are honestly hunting for bugs.

I have about million lines of build and other errors under review
where I have found anomalies. Some of them seems to be errors
in OpenBSD, most of them are errors in code.

It just not work if I point out every single anomaly, bug, missing
parameter etc.

Example, I think it would be best if I write proper document where I
collect ALL known issues related to POSIX conformance that can be
reviewed and made to public knowledge.



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
 It's not as easy as that.  Not only is the code constantly improved,
 our standards which idioms are considered good and which are considered
 dangerous constantly evolve as well, slowly turning new, high quality
 code into old code of lower quality - even if the code remains
 completely unchanged.  Actually, the quality of code diminishes in
 time *in particular* when it remains unchanged.

I understand. And, the code itself is ultimately the specification.

And that reason, why not select single component  to be as a reference?

And the reference component:

This is finest peace of code at the moment,
Make rest of the code looks like this,
Everything that is diffent to reference is wrong way to do,
If some convention just doesn't work, fixed it to reference first

 The best way to learn is to read code, understand it, find
 weaknesses, suggest improvements, and when we agree on the
 improvements, systematically audit the tree for similar
 problems.

Some of the improvements may require enormous amount of effort to
apply whole tree, and requires also that when applying changes, have
to continuously learn/keep in own knowledge different source components.

I don't expect that everyone have interest to all of the components,
so it doesn't motivate to hack all components. That  must be taken into
account

Like you say, quality standards are rising all the time, I think it is
good to have reference component and accept, that rest of the source tree
may have little behind in quality.

I have used this practice in my own code. Every time I learn new
tool/language/framework etc. I make first component to reference and
then apply all practices rest of the components.  I've never tried this
on larger scale, in open source projects.



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Matti,

Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:14:45PM +0300:

 Thanks for your patience. I feel like I'm querying preferred
 coding style by this way.

That's not the *goal*, but it's an unavoidable side effect, and the
more it succeeds, the easier everuthing gets for both sides.

[...]
 I have about million lines of build and other errors under review
 where I have found anomalies. Some of them seems to be errors
 in OpenBSD, most of them are errors in code.

It is likely that a sizeable fraction is just false positives, too;
but i don't doubt some things hide in there that merit improvement.

 It just not work if I point out every single anomaly, bug, missing
 parameter etc.
 
 Example, I think it would be best if I write proper document where I
 collect ALL known issues related to POSIX conformance that can be
 reviewed and made to public knowledge.

Not really.  Usually, the main effort when approaching a list of
hundreds or thousands of potential issues is

 - to figure out what is most relevant
 - to figure out whether it is really an issue
 - to figure out how exactly it can be improved

If you try to write down a long list, that is not ideal.
You will waste a lot of time, and others may not even
find the time to read the whole list.

So really try to pick - out of your huge pile of potential
issues - some of those that you consider among the most serious
and at the same time the most easy to fix and start with those.
Once you found something that is confirmed to be an actual issue
and a fix was committed, sweep the tree for it, then move on
to the next class of issues.

There is no way anyone can fix the world.  Everybody has to pick
something.  Picking something relevant one is able to fix is
part of the art - actually, a very important part of the art.

Yours,
  Ingo



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
 If you try to write down a long list, that is not ideal.
 You will waste a lot of time, and others may not even
 find the time to read the whole list.

True. As OpenBSD aims for POSIX compatibility,
I'm thinking about document of missing features,
known anomalies, known bugs and reasons for those.

Example, if some certain standard feature is simply crap,
or there is legacy reasons why things works differently
or there is known bugs or some features are not just
implemented, these can be checked from single document, there
are possible workarounds, OpenBSD specific #ifdef / #endif etc.

After I fully review my self all those millions error messages,
end result is less than 50 relevant issues. And after that data
is reviewed together, @tech there may be left less than 20.

And if issues are fixed, new features done, list can be cleaned up,
or if new problems are found these can be added.

2014-09-24 21:54 GMT+03:00 Ingo Schwarze schwa...@usta.de:
 Hi Matti,

 Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:14:45PM +0300:

 Thanks for your patience. I feel like I'm querying preferred
 coding style by this way.

 That's not the *goal*, but it's an unavoidable side effect, and the
 more it succeeds, the easier everuthing gets for both sides.

 [...]
 I have about million lines of build and other errors under review
 where I have found anomalies. Some of them seems to be errors
 in OpenBSD, most of them are errors in code.

 It is likely that a sizeable fraction is just false positives, too;
 but i don't doubt some things hide in there that merit improvement.

 It just not work if I point out every single anomaly, bug, missing
 parameter etc.

 Example, I think it would be best if I write proper document where I
 collect ALL known issues related to POSIX conformance that can be
 reviewed and made to public knowledge.

 Not really.  Usually, the main effort when approaching a list of
 hundreds or thousands of potential issues is

  - to figure out what is most relevant
  - to figure out whether it is really an issue
  - to figure out how exactly it can be improved

 If you try to write down a long list, that is not ideal.
 You will waste a lot of time, and others may not even
 find the time to read the whole list.

 So really try to pick - out of your huge pile of potential
 issues - some of those that you consider among the most serious
 and at the same time the most easy to fix and start with those.
 Once you found something that is confirmed to be an actual issue
 and a fix was committed, sweep the tree for it, then move on
 to the next class of issues.

 There is no way anyone can fix the world.  Everybody has to pick
 something.  Picking something relevant one is able to fix is
 part of the art - actually, a very important part of the art.

 Yours,
   Ingo



Re: chmod.c undefined behavior

2014-09-24 Thread Ingo Schwarze
Hi Matti,

Matti Karnaattu wrote on Wed, Sep 24, 2014 at 09:44:52PM +0300:

 And that reason, why not select single component to be as a reference?

It's not really needed.  If you want good examples, just look for
new programs or library functions recently written from scratch
by experienced developers, for example signify(1), identd(8), cu(1),
mfii(4), tftp-proxy(8), ldomctl(8).  There are no doubt many more.
The release announcements tell you what is new.

 And the reference component:
 
 This is finest peace of code at the moment,

That's in part a matter of taste, and it doesn't matter that you
get *the* finest (whatever that is), you just need very good ones.

 Make rest of the code looks like this,
 Everything that is diffent to reference is wrong way to do,

Not quite.  Even in very good code, there is stil considerable room for
personal style, and i don't doubt you will see that when looking at
the above examples (though i admit i didn't check).

 Some of the improvements may require enormous amount of effort to
 apply whole tree, and requires also that when applying changes, have
 to continuously learn/keep in own knowledge different source components.
 
 I don't expect that everyone have interest to all of the components,
 so it doesn't motivate to hack all components. That  must be taken into
 account

Yes.  If something is found that is really important, others will
help.  And if anyone gets bored, they move on to something else -
though, as a rule, if something is really important, someone will
hopeful show the required persistence.  It need not necessarily be
the developer who originally found the issue.

Yours,
  Ingo



Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
 You were probably trained in the Object-oriented (and likely a bit of 
 post-OO) tradition.  Most likely you learned
 Java, and then possibly either C# or Ruby.

I started C-64 Basic, then I learned assembler, then C.

After C, I have learned bunch object oriented and functional
languages. Lately I have programmed in Typescript.

I actually switch language to another all the time. Language is a tool
and similar patterns can be implemented everywhere.

OO is not the only valid way to program.

I agree. Usually, I'm thinking functional approach first, and if that
doesn't cut, I look problem from dataflow perspective and after that I
try OO. That usually solve the problem if everything else fails but
that it is not of course only way.

I don't always agree with the OpenBSD developers (and sometimes I vehemently 
disagree with their take on things), but as a user I'm much happier with the 
quality, consistency and predictability that OpenBSD provides, than with some 
project that's always chasing the moving target of the perfect programming 
paradigm.

I also like how directory structure is done in that scale and that
consistency that ~everything is C and sh scripts.

and so perhaps more familiar to many modern programmers, but not intrinsically 
or automatically more readable.

I don't afraid of long functions. I also write long functions if I
have fully implemented DbC or there is just lot to do.

However, I do avoid complex functions.

 and it's highly unlikely that any random group of programmers will be willing 
 to follow the style you were trained to  think is best.

best is not best if it is not possible to measure it any practical way.

I can measure speed, I can measure memory usage, I can even measure
how much implementation uses energy (which is actually good optimum
for speed/memory usage) but I agree that coding style is often a
matter of opinion.

However, there is some factors that actually can be measured and those
tells something about quality. Example,

-Code coverage in tests (complex code is not easy to test!)
-Cyclomatic complexity
-Coupling

Relating to coding style, it can't practically measure what is better
name for function. Despite for that,
I consider naming convention extemely important, because that is part
of the process that is not so straightforward but I don't ever start
argue how to shorten names or is camelCase better. I don't care as
long as the style doesn't change.

OpenBSD defines (AFAIK) results to include safety of code, and/or freedom 
from unanticipated side-effects (i.e. bugs).

I'm fascinated many ways the clean source tree in that scale and I
have personal itch to implement methods to find bugs.

Instead of pointing out single bug, I'm thinking about implementing
methods to make much as possible bugs in code visible.



Re: chmod.c undefined behavior

2014-09-24 Thread Eugene Yunak
Can you two please continue this spam off-list, or at least in misc?

On 24 September 2014 23:52, Matti Karnaattu mkarnaa...@gmail.com wrote:

  You were probably trained in the Object-oriented (and likely a bit of
 post-OO) tradition.  Most likely you learned
  Java, and then possibly either C# or Ruby.

 I started C-64 Basic, then I learned assembler, then C.

 After C, I have learned bunch object oriented and functional
 languages. Lately I have programmed in Typescript.

 I actually switch language to another all the time. Language is a tool
 and similar patterns can be implemented everywhere.

 OO is not the only valid way to program.

 I agree. Usually, I'm thinking functional approach first, and if that
 doesn't cut, I look problem from dataflow perspective and after that I
 try OO. That usually solve the problem if everything else fails but
 that it is not of course only way.

 I don't always agree with the OpenBSD developers (and sometimes I
 vehemently disagree with their take on things), but as a user I'm much
 happier with the quality, consistency and predictability that OpenBSD
 provides, than with some project that's always chasing the moving target
 of the perfect programming paradigm.

 I also like how directory structure is done in that scale and that
 consistency that ~everything is C and sh scripts.

 and so perhaps more familiar to many modern programmers, but not
 intrinsically or automatically more readable.

 I don't afraid of long functions. I also write long functions if I
 have fully implemented DbC or there is just lot to do.

 However, I do avoid complex functions.

  and it's highly unlikely that any random group of programmers will be
 willing to follow the style you were trained to  think is best.

 best is not best if it is not possible to measure it any practical way.

 I can measure speed, I can measure memory usage, I can even measure
 how much implementation uses energy (which is actually good optimum
 for speed/memory usage) but I agree that coding style is often a
 matter of opinion.

 However, there is some factors that actually can be measured and those
 tells something about quality. Example,

 -Code coverage in tests (complex code is not easy to test!)
 -Cyclomatic complexity
 -Coupling

 Relating to coding style, it can't practically measure what is better
 name for function. Despite for that,
 I consider naming convention extemely important, because that is part
 of the process that is not so straightforward but I don't ever start
 argue how to shorten names or is camelCase better. I don't care as
 long as the style doesn't change.

 OpenBSD defines (AFAIK) results to include safety of code, and/or
 freedom from unanticipated side-effects (i.e. bugs).

 I'm fascinated many ways the clean source tree in that scale and I
 have personal itch to implement methods to find bugs.

 Instead of pointing out single bug, I'm thinking about implementing
 methods to make much as possible bugs in code visible.




-- 
The best the little guy can do is what
the little guy does right


Re: chmod.c undefined behavior

2014-09-24 Thread Theo de Raadt
That main function is good, standard style.

Unnecessary goto, variables defined far away from where they are used,
monster function, variables are not commented what they do, not all
functions are commented what they do..

To me, it looks like that there is no intention to optimize readability and
testability. Instead it looks like there is put effort to minimize amount of
functions or something.

It has one small getopt(3) loop, decrements argc/argv, has one small
codeblock to handle arguments for each variant of the utility, and
one concise fts(3) main loop to handle files, and finally a one-liner
for error handling.

It would be better if they all are put to separate functions. That is
the whole point in functions, to put task to single program fragment.

So what's the point?  You can't *calculate* whether something is
readable for a *human*, and that's what matters.

There are measured human factors, example size of working memory.
If there is measured magic constants, it is possible to calculate what
is somewhat better or worse.

Something to aim for:
-Less linearly independent paths in module
-High cohesion
-Low coupling

These are basics. It also matters a lot for testability. Less
independent paths means less test cases.

It really sounds like you could be more useful elsewhere.

Like maybe helping to improve GNU echo.



Re: chmod.c undefined behavior

2014-09-24 Thread Theo de Raadt
I have about million lines of build and other errors under review
where I have found anomalies. Some of them seems to be errors
in OpenBSD, most of them are errors in code.

Yet somehow out of those million you decided to send in a patch
which introduced a bug.

I believe I know your type.



Re: chmod.c undefined behavior

2014-09-24 Thread Henri Kemppainen
 That is good convention. But while the function is so big, it is harder
 to read.

 You then also need to shovel
 data back and forth between these functions.

 Not necessarily. When the program is this tiny, single file source,
 it may not so bad thing to use global scope for that shared data.

Don't you see the irony in claiming that main is too big and that the
variable definitions are too far away, while at the same time saying the
program is tiny and proposing to move some variables into file scope?

That should say something about paying too much attention to per-function
metrics...

 I don't say that this code is bad. I think it is good gode, but there is
 plenty of room to improve readability.

I could bikeshed about it too, plenty.  It's harder to make up changes
with real substenance into a program that is tiny and already good code.

  Much in the same manner that splitting the function into
  a few dozen would only make it harder to  see the code for what it is.

 And when the lower level details are hidden, it is easier to see what
 the code does. This is idea of abstraction.

And you can go too far.  You could break things into smaller and smaller
pieces until you have a set of functions that resemble the instruction
set of some virtual CPU.

Yes, absraction is a core element of programming, and a critical one at
that.  However, all abstraction comes at a cost, and sometimes that cost
may be greater than the value.  A program is the sum of its parts, and
by splitting parts of it into smaller pieces the program doesn't always
grow simpler or more understandable.  On the contrary; all but the best
and most generic of abstractions tend to be leaky and flawed, and the
programmer needs to keep many details about them in mind in order to
underdstand what the code really does.

Also, hiding low-level details can be the exact opposite of what you
want in order to make it easy to see what the code does.  I've seen
this pattern with aspiring hackers who learn from contrived textbook
examples with excessively annotated  commented code snippets.  They
learn to give every little thing a name, and call it an abstraction.
What they don't learn is idioms known to experienced programmers.

So they hand that code to an experienced programmer and after removing
layers upon layers of unneeded abstraction (or obfuscation), including
all the extra machinery needed to communicate program state between
those layers, the experienced programmer finally sees the code for all
its bugs.  Alternatively, he can try to keep all the made-up abstractions
in his head, but that can be quite a burden.  He'll end up constantly
jumping back and forth between them and re-checking details.  Cohesion?

You wanted local variables to be nearby so that they are easy to find.
That's one detail.  Functions hide other details which you might also want
to keep nearby.  Typedefs and macros make yet another way to hide details
that really matter.  Try looking for subtle arithmetic overflows when you
don't see the operators and don't know the *real* types of the operands.
If you have the time, take a look at what kind of layers were removed in
LibreSSL.

By reading lots and lots of code, you'll start to pick up patterns and
idioms.  Code becomes easier to read, and you'll spend less time trying
to understand specific parts.  And because you do not need to explain
these parts to yourself, you also won't need to single them out and give
them a name to remind yourself of what they do.  The result is that you'll
find larger functions perfectly readable and understandable.  And perhaps
you'll start to see why extra layers would actually make it harder to read.

That's not to say that small functions are bad, or that larger functions
are always better, or that chmod shouldn't ever be split into smaller
parts.  But the ability to say which abstractions are good and which are
useless (or outright harmful) is something that grows with experience.
And not all functions are alike; a long function can really read like
a list of instructions where you can forget the past steps as soon as
they are behind.  A small thirty-line function with some twisted logic
(add recursion?) could be a real brain teaser.

  Mind you, the code isn't written for a first-year comp sci course
  where the students need annotated twenty-line snippets to help them
  come to grips with the basic syntax and structure of a computer
  program.

 I like to write self documenting code. My practice is to comment ~all
 functions with block comments, all parameters (with ranges), return
 values, exceptions, external effects, all variable declarations (with
 ranges).

That sounds like dogma.  It's not a terrible attitude to start with, but
as you pick up idioms and conventions, you'll find that some things are so
obvious that comments add absolutely nothing.  And useless comments do not
make the code prettier or easier to read.  On the other hand, you'll learn
to add comments in seemingly 

Re: chmod.c undefined behavior

2014-09-24 Thread Matti Karnaattu
But you didn't write any tests before this diff, and as a result you broke 
chmod.

Retrofitting test suite requires some effort and didn't do that now
because...

My apologies for two things:

1. I do, forgot to put question mark. My intention was to _ask_ for that
code: WTF this?

2. I put that message to wrong mailing list. Questions with diff should
definitely go to misc@

And now this thread sprawled long way to off course in wrong list...
So, I don't answer in this thread any longer, and I hope we all can now
shut down this spamming.

Even I could easily refactor code to more sane in terms of function
complexity, I don't want do that without test suite (there is probably
available) because I don't want to take risk of breaking it.

And, I don't do that even if I have that test suite, because I have doubt
no one will not commit refactored version because there is opinions that
monster function can be more readable.

Nothing more to see here.