Re: null-terminated vs. nul-terminated

2022-03-30 Thread Warner Losh
On Tue, Mar 29, 2022, 5:40 AM Greg Troxel  wrote:

>
> "David H. Gutteridge"  writes:
>
> Thanks for the history and it is all sensible.
>
> > "nul-terminated" and "null-terminated" seemed more common in man pages
> > that originated from historical BSD sources, so, lacking any style
> > guide, I inferred the lowercase "nul" was more "correct" as "BSD style"
> > (excepting modern OpenBSD), even though that looks a bit odd to me. I
> > then examined where "nul-terminated" came from, and found these bulk
> > commits, which imply a standard.
>
> > date: 2005-01-02 18:38:04 +;  author: wiz;
> > Mark up NULL, and replace null by nul where appropriate.
> >
> > date: 2006-10-16 08:48:45 +;  author: wiz;
> > nul/null/NULL cleanup:
> > when talking about characters/bytes, use "nul" and "nul-terminate"
> > when talking about pointers, use "null pointer" or ".Dv NULL"
> >
> > So that seemed to me the established style.
>
> It may have been BSD style, but I think it's wrong to use lowercase for
> an ASCII codepoint.  And therefore it is confusing to people who know
> that the ASCII zero byte is written NUL.
>

FreeBSD has adopted the POSIX language (null terminated) because it mirrors
the standard, and the X/Open folks have given blanket permission to use it in
open source man pages...

Warner



Re: null-terminated vs. nul-terminated

2022-03-29 Thread Robert Elz
And yes, I know nl isn't really ASCII, but lf and cr are also
typically written in lower case.

This whole discussion is childish.  It doesn't matter.

kre


Re: null-terminated vs. nul-terminated

2022-03-29 Thread Robert Elz
Date:       Tue, 29 Mar 2022 07:40:04 -0400
From:       Greg Troxel
Message-ID:  

  | It may have been BSD style, but I think it's wrong to use lowercase for
  | an ASCII codepoint.

But we use soh esc nl del (etc) in lower case all the time.

You might also want to look at share/misc/ascii

kre


Re: null-terminated vs. nul-terminated

2022-03-29 Thread Greg Troxel

"David H. Gutteridge"  writes:

Thanks for the history and it is all sensible.

> "nul-terminated" and "null-terminated" seemed more common in man pages
> that originated from historical BSD sources, so, lacking any style
> guide, I inferred the lowercase "nul" was more "correct" as "BSD style"
> (excepting modern OpenBSD), even though that looks a bit odd to me. I
> then examined where "nul-terminated" came from, and found these bulk
> commits, which imply a standard.

> date: 2005-01-02 18:38:04 +;  author: wiz;
> Mark up NULL, and replace null by nul where appropriate.
>
> date: 2006-10-16 08:48:45 +;  author: wiz;
> nul/null/NULL cleanup:
> when talking about characters/bytes, use "nul" and "nul-terminate"
> when talking about pointers, use "null pointer" or ".Dv NULL"
>
> So that seemed to me the established style.

It may have been BSD style, but I think it's wrong to use lowercase for
an ASCII codepoint.  And therefore it is confusing to people who know
that the ASCII zero byte is written NUL.






Re: null-terminated vs. nul-terminated

2022-03-28 Thread David H. Gutteridge

On 2022-03-26 11:57, Roland Illig wrote:
>
> The term "null-terminated string" is quite common when talking about C.
> In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
> I prefer to keep "null-terminated" here.


Hi all,

While I don't really want to prolong this debate, as the committer who
triggered this discussion, I felt I should respond, in part to explain
why I made my choice (which I reverted, though I don't agree "null-
terminated" is more correct). TL;DR: there is no consistency here in
NetBSD's code base in man pages or comments in source code, and no
applicable style guide I know of, but "NUL-terminated" is the most
common form found. It seems there was also an attempt at standardization
in man pages made in 2005-2006, settling on "nul-terminated".

I was taught (several decades ago) that the short form for the null
byte or null character was NUL in ANSI C parlance (not just ASCII), and
that "null-terminated" was incorrect as it's ambiguous. If someone were
to say "null-byte-terminated", "null-character-terminated", or for the
other context "null-pointer-terminated", that would be fine.
"NUL-terminated" was the unambiguous contraction. (As others have
pointed out, a cleverer way to avoid this debate would be to use
entirely different terms.)

The most common form found in man pages at present installed in NetBSD
-current is actually "NUL-terminated", by a significant margin. That's
in part because many of those are from third-party projects, e.g.,
OpenBSD and OpenSSL, which standardized on that form. The next most
common is "null-terminated", then (following slightly behind) "nul-
terminated", then (much less commonly) "NULL-terminated" (which seems
quite incorrect to me). I didn't look as closely at comments, but a
similar pattern emerged, with "NUL-terminated" the most common under
/usr/include, for example (in part due to the origins of some upstream
code). (It's not my intent here to quote or debate exact statistics, so
I haven't provided any. I'm sharing my perception of practice, rightly
or wrongly.)

"nul-terminated" and "null-terminated" seemed more common in man pages
that originated from historical BSD sources, so, lacking any style
guide, I inferred the lowercase "nul" was more "correct" as "BSD style"
(excepting modern OpenBSD), even though that looks a bit odd to me. I
then examined where "nul-terminated" came from, and found these bulk
commits, which imply a standard.

date: 2005-01-02 18:38:04 +;  author: wiz;
Mark up NULL, and replace null by nul where appropriate.

date: 2006-10-16 08:48:45 +;  author: wiz;
nul/null/NULL cleanup:
when talking about characters/bytes, use "nul" and "nul-terminate"
when talking about pointers, use "null pointer" or ".Dv NULL"

So that seemed to me the established style.

Regards,

Dave


Re: null-terminated vs. nul-terminated

2022-03-26 Thread Greg Troxel

Taylor R Campbell  writes:

>> Date: Sat, 26 Mar 2022 16:53:19 +0100
>> From: Roland Illig 
>> 
>> The term "null-terminated string" is quite common when talking about C.
>>   In contrast, the word "nul" in "nul-terminated" always reminds me of
>> the character abbreviation in ASCII, which has a narrower scope than C.
>>   I prefer to keep "null-terminated" here.
>
> I feel like I've usually seen it as NUL-terminated.  I thought it was
> in /usr/share/misc/style but I must have been thinking of a different
> style guide.
>
> `NUL' is better than `null' or `NULL' here because it's not a null
> pointer, unlike, e.g., the execve argv terminator.  Even if the string
> isn't US-ASCII, what character encoding calls a nonzero byte `NUL'?
>
> `NUL' is better than `zero' or `0' here because it's unambiguously the
> all-bits-zero byte, not the US-ASCII encoding of `0' (i.e., decimal 48
> or 0x30).

For background I'm a native en_US speaker whose second computer language
was K&R C from the pre-ANSI edition.

There are three separate concepts.

NULL    Refers to a pointer, never to a character.

NUL     ASCII codepoint, 7 zero bits, and 8 zero bits when stored in
        an 8-bit byte.  NUL is never properly written nul; the ASCII
        codepoints are upper case in formal usage.

null    An English word that can mean various things, including

           null pointer => NULL

           null character => NUL in ASCII

           null character => 0 in something else, theoretically maybe,
           but C just cannot deal with a character set that uses 0 to
           represent something that gets used in strings.
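As a rough illustration of that last point: C string functions stop at the first zero byte, so a zero byte inside the data silently shortens the string. A minimal sketch; the helper name `visible_length` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: how many bytes of a buffer a C string function
 * actually "sees", i.e. the bytes before the first null character, or
 * the whole buffer if it contains no null character at all. */
size_t visible_length(const char *buf, size_t size)
{
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', size);
    return end != NULL ? (size_t)(end - buf) : size;
}
```

For example, `char data[] = "ab\0cd";` occupies 6 bytes (five characters plus the terminator the compiler adds), yet `strlen(data)` and `visible_length(data, sizeof data)` both give 2.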


So one can talk about a "null-terminated string" implying "null
character", which means NUL, and one could also write "NUL-terminated
string".  I find the form NUL-terminated to be artificial.

I perceive "nul-terminated" as an error due to the lower case nul.




Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Jason Thorpe


> On Mar 26, 2022, at 9:39 AM, Taylor R Campbell wrote:
> 
> `C string' is ambiguous because there are also char arrays that
> function as strings but which are not guaranteed to be NUL-terminated,
> as strncpy is intended for.

A non-terminated char array is not a C-string.  The term C-string is not 
ambiguous.  This is something that, amazingly, even Internet trolls appear to 
agree on.  However, they do disagree as to the spelling of the terminating 
character's name, which is why I think it's best to elide it altogether.

-- thorpej



Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Jason Thorpe


> On Mar 26, 2022, at 9:09 AM, Warner Losh  wrote:
> 
> Since all the 'C' standards[*] use "null-terminated" and "null character", 
> it's likely best to use that terminology because there is a source of truth 
> for its definition in case of ambiguity or doubt.

Ah, but you're giving up the opportunity to use indirection to solve the 
problem.  By calling it a "C-string", then those who care what the standard 
calls the terminating character can go look it up! :-)

-- thorpej



Re: null-terminated vs. nul-terminated

2022-03-26 Thread Roland Illig

On 26.03.2022 at 17:09, Warner Losh wrote:
>
> [*] I've not gone the extra mile and checked to see if K&R used this
> phrase, to be honest.


It does.  The book from 1978 says in its tutorial section:

> getline puts the character \0 (the null character, whose value
> is zero) at the end of the array it is creating, to mark the
> end of the string of characters.

Interestingly, the section from the above quote is named "Character
Arrays", not "Strings". The definition of "string" on page 181 doesn't
mention the word "terminated", it gives the name "null byte" to the \0.
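The tutorial routine Roland quotes can be sketched in modern C roughly as follows. This is a reconstruction, not the book's text: it is renamed `get_line` (a hypothetical name) to avoid clashing with POSIX `getline`, and it reads from a `FILE *` rather than standard input.

```c
#include <assert.h>
#include <stdio.h>

/* Read at most lim-1 characters of one line from fp into s, keeping the
 * newline if it fits, then store the null character '\0' after the last
 * character so that s can be used as a string.
 * Returns the number of characters stored, not counting the '\0'. */
int get_line(FILE *fp, char s[], int lim)
{
    int c = 0;   /* initialized so the check below is defined if lim == 1 */
    int i;

    assert(fp != NULL && s != NULL && lim > 0);
    for (i = 0; i < lim - 1 && (c = getc(fp)) != EOF && c != '\n'; ++i)
        s[i] = (char)c;
    if (c == '\n') {   /* only reached while i < lim - 1, so it fits */
        s[i] = (char)c;
        ++i;
    }
    s[i] = '\0';       /* mark the end of the string of characters */
    return i;
}
```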

So using the word "null" to mean all kinds of nothing, including a null
pointer, a null byte and a null character, has a long tradition.

Roland


Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Warner Losh
On Sat, Mar 26, 2022 at 9:53 AM Roland Illig  wrote:

> On 24.03.2022 at 02:55, David H. Gutteridge wrote:
> > Module Name:  src
> > Committed By: gutteridge
> > Date: Thu Mar 24 01:55:15 UTC 2022
> >
> > Modified Files:
> >   src/lib/libc/gen: popen.3
> >
> > Log Message:
> > popen.3: minor spelling, grammar, style, and xref tweaks
> >
> >
> > To generate a diff of this commit:
> > cvs rdiff -u -r1.22 -r1.23 src/lib/libc/gen/popen.3
>
> The term "null-terminated string" is quite common when talking about C.
>   In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
>   I prefer to keep "null-terminated" here.
>

The standard uses "null-terminated" and "null character" (see section
5.2.1, Character sets, in the C2x draft; the term dates back to C89):
"A byte with all bits set to 0, called the null character, shall exist in
the basic execution character set; it is used to terminate a character
string."
I couldn't find the definition for null-terminated, though. This is
different from the NULL #define.

Not to be confused with the all-zeros ASCII character, whose mnemonic is
NUL, which is where some pressure to use NUL-terminated comes from. I agree
that its usage is narrower and really only relevant for certain ASCII and
ASCII-derived character sets, which is why the standard chose the spelling
it did.
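Because the standard defines the null character as the byte that ends a string, a string's length can be found by scanning for it. A minimal sketch of that scan; `string_length` is a hypothetical name, and real code would simply call `strlen`.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters before the null character, the byte with all
 * bits set to 0 that terminates the string.  This computes the same
 * result as strlen(), written out for illustration. */
size_t string_length(const char *s)
{
    const char *p = s;

    assert(s != NULL);   /* a null pointer is not a string */
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}
```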

Since all the 'C' standards[*] use "null-terminated" and "null character",
it's likely best to use that terminology because there is a source of truth
for its definition in case of ambiguity or doubt.

Warner

[*] I've not gone the extra mile and checked to see if K&R used this
phrase, to be honest.


Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Taylor R Campbell
> Date: Sat, 26 Mar 2022 16:53:19 +0100
> From: Roland Illig 
> 
> The term "null-terminated string" is quite common when talking about C.
>   In contrast, the word "nul" in "nul-terminated" always reminds me of
> the character abbreviation in ASCII, which has a narrower scope than C.
>   I prefer to keep "null-terminated" here.

I feel like I've usually seen it as NUL-terminated.  I thought it was
in /usr/share/misc/style but I must have been thinking of a different
style guide.

`NUL' is better than `null' or `NULL' here because it's not a null
pointer, unlike, e.g., the execve argv terminator.  Even if the string
isn't US-ASCII, what character encoding calls a nonzero byte `NUL'?

`NUL' is better than `zero' or `0' here because it's unambiguously the
all-bits-zero byte, not the US-ASCII encoding of `0' (i.e., decimal 48
or 0x30).
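The difference between the null character and the digit `0` is easy to see in C. In the sketch below the helper names are hypothetical, and the values 48 and 0x30 assume an ASCII-based execution character set; the C standard itself only requires the null character to be zero and the digits `'0'` through `'9'` to be contiguous.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if c is the null character (NUL), the
 * all-bits-zero byte that terminates a C string.  The digit '0' is a
 * different character and does not end a string. */
int is_string_terminator(char c)
{
    return c == '\0';
}

/* Hypothetical helper: the value of a decimal digit character.  This
 * relies only on '0'..'9' being contiguous, not on '0' being 48 (0x30),
 * which holds only for ASCII-based encodings. */
int digit_value(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```

On an ASCII-based system `'0' == 48` and `'0' == 0x30` both hold, while `'\0' == 0` holds everywhere.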

`C string' is ambiguous because there are also char arrays that
function as strings but which are not guaranteed to be NUL-terminated,
as strncpy is intended for.
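A sketch of the `strncpy` behavior described above: when the source is at least `n` bytes long, `strncpy` copies exactly `n` bytes and writes no terminating null character, so the destination is a fixed-width char array rather than a string. The wrapper `copy_string` is a hypothetical example of one common way to force termination, truncating if needed.

```c
#include <assert.h>
#include <string.h>

/* strncpy(dst, src, n) pads dst with '\0' when src is shorter than n,
 * but when strlen(src) >= n it copies n bytes and adds NO terminator. */

/* Hypothetical wrapper: always leaves a null-terminated string in dst,
 * truncating src if it does not fit.  Requires n > 0. */
void copy_string(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL && n > 0);
    strncpy(dst, src, n - 1);
    dst[n - 1] = '\0';
}
```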


Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Jason Thorpe


> On Mar 26, 2022, at 9:17 AM, Martin Husemann  wrote:
> When talking about it I prefer "zero terminated", or C-string, in
> contrast to C++ std::string (which are objects) or Pascal strings
> (which have an explicit length at the beginning).

Yes, I also prefer the term "C-string".

-- thorpej







Re: null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Martin Husemann
On Sat, Mar 26, 2022 at 04:53:19PM +0100, Roland Illig wrote:
> The term "null-terminated string" is quite common when talking about C.

NULL-terminated lists/arrays are quite common, but NULL is a pointer and
the string is terminated by a 0 char (sometimes spelled as \0 in a string
literal, but implicitly added by the compiler at the end of a literal,
and spelled as NUL in the ASCII table).
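Martin's distinction between a NULL-terminated array of pointers (such as the `argv` passed to `execve`) and a string ended by a zero char might be sketched like this; `count_args` and `shell` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count the entries of an argv-style array: a list of pointers whose end
 * is marked by a null pointer (NULL), not by a null character. */
size_t count_args(char *const argv[])
{
    size_t n = 0;

    assert(argv != NULL);
    while (argv[n] != NULL)
        ++n;
    return n;
}

/* A string literal gets its terminating 0 char from the compiler:
 * "sh" occupies three bytes, 's', 'h' and '\0'. */
static const char shell[] = "sh";
```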

>  I prefer to keep "null-terminated" here.

I think it is a bug.

When talking about it I prefer "zero terminated", or C-string, in
contrast to C++ std::string (which are objects) or Pascal strings
(which have an explicit length at the beginning).
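For contrast with the Pascal strings Martin mentions, a length-prefixed string can be sketched in C as below. The `pstring` layout is a hypothetical illustration that assumes a single length byte (so at most 255 characters), similar to classic Pascal short strings; it is not any particular compiler's representation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical length-prefixed ("Pascal-style") string: the length is
 * stored before the characters, so no terminator is needed and a zero
 * byte inside data is just another character. */
typedef struct {
    unsigned char len;   /* one length byte: at most 255 characters */
    char data[255];
} pstring;

/* Build a pstring from a C string (which ends at its '\0').
 * Returns 0 on success, -1 if the C string is too long to fit. */
int pstring_from_cstring(pstring *dst, const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    n = strlen(src);               /* O(n) scan for the terminator */
    if (n > sizeof dst->data)
        return -1;
    dst->len = (unsigned char)n;   /* O(1) to read back later */
    memcpy(dst->data, src, n);     /* the '\0' itself is not copied */
    return 0;
}
```

The trade-off is visible in the comments: the C string needs a scan to find its length but carries no length field, while the length-prefixed form reads its length in constant time and can hold zero bytes, at the cost of a size limit.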

Martin


null-terminated vs. nul-terminated (was: Re: CVS commit: src/lib/libc/gen)

2022-03-26 Thread Roland Illig

On 24.03.2022 at 02:55, David H. Gutteridge wrote:
>
> Module Name:  src
> Committed By: gutteridge
> Date:         Thu Mar 24 01:55:15 UTC 2022
>
> Modified Files:
>   src/lib/libc/gen: popen.3
>
> Log Message:
> popen.3: minor spelling, grammar, style, and xref tweaks
>
>
> To generate a diff of this commit:
> cvs rdiff -u -r1.22 -r1.23 src/lib/libc/gen/popen.3


The term "null-terminated string" is quite common when talking about C.
In contrast, the word "nul" in "nul-terminated" always reminds me of
the character abbreviation in ASCII, which has a narrower scope than C.
I prefer to keep "null-terminated" here.

Roland