Re: [Issue 8 drafts 0001798]: Must posix_getdents remember file offsets across exec?

2024-03-07 Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker via austin-group-l at The Open Group wrote in
 <2b5daa16a52f70c01b47c5d1ad8b3...@austingroupbugs.net>:
 ...
 |https://austingroupbugs.net/view.php?id=1798 
 ...
 | (0006710) geoffclare (manager) - 2024-03-07 18:14
 | https://austingroupbugs.net/view.php?id=1798#c6710 
 ...
 |> You're missing the fact that the underlying OS does *not* maintain a file
 |> position on directory descriptors.
 |
 |Actually, I think I knew that, but had forgotten it.
 |
 |So the Cygwin lseek() must have to fake an offset for fds associated with a
 |directory stream - presumably returning the read count - and accept those
 |faked offsets as input.
 |
 |To make it work for an lseek() on an fd obtained from dup(), as in
 |https://austingroupbugs.net/view.php?id=1798#c6703, couldn't you have dup()
 |notice that the fd passed in is associated with a directory stream and
 |create an association between the new fd and the same directory stream?
 |Admittedly the code would be more complicated if a directory stream can be
 |associated with more than one fd, but it seems to me that this could be a
 |promising approach that would provide better compatibility with other
 |systems.

hihihi.  I should have read all emails before sending noise.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)

Re: [Issue 8 drafts 0001798]: Must posix_getdents remember file offsets across exec?

2024-03-07 Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker via austin-group-l at The Open Group wrote in
 :
 ...
 |https://austingroupbugs.net/view.php?id=1798 
 ...
 | (0006709) corinna_vinschen (reporter) - 2024-03-07 15:00
 | https://austingroupbugs.net/view.php?id=1798#c6709 
 ...
 |You're missing the fact that the underlying OS does *not* maintain a file
 |position on directory descriptors.  The function returning the file
 |position always returns 0 on a directory, independent of the actually read
 |directory entries.  Also, there's no way to lseek on a directory. The only
 |operation available is a "restart" flag to the directory read operation,
 |which allows to specify to start at position 0.
 |
 |So, to be able to implement telldir/seekdir, the DIR struct has to
 |maintain a read counter.  telldir() simply returns the number of directory
 |entries read so far.  Seekdir() is implemented as a "restart" and then
 |reading directory entries in a loop until the counter matches the one given
 |as argument.
 |
 |Having said that, as soon as you fork() a directory descriptor with
 |posix_getdents operation, you not only generate a copy of the underlying OS
 |descriptor, you also duplicate the DIR struct into the new process.  Now
 |the DIR structs are independent from each other.  If you call
 |posix_getdents on one of them, the DIR struct in the other process is
 |obviously *not* updated accordingly.  Thus, any lseek() on the directory
 |descriptor in one process is lost on the one in the other process.
 |
 |I used fork() as an example, but the same goes for dup(), unless you share
 |the same DIR structure for all the directory descriptors in shared memory.
 |
 |Does that clear things up?

How about a very expensive temporary file storing a binary dirent
dump of the entire directory content (in case of a fork)?
Then the child can pick up the read pointer to that file,
if it is not a real directory FD anyway.  Or a file or similar
holding only the counter.

--steffen

Re: [Issue 8 drafts 0001797]: strftime "%s" should be able to examine tm_gmtoff

2024-02-29 Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare via austin-group-l at The Open Group wrote in
 :
 |Steffen Nurpmeso wrote, on 26 Feb 2024:
 |>|https://austingroupbugs.net/view.php?id=1797 
 |>  ...
 |>| (0006689) eblake (manager) - 2024-02-26 19:32
 |>| https://austingroupbugs.net/view.php?id=1797#c6689 
 |>  ...
 |>|[.]inconsistent with existing practice where some implementations \
 |>|set it to
 |>|"GMT"[.]
 |> 
 |> The TZ project itself changed GMT to UTC only recently.
 |> Hm, 50df7d69f3af5dbb210326b0b25257e48d11983b as of 2022-07-19
 |> already, how time flies.  I remembered what P. Eggert said,
 |> however
 |> 
 |>   POSIX is being revised to require this.
 |> 
 |> I.e., this could be a chicken-and-egg situation, and i think the standard
 |> was right to go to UTC, given that GMT is no longer publicly used afaik.
 |> Therefore
 ...
 |> So i do not think that GMT should be used in any future wording.
 |
 |The main decision point on this was, I think, when it came to light
 |that the standard explicitly allows "date -u" to write either UTC or
 |GMT.  There are systems where "date -u" writes GMT.
 |
 |It came down to a choice of disallowing GMT for "date -u" or allowing
 |GMT for gmtime() (or making no change) and we felt that if the issue
 |had come to light during work on bug 1533 (which added tm_zone) we
 |would likely have allowed both UTC and GMT for gmtime() at that time.
 |
 |We didn't want to force anyone to change an existing "GMT" string to
 |"UTC".

Thanks for the explanation.

--steffen

Re: [Issue 8 drafts 0001797]: strftime "%s" should be able to examine tm_gmtoff

2024-02-26 Steffen Nurpmeso via austin-group-l at The Open Group



 |https://austingroupbugs.net/view.php?id=1797 
 ...
 | (0006689) eblake (manager) - 2024-02-26 19:32
 | https://austingroupbugs.net/view.php?id=1797#c6689 
 ...
 |[.]inconsistent with existing practice where some implementations set it to
 |"GMT"[.]

The TZ project itself changed GMT to UTC only recently.
Hm, 50df7d69f3af5dbb210326b0b25257e48d11983b as of 2022-07-19
already, how time flies.  I remembered what P. Eggert said,
however

  POSIX is being revised to require this.

I.e., this could be a chicken-and-egg situation, and i think the standard was
right to go to UTC, given that GMT is no longer publicly used afaik.
Therefore

 |On page 1211 line 41381 section gmtime() DESCRIPTION, change (already in CX
 ...
 |to:
 ...
 |pointer to an implementation-defined string set to "UTC" or "GMT", which
 |shall have static storage duration.

So i do not think that GMT should be used in any future wording.
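
Whatever wording is finally chosen, an application that looks at tm_zone after gmtime() currently has to be prepared for either string.  A minimal sketch of such a check, assuming a struct tm that has the tm_zone member (new in Issue 8, long available on the BSDs and glibc); gmtime_zone_ok() is a hypothetical helper name, and the _DEFAULT_SOURCE define is a glibc-specific assumption.

```c
#define _DEFAULT_SOURCE 1   /* glibc: expose tm_zone (assumption) */
#include <string.h>
#include <time.h>

/* Return 1 if the tm_zone string that gmtime_r() sets is one of the two
 * values the proposed wording permits, "UTC" or "GMT"; else 0. */
static int gmtime_zone_ok(time_t t)
{
    struct tm tmbuf;

    if (gmtime_r(&t, &tmbuf) == NULL)
        return 0;
    return tmbuf.tm_zone != NULL &&
           (strcmp(tmbuf.tm_zone, "UTC") == 0 ||
            strcmp(tmbuf.tm_zone, "GMT") == 0);
}
```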

--steffen

Re: [Issue 8 drafts 0001801]: xargs: add -P option

2024-02-21 Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker via austin-group-l at The Open Group wrote in
 <5fd3ccb27ac7b65bcfbd97abb3a03...@austingroupbugs.net>:
 ...
 |https://austingroupbugs.net/view.php?id=1801 
 ...
 |related to  0001811 xargs: add -P option to FUTURE DIRECTIO...
 ...
 | (0006670) gabravier (reporter) - 2024-02-21 00:20
 | https://austingroupbugs.net/view.php?id=1801#c6670 
 ...
 |> And even more, how are these parallel invocations expected to interact
 |> with the CONSEQUENCES OF ERRORS section of the spec (page 3603 in D4 of
 |> I8). That is, if an invocation of utility does exit 255, or is killed by a
 ...
 |> And what is to be done with those other invocations still running when
 |> one exits in one of those ways - are they to be killed, orphaned, or is
 |> xargs to wait for them to finish before terminating? In that latter case,
 |> should more diagnostic messages be written if more of the invocations also
 |> exit 255, or via a signal?
 |
 |Pretty much every implementation differs in this regard:
 |- GNU findutils writes a diagnostic and waits for other invocations to
 |finish before terminating. If another invocation also exits with 255, it
 |writes another diagnostic and then proceeds to immediately invoke undefined
 |behavior by calling exit within an atexit handler. On my machine that
 |results in it exiting after printing that second diagnostic (leaving the
 |remaining invocations orphaned)
 |- FreeBSD writes a diagnostic and waits for other invocations to finish
 |before terminating, and prints more diagnostics if other invocations also
 |exit with 255
 |- OpenBSD and illumos write a diagnostic and immediately exit, leaving the
 |other invocations orphaned
 |- BusyBox has the same behavior as GNU findutils has on my machine (prints
 |diagnostic, waits, prints another diagnostic if another invocation also
 |exits with 255 and exits then) but manages to do so without invoking
 |undefined behavior, at least
 |- Toybox writes a diagnostic and then proceeds to continue execution as
 |though everything is fine (it does this without -P too so that seems
 |clearly just non-conforming...)

That is an almost unbelievably disastrous finding.
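
For illustration, the "write a diagnostic, then wait for the others" policy attributed to FreeBSD above could look roughly like this.  It is a sketch under assumptions, not code from any of the implementations mentioned: spawn_exit() and reap_all() are hypothetical helpers, and whether the remaining invocations should instead be killed or orphaned is exactly the open question; this sketch simply waits.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one child that immediately exits with the given status
 * (stands in for one invocation of the utility). */
static pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;                         /* -1 if fork() failed */
}

/* Wait for nchildren children.  Every child that exited with status 255
 * or was terminated by a signal gets a diagnostic on stderr and is
 * counted; the function does not return before all of them are reaped. */
static int reap_all(int nchildren)
{
    int fatal = 0;

    while (nchildren > 0) {
        int status;
        pid_t pid = wait(&status);

        if (pid == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: just wait again */
            break;                      /* ECHILD: nothing left to reap */
        }
        nchildren--;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 255) {
            fprintf(stderr, "sketch: child %ld exited with status 255\n",
                    (long)pid);
            fatal++;
        } else if (WIFSIGNALED(status)) {
            fprintf(stderr, "sketch: child %ld terminated by signal %d\n",
                    (long)pid, WTERMSIG(status));
            fatal++;
        }
    }
    return fatal;
}
```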

--steffen

Re: [Issue 8 drafts 0001800]: be*toh() have no entries

2024-01-25 Steffen Nurpmeso via austin-group-l at The Open Group

 |https://austingroupbugs.net/view.php?id=1800 
 ...
 |Summary: be*toh() have no entries
 ...
 | (0006640) steffen (reporter) - 2024-01-25 21:29
 | https://austingroupbugs.net/view.php?id=1800#c6640 
 |-- 
 |No it should not.
 |There is the very same thing for the le*toh() series, please see draft 4,
 |page 1327.  I am only asking for the very same thing for the be*toh()
 |series. 

Because Mantis did not submit it to the ML, i had posted

  You are right. The entry on page 660 actually *is* the
  description for the be*toh() series, even though it starts with
  htobe16() and appears like an endian.h overview as such, but
  that in turn is on page 240.

  So that is a false issue that i withdraw.  (I would at least re-sort
  the functions alphabetically, perhaps even with an empty line between
  the series.)

in a successive message.

Yep, it was rapid searching in less(1) on a PDF-to-text conversion of
draft 4: hit "n" twice in a row and you are on page 660, not
240; *but* i personally would definitely re-sort the function order
on page 660.
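
As an aside on what the be*toh() series is for: data stored big-endian can also be read without <endian.h> at all, by assembling the value from bytes, which works the same on any host byte order.  A minimal sketch; load_be16() and load_be32() are hypothetical names, not POSIX interfaces.

```c
#include <stdint.h>

/* Read a 16- or 32-bit big-endian value from a byte buffer.  The result
 * equals memcpy()ing the bytes into the integer and then applying
 * be16toh() or be32toh(), respectively. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] << 8 | p[1]);
}

static uint32_t load_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```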

--steffen

Re: [Issue 8 drafts 0001799]: endian.h unconditionally requires 64-bit integers

2024-01-23 Steffen Nurpmeso via austin-group-l at The Open Group

  ...
 |https://austingroupbugs.net/view.php?id=1799 
 ...
 |Summary: endian.h unconditionally requires 64-bit \
 |integers
 ...

Mantis did not post the follow-up message, either to me or to

  https://www.mail-archive.com/austin-group-l@opengroup.org/

Something is wrong with it.

--steffen

64-bit integer types?

2024-01-22 Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

I just read Paul Eggert on the IANA TZ list saying that POSIX does
not require 64-bit integer times, and when i looked into that i saw
(for stdint.h):

  12961 If an implementation provides integer types with width 64 that meet these requirements,
  12962 then the following types are required:
  12963 int64_t
  12964 uint64_t

But from a quick search this is the only such optional occurrence.
The standard imposes the presence of the typedefs at other places,
for example endian.h, with words like "For each of the sizes 16,
32 and 64,", which rather implies 64-bit being non-optional.

Shall i open an issue, or what should be done?
I mean, i used 64-bit integers with gcc's __extension__ 25 years
ago, the Microsoft world had them (as far as i have read), and Java
had them by then, too.
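
Since only the exact-width types are optional, a strictly portable program can probe for int64_t through its limit macro: C99 requires that the associated macros be defined exactly when <stdint.h> provides the type.  Meanwhile int_least64_t (like long long) must exist and be at least 64 bits wide in any C99 implementation, so there is always a fallback.  A minimal sketch; wide_time and HAVE_WIDE_TIME are hypothetical names.

```c
#include <stdint.h>

/* Prefer the exact-width type where it exists: INT64_MAX is defined
 * exactly when <stdint.h> provides int64_t. */
#ifdef INT64_MAX
typedef int64_t wide_time;
#define HAVE_WIDE_TIME 1
#else
/* int_least64_t is mandatory in C99 and at least 64 bits wide, but it
 * may be wider (and, unlike int64_t, may have padding bits). */
typedef int_least64_t wide_time;
#define HAVE_WIDE_TIME 0
#endif
```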

--steffen

Re: mantis does not email-emit my web edits?

2024-01-15 Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso via austin-group-l at The Open Group wrote in
 <20240108212050.2X5aptGT@steffen%sdaoden.eu>:
 |It seems Mantis did record, but not actually post my web edits of
 |this year, including the opened issue on tzalloc etc.
 |It would be nice if these would come :)

It seems

  https://austingroupbugs.net/view.php?id=1794

  0001794: Please add tzalloc/tzfree and localtime_rz, mktime_z interfaces 

from 2024-01-06 has still not been posted to the mailing list,
according to the mail archive.
I mention it just so that it is known that this issue has been created.

--steffen

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-09 Steffen Nurpmeso via austin-group-l at The Open Group

Andrew Josey via austin-group-l at The Open Group wrote in
 <36c65d39-81c9-4852-9b1d-43533395d...@opengroup.org>:
 |> On 5 Jan 2024, at 05:12, Robert Elz via austin-group-l at The Open \
 |> Group  wrote:
 |> 
 |>Date:Thu, 04 Jan 2024 23:24:26 +0100
 |>    From:    Steffen Nurpmeso 
 |>Message-ID:  <20240104222426.ai7_3Mvo@steffen%sdaoden.eu>
 |> 
 |>| I was hoping for the draft; the selection list does not offer
 |>| anything but ..TC2 and it.
 |> 
 |> If you want, you can submit a bug now, using any base standard
 |> that is in some way still current.   It just won't get processed
 |> at all (beyond random notes being added) until the next standard
 |> is being worked on, so submitting now is kind of pointless.
 |
 |Not necessarily, Austin/SD6 (https://www.opengroup.org/austin/docs/austin_sd6.txt) \
 |lays out the Committee Maintenance Procedures for the Approved \
 |Standard, and there is a section on new work items.
 |
 |So a proposal could advance that way, and lead to a separate standard \
 |adopted by one of the sponsoring organizations, and later adopted to \
 |issue 9. 
 | 
 |Over the years we have progressed a number of new API sets this way, \
 |before they went into the main standard ( the Extended API Sets Parts \
 |1..4, and the Additional APIs for the Base Specifications Issue 8 Parts \
 |1 and 2).

This interface is certainly a good candidate for such sponsorship
and inclusion.

--steffen

mantis does not email-emit my web edits?

2024-01-08 Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

It seems Mantis did record, but not actually post my web edits of
this year, including the opened issue on tzalloc etc.
It would be nice if these would come :)

Ciao -- and a good and healthy 2024 everybody, if at all possible!

--steffen

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-05 Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Robert Elz wrote in
 <2506.1704431...@jacaranda.noi.kre.to>:
 |Date:Thu, 04 Jan 2024 23:24:26 +0100
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20240104222426.ai7_3Mvo@steffen%sdaoden.eu>
 |
 || I was hoping for the draft; the selection list does not offer
 || anything but ..TC2 and it.
 |
 |If you want, you can submit a bug now, using any base standard
 |that is in some way still current.   It just won't get processed
 |at all (beyond random notes being added) until the next standard
 |is being worked on, so submitting now is kind of pointless.

Yes i will open one soon.  I will not add much more than in the
email i wrote, so page numbers and such are no problem.
tzalloc/tzfree are completely new anyway, and the other two
functions get a paragraph added where they belong.

 |On the other hand, delaying may lead to a much better proposal.
 |I in particular would like to see "struct tm" given a complete
 |overhaul - resulting in a struct with a different name of
 |course.   And then, naturally, the interface routines that
 |manipulate it all need redesigning (and renaming).
 |
 |That would be the perfect opportunity to make all the new ones
 |thread safe, and just allow what is there now to wither away.
 |
 |Of course, this is not the place to do that design (and implementation)
 |that needs to happen elsewhere, and then be spread amongst the
 |various systems first - only then should anything happen in the
 |standards universe.

Mostly "everyone i know" was doing some kind of "datetime" object.
If the native C library made that fast and thread-safe out of the
box, i would expect many scripting languages (and simple C or C++
wrappers) to be very happy about it.
Actually having such an API in the C library itself would be even
better ... but i can hardly imagine that happening.

--steffen

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-04 Steffen Nurpmeso via austin-group-l at The Open Group

enh wrote in
 :
 |for other precedent, bionic [Android] has tzalloc()/tzfree(),
 |mktime_z(), localtime_rz(), and the timezone_t type since API level
 |35:
 |
 |https://android.googlesource.com/platform/bionic/+/main/libc/include/time.h

Not yet released?  "Vanilla ice cream".
It really is the better interface.

--steffen

Re: IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-04 Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <25502.1704337...@jacaranda.noi.kre.to>:
 |Date:Thu, 04 Jan 2024 00:21:45 +0100
 |From:    "Steffen Nurpmeso via austin-group-l at The Open Group" \
 |
 |Message-ID:  <20240103232145.6dAnvvQf@steffen%sdaoden.eu>
 |
 || My question: against which standard should an issue be opened?
 |
 |The next one, after it is issued (ie: just wait, and send in the
 |request after the next standard is published, which is probably
 |this year sometime) - it is far too late for new interfaces in the
 |one currently being developed (the cutoff for those was back in
 |August or something like that).

Sad.  Very sad.

 |That means, issue 9 is the earliest any new interfaces can be added.

I was hoping for the draft; the selection list does not offer
anything but ..TC2 and it.

 |kre
 --End of <25502.1704337...@jacaranda.noi.kre.to>

--steffen

IANA TZ / NetBSD TZ: tzalloc/tzfree and localtime_rz, mktime_z

2024-01-03 Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

A happy and healthy new year 2024 i wish.

As stated in [1] the localtime() and mktime() series of functions
have the inherent problem of not being thread-safe regarding
possible changes to the time zone: in a pure POSIX environment
changes to TZ always affect global data.

If my memory serves correctly, about a decade ago the NetBSD
project contacted the IANA TZ maintainer in order to upstream
a new, truly thread-safe interface that addresses this issue.
Since some time in 2014 (see [1]) the IANA TZ database, which
includes the public domain code used by many projects to implement
the time-related programming interface, has included a new series
of functions:

   timezone_t tzalloc(char const *TZ);
   void tzfree(timezone_t tz);

   struct tm *localtime_rz(timezone_t restrict zone,
                           time_t const *restrict clock,
                           struct tm *restrict result);
   time_t mktime_z(timezone_t restrict zone,
                   struct tm *restrict tm);

  [1] https://austingroupbugs.net/view.php?id=1788

If POSIX offered this interface, whose open source (public
domain) code and manual are available via IANA TZ, truly
"thread-safe" time programming would become possible in POSIX.

This is especially important if no CLOCK_TAI is available.
For example, here is what the widely used NTP server chrony
does to determine the TAI-UTC offset and whether a leap second is due:

  tm = gmtime(&when);
  if (!tm)
    return tz_leap;

  stm = *tm;

  /* Temporarily switch to the timezone containing leap seconds */
  tz_env = getenv("TZ");
  if (tz_env) {
    if (strlen(tz_env) >= sizeof (tz_orig))
      return tz_leap;
    strcpy(tz_orig, tz_env);
  }
  setenv("TZ", leap_tzname, 1);
  tzset();

  /* Get the TAI-UTC offset, which started at the epoch at 10 seconds */
  t = mktime(&stm);
  if (t != -1)
    tz_tai_offset = t - when + 10;

  /* Set the time to 23:59:60 and see how it overflows in mktime() */
  stm.tm_sec = 60;
  stm.tm_min = 59;
  stm.tm_hour = 23;

  t = mktime(&stm);

  if (tz_env)
    setenv("TZ", tz_orig, 1);
  else
    unsetenv("TZ");
  tzset();

  if (t == -1)
    return tz_leap;

  if (stm.tm_sec == 60)
    tz_leap = LEAP_InsertSecond;
  else if (stm.tm_sec == 1)
    tz_leap = LEAP_DeleteSecond;

  *tai_offset = tz_tai_offset;

I want to point out that setting an environment variable can be
a costly operation, and moreover that changing the time zone itself
may involve several file system operations, making it potentially
very expensive.  (By the way, draft 4 uses both "file system" and
"filesystem".)

With the new interface two timezone objects can be preallocated,
and the operations are totally detached from global data and
multithread-safe.
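
To make the contrast concrete, here is a sketch of one helper written both ways.  HAVE_TZALLOC is a hypothetical configure-time macro and localtime_in_zone() a hypothetical name; the first branch uses the tzcode/NetBSD interface quoted above (a C library providing it may additionally require its own feature-test macro), the second is the portable POSIX fallback that, like chrony, has to swap TZ in the environment.  The mutex only serializes callers of this helper; other threads calling getenv(), localtime() and friends are not protected, which is exactly the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Convert t to broken-down time in the zone described by the TZ value
 * "zone" (e.g. "EST5" or "Europe/Berlin").  Returns 0 on success. */
#ifdef HAVE_TZALLOC   /* hypothetical configure-time macro */
int localtime_in_zone(const char *zone, time_t t, struct tm *out)
{
    /* No global state involved: the zone is a private object. */
    timezone_t tz = tzalloc(zone);
    struct tm *r;

    if (tz == NULL)
        return -1;
    r = localtime_rz(tz, &t, out);
    tzfree(tz);
    return r != NULL ? 0 : -1;
}
#else
/* Portable fallback: temporarily swap TZ in the environment. */
static pthread_mutex_t tz_lock = PTHREAD_MUTEX_INITIALIZER;

int localtime_in_zone(const char *zone, time_t t, struct tm *out)
{
    char *saved = NULL;
    const char *old;
    struct tm *r = NULL;

    pthread_mutex_lock(&tz_lock);
    old = getenv("TZ");
    if (old != NULL && (saved = strdup(old)) == NULL)
        goto out;                       /* cannot remember the old value */
    if (setenv("TZ", zone, 1) == 0) {
        tzset();
        r = localtime_r(&t, out);
        /* Put the previous setting back and reload it. */
        if (saved != NULL)
            setenv("TZ", saved, 1);
        else
            unsetenv("TZ");
        tzset();
    }
out:
    pthread_mutex_unlock(&tz_lock);
    free(saved);
    return r != NULL ? 0 : -1;
}
#endif
```

With the tzalloc() branch, a caller that converts often would of course allocate the timezone_t once and keep it, which is the "preallocated" case described above.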

My question: against which standard should an issue be opened?

--steffen

Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-02 Steffen Nurpmeso via austin-group-l at The Open Group

Stephane Chazelas wrote in
 <20230902084912.vdfedsgbnat2w...@chazelas.org>:
 |2023-09-01 23:28:50 +0200, Steffen Nurpmeso via austin-group-l at The \
 |Open Group:
 ...
 |>|FWIW, a "printf %b" github shell code search returns ~ 29k
 |>|entries
 |>|(https://github.com/search?q=printf+%25b+language%3AShell=code=Sh\
 |>|ell)
 ...
 |> Actually this returns a huge amount of false positives where
 |> printf(1) and %b are not on the same line, let alone the same
 ...
 |Apparently, we can also search with regexps and searching for
 |printf.*%b
 |(https://github.com/search?q=%2Fprintf.*%25b%2F+language%3AShell=code)
 |It's probably a lot more accurate. It returns ~ 19k.
 ...
 |> Furthermore it shows a huge amount of false use cases like
 ...
 |Yes, I also see a lot of echo -e stuff that should have been
 |echo -E stuff (or echo alone in those (many) implementations
 |that don't expand by default or use the more reliable printf
 |with %s (not %b)).
 |
 |> It seems people think you need this to get colours mostly, which
 ...
 |Incidentally, ANSI terminal colour escape sequences are somewhat
 |connecting those two %b's as they are RGB (well BGR) in binary
 |(white is 7 = 0b111, red 0b001, green 0b010, blue 0b100), with:
 |
 |R=0 G=1 B=1
 |printf '%bcyan%b\n' "\033[3$(( 2#$B$G$R ))m" '\033[m'
 |
 |(with Korn-like shells, also $(( 0b$B$G$R )) in zsh though zsh
 |has builtin colour output support including RGB-based).

..and, off-topic, but in my opinion that is also incorrect usage: one
should use tput(1) instead, and then simply printf(1) (or echo(1),
or cat(1)) the output, something like this, fwiw :)

  color_init() {
  [ -n "${NO_COLOUR}" ] && return
  # We do not want color for "make test > .LOG"!
  if [ -t 1 ] && command -v tput >/dev/null 2>&1; then
  { sgr0=$(tput sgr0); } 2>/dev/null
  [ $? -eq 0 ] || return
  { saf1=$(tput setaf 1); } 2>/dev/null
  [ $? -eq 0 ] || return
  { saf2=$(tput setaf 2); } 2>/dev/null
  [ $? -eq 0 ] || return
  { saf3=$(tput setaf 3); } 2>/dev/null
  [ $? -eq 0 ] || return
  { saf5=$(tput setaf 5); } 2>/dev/null
  [ $? -eq 0 ] || return
  { b=$(tput bold); } 2>/dev/null
  [ $? -eq 0 ] || return

  COLOR_ERR_ON=${saf1}${b} COLOR_ERR_OFF=${sgr0}
  COLOR_DBGERR_ON=${saf5} COLOR_DBGERR_OFF=${sgr0}
  COLOR_WARN_ON=${saf3}${b} COLOR_WARN_OFF=${sgr0}
  COLOR_OK_ON=${saf2} COLOR_OK_OFF=${sgr0}
  unset saf1 saf2 saf3 saf5 b
  fi
  }

  ...

  printf '%s%s%s' "${COLOR_WARN_ON}" "$SOME_MSG" "${COLOR_WARN_OFF}"

Of course this is also only ANSI via sgr0 (:-|

 |Speaking of stackexchange, on the June data dump of
 |unix.stackexchange.com:
 |
 |stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf\
 |.*%b'
 |494
 |
 |(FWIW)
 |
 |Compared with %d (though that will have entries for printf(3) as well):
 |
 |stackexchange/unix.stackexchange.com$ xml2 < Posts.xml | grep -c 'printf\
 |.*%d'
 |3444

I am totally stunned by the ratio.  I myself have never used %b
(like this, that is, with printf).

 --End of <20230902084912.vdfedsgbnat2w...@chazelas.org>

--steffen

Re: bug#65659: RFC: changing printf(1) behavior on %b

2023-09-01 Steffen Nurpmeso via austin-group-l at The Open Group

Stephane Chazelas via austin-group-l at The Open Group wrote in
 <20230901181024.pwx4plwclz7ij...@chazelas.org>:
 |2023-09-01 07:54:02 -0500, Eric Blake via austin-group-l at The Open Group:
 ...
 |> How many scripts in the wild actually use %b, though?  And if there
 |> are such scripts, anything we can do to make it easy to do a drop-in
 |> replacement that still preserves the old behavior (such as changing %b
 |> to %#s) is going to be easier to audit than the only other
 |> currently-portable alternative of actually analyzing the string to see
 |> if it uses any octal or \c escapes that have to be re-written to
 |> portably function as a printf format argument.
 |[...]
 |
 |FWIW, a "printf %b" github shell code search returns ~ 29k
 |entries
 |(https://github.com/search?q=printf+%25b+language%3AShell=code=Sh\
 |ell)
 |
 |That likely returns only a small subset of the code that uses
 |printf with %b inside the format and probably a few false
 |positives, but that gives many examples of how printf %b is used
 |in practice.

Actually this returns a huge number of false positives where
printf(1) and %b are not even on the same line, let alone in the same
command; if you just scroll down a bit, it starts with matches like this one from neovim:

 pr_title="${pr_title// /,}" # Replace spaces with commas.
 pr_title="$(printf 'vim-patch:%s' "${pr_title#,}")"

(bash only btw).
Furthermore, it shows a huge number of incorrect use cases like

 printf >&2 "%b\n" "The following warnings and non-fatal errors were encountered during the installation process:"

This is only the first result page.
It seems people mostly think you need this to get colours, which,
it has to be said, is also misguided in practice.  (To the
best of *my* knowledge, that is.)

Ah, it is a copy-and-paste world: for every Stephane on Stack Overflow
there are 99 who fool and mislead you, or do not know for sure
themselves, but copy and paste anyway!

--steffen

Fwd: Re: RFC: changing printf(1) behavior on %b

2023-09-01 Steffen Nurpmeso via austin-group-l at The Open Group

I had dropped this from my mail queue when i realized there are many
other recipients.  (And i saw Stephane writing much more to
bug-bash@.)

--- Forwarded from Steffen Nurpmeso  ---
Date: Fri, 01 Sep 2023 18:34:34 +0200
Author: Steffen Nurpmeso 
From: Steffen Nurpmeso 
To: "Oğuz via austin-group-l at The Open Group" 
Cc: Phi Debian , chet.ra...@case.edu, Eric Blake 
, bug-coreut...@gnu.org, bug-b...@gnu.org, Steffen Nurpmeso 

Subject: Re: RFC: changing printf(1) behavior on %b
Message-ID: <20230901163434._byqv%stef...@sdaoden.eu>
Mail-Followup-To: "Oğuz via austin-group-l at The Open Group" 
, Phi Debian , 
chet.ra...@case.edu, Eric Blake , bug-coreut...@gnu.org, 
bug-b...@gnu.org, Steffen Nurpmeso 
OpenPGP: id=EE19E1C1F2F7054F8D3954D8308964B51883A0DD; 
url=https://ftp.sdaoden.eu/steffen.asc; preference=signencrypt

Oğuz via austin-group-l at The Open Group wrote in
 :
 |On Fri, Sep 1, 2023 at 7:41 AM Phi Debian  wrote:
 |> My vote is for posix_printf %B mapping to libc_printf %b
 |
 |In the shell we already have bc for base conversion. Does POSIX really
 |have to support C2x %b in the first place?

I would even say BASE#NUM should also be supported, just to
have a holistic approach throughout the system.
It is not yet standardized for sh(1), but available in many shells.
It is very handy for avoiding misinterpretation of leading zeroes.

(However, it is still not easy to use, as you have to avoid a
leading hyphen-minus, i.e., you have to normalize data to use this
syntax --- and when doing so you can just as well remove leading
zeroes yourself.  At the risk of bragging, i presume the "reparse"
that my MUA offers will never be available for sh(1)ells,

  $ s-nail -#:/ -X 'echo =$((64#  -5))=; vexpr = "64#  -5"; xit'
  =-5=
  0b        1011
  013 | 0xFFFB | -5

i.e., BASE# will always be unsigned; but nonetheless.)

I would possibly even say that digit grouping via _ should be supported
in addition, as so many languages, especially "new" (to me ;)) ones,
offer, for example 0b_0001.  (I do not support that myself.
And it interferes with base 64, i.e., 64#*.)
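
The leading-zero point can be illustrated in C with a sketch of a parser for ksh-style BASE#DIGITS literals built on strtol(), which limits it to bases 2 to 36 (so no 64#, and no _ grouping).  parse_base_num() is a hypothetical helper, not an existing interface; like the shells' BASE#, it treats the digits as unsigned and rejects a sign, and a plain number without # is always read as decimal, so 010 is ten, not eight.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse "BASE#DIGITS" (2 <= BASE <= 36) or plain decimal DIGITS into *out.
 * Returns 0 on success, -1 on malformed input or overflow. */
static int parse_base_num(const char *s, long *out)
{
    char *end;
    const char *digits = s;
    long base = 10, v;

    if (!isdigit((unsigned char)*s))
        return -1;                      /* no sign, no white space */
    v = strtol(s, &end, 10);
    if (*end == '#') {                  /* explicit base given */
        if (v < 2 || v > 36)
            return -1;
        base = v;
        digits = end + 1;
    }
    /* strtol() would also skip white space, accept a sign, and in base 16
     * a "0x" prefix; allow none of these after the '#'. */
    if (!isalnum((unsigned char)*digits))
        return -1;
    if (base == 16 && digits[0] == '0' && (digits[1] == 'x' || digits[1] == 'X'))
        return -1;
    errno = 0;
    v = strtol(digits, &end, (int)base);
    if (*end != '\0' || errno != 0)
        return -1;
    *out = v;                           /* leading zeroes never mean octal */
    return 0;
}
```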
  ...
 -- End forward <20230901163434._byqv%stef...@sdaoden.eu>

--steffen

Re: probable UB in recently accepted text

2023-07-22 Steffen Nurpmeso via austin-group-l at The Open Group

Thorsten Glaser wrote in
 :
 |https://www.austingroupbugs.net/view.php?id=561#c6085 (the accepted
 |text) suggests that…
 |
 | versions,  the  size  is  typically in the range 92 to 108. An \
 | application can
 | deduce the size by using sizeof(((struct sockaddr_un *)0)->sun_pat\
 | h).
 |
 |… but I was recently told that attempting that is UB because it
 |dereferences a nil pointer, even though it’s only within a sizeof,
 |and the current C editor didn’t deny that, just stating that

The discussion was that the Linux kernel makes excessive use of this,
and that its documentation even requires its use (last i looked).
So even the palmy-beach-standard-committee-meeting-on-expenses guys
are expected not to mess this up.  (Not really sorry, any ISO C members
listening; the picture i have in mind was the W3C, however.)

 |“This has been hotly debated for years” and to use offsetof instead
 |(which does not work for the last member, incidentally) because the
 |implementation of offsetof may do “crimes” an application cannot.
 |
 |The actual discussed thing was…
 | #define FIELD_SIZEOF(t,f) (sizeof(((t*)0)->f))
 |… so basically the same.
 |
 |Note that sizeof-offsetof is not the same because there may be padding.

For comparison, my own offsetof reads

  #if su_CC_VCHECK_CLANG(5, 0) || su_CC_VCHECK_GCC(4, 1) || su_CC_VCHECK_PCC(1, 2) || defined DOXYGEN
  /*! The offset of field \a{F} in the type \a{T}. */
  # define su_FIELD_OFFSETOF(T,F) __builtin_offsetof(T, F)
  #else
  # define su_FIELD_OFFSETOF(T,F) su_S(su_uz,su_S(su_up,&(su_R(T *,su_NIL)->F)))
  #endif
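
Whether the null-pointer form inside sizeof is really undefined is the debated point; the following sketch simply sidesteps the question.  SUN_PATH_SIZE and SUN_PATH_SPACE are hypothetical macro names; the compound literal needs C99, and the offsetof() variant is, as noted in the quoted text, not the same thing, because it includes trailing padding.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* sizeof does not evaluate its operand (there is no VLA here), and a
 * compound literal is a real object, so no null pointer is involved. */
#define SUN_PATH_SIZE (sizeof ((struct sockaddr_un){0}).sun_path)

/* Space from the start of sun_path to the end of the structure.  If
 * sun_path is the last member this equals SUN_PATH_SIZE plus any
 * trailing padding, so it may be larger. */
#define SUN_PATH_SPACE \
    (sizeof (struct sockaddr_un) - offsetof(struct sockaddr_un, sun_path))
```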

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001642]: DUMB terminal is not defined

2023-04-22 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 :
 ...
 |https://austingroupbugs.net/view.php?id=1642 
 ...
 |Summary:DUMB terminal is not defined
 ...
 | (0006270) ajosey (manager) - 2023-04-22 08:56
 | https://austingroupbugs.net/view.php?id=1642#c6270 
 |-- 
 |The term dumb is one we are recommended to avoid in our style guide
 |regarding use of ableist language.
 |
 |3.1.7 Ableist Language
 |
 |When trying to achieve a friendly and conversational tone, problematic

Often i heard a friendly and conversational tone during the
slicing of the bodies of cattle or geese (in the past), yet
everybody was only looking down.  They would not have heard the
"wait, but i am here" anyway.  References to the Tibetan Book of
the Dead possibly lead too far .. yet not too far.

 |ableist language may slip in. This can come in the form of figures of
 |speech and other turns of phrase. Be sensitive to your word choice,
 |especially when aiming for an informal tone. Ableist language includes
 |words or phrases such as crazy, insane, blind to or blind eye to, cripple,
 |dumb, and others. Choose alternative words depending on the context. 

Yet this has been the established name for this kind of terminal
for many decades, and the only one; any other name would need
to be invented, distributed, and adopted as appropriate, which
takes a lifetime.  And not mine.

  #?0|kent:unix-hist$ git grep -i dumb 1abc11ba34
  1abc11ba34:usr/dict/words:dumb
  1abc11ba34:usr/dict/words:dumbbell
  1abc11ba34:usr/doc/beginners/u3:and of course if you make a dumb mistake
  1abc11ba34:usr/doc/security:was rather dumb:
  1abc11ba34:usr/man/man7/term.7:dumb terminals with no special features
  1abc11ba34:usr/src/games/backgammon.c:  printf( "Congratulations! You have just defeated a dumb machine.\n");
  1abc11ba34:usr/src/games/fish.c:"The default is pretty dumb!",

  #?0|kent:unix-hist$ git describe 1abc11ba34
  Research-V7

  $ gl1 Research-V7
  commit 1abc11ba348fd70a5ba6392c2f2e141e16b78685 (tag: refs/tags/Research-V7, refs/remotes/origin/Research-Release)
  Merge: bbfde83347 dd1f4b5d3f
  Author: Ken Thompson 
  AuthorDate: 1979-08-25 17:59:53 -0500
  Commit: Ken Thompson 
  CommitDate: 1979-08-25 17:59:53 -0500

  Research V7 release
  Snapshot of the completed development branch

  Synthesized-from: v7

  t$ git grep -i dumb Research-V7~292
  Research-V7~292:.ref-Research-V6/usr/doc/secur/secur:is rather dumb:
  Research-V7~292:usr/man/man7/term.7:dumb terminals with no special features

  #?0|kent:unix-hist$ git describe Research-V7~292
  Research-V6-217-gfbb78fc6b6

  #?0|kent:unix-hist$ gl1 Research-V7~292
  commit fbb78fc6b66ef268ed39f482c4ac0e10ed07f9fe
  Author: Ken Thompson 
  AuthorDate: 1979-01-10 15:17:45 -0500
  Commit: Ken Thompson 
  CommitDate: 1979-01-10 15:17:45 -0500

  Research V7 development
  Work on file usr/man/man7/term.7

  Co-Authored-By: Dennis Ritchie 
  Synthesized-from: v7

--steffen

Re: $? behaviour after comsub in same command

2023-04-06 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <6906.1680741...@jacaranda.noi.kre.to>:
 ...
 |The issue here is that people tend to think of
 | a=1
 |as a command.   It isn't (not as people think of it anyway).
 |But with that mindset they treat
 | a=1 b=$a c=$b
 |as 3 commands, one after the other.   It isn't.

To come back to the bug i reported to FreeBSD ([1]).

  [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251770

There i say

  3385   4.23 Variable Assignment
  3386   In the shell command language, a word consisting of the following parts:
  3387   varname=value
  3388   When used in a context where assignment is defined to occur and at no other time, the value
  3389   (representing a word or field) shall be assigned as the value of the variable denoted by varname.
  3390   Note:  For further information, see XCU Section 2.9.1 (on page 2365).

  ---

  75482   2.9.1 Simple Command

  75495   4. Each variable assignment shall be expanded for tilde expansion, parameter expansion,
  75496      command substitution, arithmetic expansion, and quote removal prior to assigning the
  75497      value.

  ---

  75501  Variable assignments shall be performed as follows:
  75502 •   If no command name results, variable assignments shall affect the current execution
  75503     environment.

  ---

  So everything should be handled sequentially, making it a bug.

And that is true, no?  If expansion has to take place, and the
assignment has been performed, .. it has been performed?

  ---

  75504 •   If the command name is not a special built-in utility or function, the variable assignments
  [.]
  75507     4. In this case it is unspecified:
  75508       — Whether or not the assignments are visible for subsequent expansions in step 4
  75509       — Whether variable assignments made as side-effects of these expansions are visible for
  75510         subsequent expansions in step 4, or in the current shell execution environment, or
  75511         both

  ---

  So it allows setting up the "execution environment of the command" entirely
  from the current environment, which is effectively read-only.  As you say.

So maybe, for a null command, that is not a bug?
But all shells except FreeBSD's sh do this; also from the report:

  #?2|kent$ for s in dash bash mksh bosh; do $s -c 'du=ich wir='"'"'hey '"'"'$du; echo $wir'; done
  hey ich
  hey ich
  hey ich
  hey ich
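To make the two readings concrete, here is a toy model in C.  It is not taken from any shell; struct env, expand() and assign_words() are hypothetical names.  It applies a list of name=value assignment words either sequentially, as dash, bash, mksh and bosh do in the test above, or by expanding every value first, which is how the thread describes FreeBSD sh behaving:

```c
#include <stdio.h>
#include <string.h>

#define NVARS 26  /* toy model: single-letter variables a..z */
#define VLEN  32  /* maximum value length, including the null byte */
#define MAXW  16  /* maximum assignment words per command */

struct env {
	char val[NVARS][VLEN];
};

/* Expand one value: "$x" yields the value of variable x, anything else
 * is taken literally (no quoting, no concatenation in this sketch). */
static void expand(const struct env *e, const char *word, char *out)
{
	if (word[0] == '$' && word[1] >= 'a' && word[1] <= 'z' && word[2] == '\0')
		snprintf(out, VLEN, "%s", e->val[word[1] - 'a']);
	else
		snprintf(out, VLEN, "%s", word);
}

/* Apply assignment words of the form "x=value" (x in a..z).
 * sequential != 0: each value is expanded after the previous assignment
 *                  took effect, so "a=$b b=$a" leaves both equal.
 * sequential == 0: all values are expanded first, then assigned, so
 *                  "a=$b b=$a" swaps.
 * Returns 0, or -1 if a word is malformed or there are too many. */
int assign_words(struct env *e, const char *const *words, int n, int sequential)
{
	char pending[MAXW][VLEN];

	if (n < 0 || n > MAXW)
		return -1;
	for (int i = 0; i < n; i++) {
		const char *w = words[i];

		if (w[0] < 'a' || w[0] > 'z' || w[1] != '=')
			return -1;
		expand(e, w + 2, pending[i]);
		if (sequential)
			strcpy(e->val[w[0] - 'a'], pending[i]);
	}
	if (!sequential)
		for (int i = 0; i < n; i++)
			strcpy(e->val[words[i][0] - 'a'], pending[i]);
	return 0;
}
```

Starting from a=1 b=2, the sequential mode turns a=$b b=$a into a=2 b=2, while the expand-first mode yields a=2 b=1, which is the variable swap Harald van Dijk points out as a benefit of the FreeBSD behaviour.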

I am no shell expert whatsoever.  My mailer will never support
that (except for assignment in arithmetic expressions, e.g. $((i=1))):

 This behaviour is different to the SHELL[644], which is a programming
 language with syntactic elements of clearly defined semantics, and
 therefore capable of sequentially expanding and evaluating individual
 elements of a line.  ‘? set one=spoon two=$one’ for example will never
 assign ‘spoon’ to two, because it is the command set[275] that performs
 the assignment, long after the expansion has happened.

So i am out.

--steffen

Re: $? behaviour after comsub in same command

2023-04-05 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20230405193451.u9bfz%stef...@sdaoden.eu>:
 |Harald van Dijk wrote in
 | <32a27194-a5ff-68d6-3a87-9120e34d8...@gigawatt.nl>:
 ||On 05/04/2023 17:44, Oğuz wrote:
 ||> On Wednesday, 5 April 2023, Harald van Dijk <mailto:a...@gigawatt.nl> wrote:
 ||> 
 ||> I am not sure which other ash based shells you were looking at, 
 ||> 
 ||> /bin/sh on NetBSD and FreeBSD
 ||
 ||Thanks. I indeed see the same results as you on a recent version of 
 ||FreeBSD sh (the one on the FreeBSD 13.1 installation media).
 ||
 ||There is a legitimate benefit to this: swapping variables without an 
 ||additional helper variable actually works in that implementation.
 ||
 ||   a=1 b=2
 ||   a=$b b=$a
 ||   echo $b $a
 ||
 ||As it turns out, the at the moment still rather incomplete mrsh 
 ||<https://mrsh.sh/> also behaves this way.
 |
 |I think i have open(ed) a bug tracker item on that in the past.
 |Ah yes, their bugzilla says
 |
 |  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251770

That is, the above is about the sequential handling of assignments
etc. on a line, nothing else (i have not really followed this thread,
as it is much too specific to sh internals).

--steffen

Re: $? behaviour after comsub in same command

2023-04-05 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Harald van Dijk wrote in
 <32a27194-a5ff-68d6-3a87-9120e34d8...@gigawatt.nl>:
 |On 05/04/2023 17:44, Oğuz wrote:
 |> On Wednesday, 5 April 2023, Harald van Dijk wrote:
 |> 
 |> I am not sure which other ash based shells you were looking at, 
 |> 
 |> /bin/sh on NetBSD and FreeBSD
 |
 |Thanks. I indeed see the same results as you on a recent version of 
 |FreeBSD sh (the one on the FreeBSD 13.1 installation media).
 |
 |There is a legitimate benefit to this: swapping variables without an 
 |additional helper variable actually works in that implementation.
 |
 |   a=1 b=2
 |   a=$b b=$a
 |   echo $b $a
 |
 |As it turns out, the at the moment still rather incomplete mrsh 
 |<https://mrsh.sh/> also behaves this way.

I think i have open(ed) a bug tracker item on that in the past.
Ah yes, their bugzilla says

  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251770

--steffen

Re: [Issue 8 drafts 0001652]: make: missing option argument

2023-04-03 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 <0c040d43e9caec7f953fb5bb47219...@austingroupbugs.net>:
 ...
 |-- 
 | (0006242) geoffclare (manager) - 2023-04-03 09:31
 | https://austingroupbugs.net/view.php?id=1652#c6242 
 |-- 
 |Since the maxjobs option-argument is clearly present in the resolution of
 |bug https://austingroupbugs.net/view.php?id=1436 this is simply an \
 |editorial
 |mistake[.]
 ...

Oh, i already have draft 3, and it already spared me from filing an
issue against sigsuspend (misnamed argument).
(poppler's pdftotext -layout is a good converter.)
But for things i am now, unfortunately, even credited for .. you know,
cake, candlelight, epic noble music, to celebrate one's scratch in
the fate of eternity, eh, that was still missing.

--steffen

Re: Austin Group questions on iconv()

2023-03-09 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Eric Blake wrote in
 <20230309164325.xmqp7mf62obpn...@redhat.com>:
 |In today's Austin Group meeting, the folks discussing POSIX had a
 |question for Bruno and/or anyone else with an idea on how the
 |standards should approach a difference in behavior between Solaris and
 |GNU iconv() implementations.
 |
 |For context, today's meeting minutes:
 |https://posix.rhansen.org/p/2023-03-09 around line 1635

Line 367.
(Effectively a no-op to look at, since your email covers it, doesn't it.)

 |and the bugs leading to the question:
 |
 |https://austingroupbugs.net/view.php?id=1635
 | "0001635: iconv: please be more explicit in input-not-convertible case"
 | still open - iconv() resulting in EILSEQ not because of input
 | encoding error but because of output being unable to encode the
 | transliteration
 |
 |https://austingroupbugs.net/view.php?id=1007
 | "0001007: iconv function not allowed to fail to convert valid sequences"
 | resolved at https://austingroupbugs.net/view.php?id=1007#c3330,
 | standardizing the //IGNORE, //TRANSLIT, and //NON_IDENTICAL_DISCARD
 | modifiers
 |
 |It seems that bug 1635 is saying that the Solaris implementation
 |provides a conversion that application writers can use to get reliable
 |output but does not provide some desired features, and the standard
 |should change to acknowledge that the GNU implementation provides some
 |of those desired features.  However, the GNU implementation includes

That all may be 1007.

 |some ambiguities that make it unreliable.  It seems to ask us to
 |change the standard to allow a modified version of the GNU iconv()
 |function that could be reliably interpreted by an application writer.

That is 1635: it credits the GNU approach, which does

 |For example, overloading EILSEQ to mean that there was an invalid
 |character in the input stream or that there was no transliteration

which application programmers cannot deal with: invalid input and
not being able to convert to some output character set
(losslessly) are very different things.  (To at least some
applications.)

 |available in the output codeset to convert that input character makes
 |it impossible for an application to determine which of those two
 |problems caused iconv() to fail.

Yes, exactly.

 |Can we get an explanation on how an application writer is supposed to
 |write code to reliably use the iconv() in GNU libc, given the above
 |example?  Can we get help in identifying exactly what changes need to

I want to urge people to read the GNU bug report that is linked
from 1635, where the honourable author of the GNU iconv library
points to how gnulib does it, which in turn is then quoted again
in issue 1635.

 |be made to POSIX (after bugid:1007 has been integrated) to allow GNU
 |behavior and get reliable results without breaking applications that
 |currently work with the Solaris iconv() interface.

And before _this_ proposal of yours starts rolling, i want to point
out that transliterating characters is not the same as inserting a
replacement character, or as failing the way GNU does, but in a way
that application writers can properly react to.

Application writers need to be able to write tests, but
transliterations may be anything, and may change as time goes by.
Being able to fail fast in case of errors is also an important
property that //TRANSLIT conversions do not provide.

If the standard "invented" a special mode that enforces the GNU
behaviour, but with an identifiable error code instead of the
overloaded EILSEQ, that would allow exactly this.
Software which supports the //modifiers must carry state through
iconv() to react or fail properly anyway, so this seems (looking at
open source code) to be a rather minimal change.
(It could be that the standard already adds keywords that require
work in existing implementations.  But i am not sure.)
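The ambiguity is visible from the application side.  A minimal sketch, assuming a POSIX iconv() and that the codeset names "UTF-8" and "US-ASCII" are available (codeset names are implementation-defined); try_convert() is a hypothetical helper.  Malformed input such as the single byte 0xFF is expected to fail with EILSEQ; with GNU libc, a well-formed character that merely has no US-ASCII equivalent (for example "\xc3\xa9", U+00E9) is reported with the very same EILSEQ, so the caller cannot tell the two cases apart:

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert INLEN bytes at IN from codeset FROM to
 * codeset TO into OUT (capacity OUTCAP); *OUTLEN receives the number of
 * bytes written.  Returns 0 on success, otherwise the errno value reported
 * by iconv_open() or iconv() (EILSEQ, EINVAL or E2BIG for iconv()). */
int try_convert(const char *to, const char *from,
		const char *in, size_t inlen,
		char *out, size_t outcap, size_t *outlen)
{
	iconv_t cd;
	char *inp = (char *)in;   /* iconv() takes char **, not const char ** */
	char *outp = out;
	size_t inleft = inlen, outleft = outcap;
	int err = 0;

	*outlen = 0;
	cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return errno;
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
		err = errno;
	/* Write any final shift sequence (a no-op for stateless codesets). */
	else if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
		err = errno;
	iconv_close(cd);
	*outlen = outcap - outleft;
	return err;
}
```

With a distinct error code for the "valid but not convertible" case, as proposed above, a caller of such a helper could branch on it instead of guessing.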

--steffen

Re: Issue 991: it should be changed

2023-02-20 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 :
 |Steffen Nurpmeso wrote, on 19 Feb 2023:
 |> After looking into V10 mail, and all BSD mails i think issue 991
 |> should be changed.  This is my fault.
 |
 |Bug 991 has been applied, so it's not possible to change now.
 |(At least, we can't change the editing instructions - they need to
 |match the change that was made. The formal interpretation response
 |could I suppose be changed but I don't think it needs to be as it
 |says "The standard is unclear on this issue" which remains true.)
 |
 |The way to correct the current text that resulted from applying 991
 |is either to include the appropriate changes in bug 1634 or, if
 |that's not considered sufficiently closely related, to submit a new
 |bug.

Thanks.
I will edit the new one soon.

--steffen

Re: Issue 991: it should be changed

2023-02-19 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20230219012433.1dxoo%stef...@sdaoden.eu>:
 |After looking into V10 mail, and all BSD mails i think issue 991
 |should be changed.  This is my fault.
 ...
 |As i stated in the description, and to extend this description,
 |"mbox" (and "hold", "preserve", "touch") only work in a primary
 |(system) mailbox.
 ...
 |Because of all this i think the interpretation should be changed.
 ...
 |This is then, in effect, what all codebases do.
 |I think in 1979 when Kurt Shoens came along with this primary /
 |secondary scheme it was a convenient way to deal with "unresolved
 |messages", and like the *flipr* variable can be used to switch in
 ...

To be more precise i think the state of the codebases (for example
V10 mail: "what does delempty do?") can be deduced directly
from M. Douglas McIlroy's "A Research UNIX Reader:[..]"

  [.]Electronic mail was there from the start.  Never satisfied
  with its exact behavior, everybody touched it at one time or
  another[.]

And the many commands around the primary (system) and secondary
mailbox "way of doing things" make little sense except as a way of
working around the user's setting of the "hold" variable:
hold/preserve when it is not set, mbox (touch) when it is set.

I apologise for issue 991 (even though i think having the "mbox"
command available by means of it has at least some merit),
but i think all those commands should simply refer to the "hold"
variable and make explicit that they only work in a primary
(system) mailbox.

P.S.:
interesting question!

  #?0|kent:V10$ du -sh .
  135M	.
  #?0|kent:V10$ grep -ri delempty
  lbin/mailx/quit.c:/* adb:  what does delempty do?
  lbin/mailx/quit.c:  PRIV(delempty(statb.st_mode, mailname));

(PRIV() is
#define PRIV(x)   setgid(myegid), (x), setgid(myrgid);)

--steffen

Issue 991: it should be changed

2023-02-18 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

After looking into V10 mail, and all BSD mails i think issue 991
should be changed.  This is my fault.

  Change on page 2923, starting at line 96567:

  Arrange for the given messages to end up in the mbox save
  file when mailx terminates normally.

  to:

  Arrange for the given messages to be moved to the mbox save
  file when mailx terminates normally or when the folder is
  changed. This command can be used in any folder; for
  messages residing in the system mailbox, mailx shall ignore
  the settings of the internal variables [no]hold and
  [no]keepsave when the mbox command has been used to mark
  them explicitly.

As i stated in the description, and to extend this description,
"mbox" (and "hold", "preserve", "touch") only work in a primary
(system) mailbox.

Whereas V10 mail _also_ allows "mbox" and "touch" to be called in
any box, it allows "hold" and "preserve" only in a system
mailbox:

  if (edit) {
      printf("Cannot \"preserve\" in edit mode\n");
      return(1);
  }

However, the code paths are such that MBOX saving only takes place
in a system mailbox (it happens only in quit()):

  if (edit)
      edstop();
  else {
      quit();
      Verhogen();
  }
  ...
  if (edit) {
      if (setjmp(srbuf))
          exit(0);
      edstop();
  } else {
      Verhogen();
      if (value("exit") != NOSTR)
          exit(1);
      else
          quit();
  }

Not only that, it does so only on read-write boxes (it is a move):

  /*
   * If we are read only, we can't do anything,
   * so just return quickly.
   */

  mcount = 0;
  if (readonly)
      return;

It is difficult and expensive to adjust these totally different
code paths.
Furthermore, it can be assumed that the *hold* / `mbox' / `touch' /
`hold' aka `preserve' way of mail handling is not something a "modern"
mail user does; i am unsure whether any other mail client ever
used that scheme.

Because of all this i think the interpretation should be changed.

  In a primary (system) mailbox, arrange for the given messages to
  be moved to the mbox save file when mailx terminates normally or
  when the folder is changed.
  The effect of this command is overridden by the variable hold.

This is then, in effect, what all codebases do.
I think in 1979, when Kurt Shoens came along with this primary /
secondary scheme, it was a convenient way to deal with "unresolved
messages", and, much like the *flipr* variable can be used to switch
between modes, all the commands above evolved around it.
For example, what is now "mbox" was originally implemented by
means of the function that is now called by the "touch" command.
I.e., it evolved wildly, and *hold* was just a shorthand etc etc etc
etc.  (I think the point is clear.)

What i mean is: this is overengineered and outdated, and the
interpretation should be adjusted, since in the modern world of
email all of that is effectively unused.

Thank you.  And i wish you a nice Sunday (if you can).

--steffen

Re: behavior of the QUIT character (^\) in the shell command line

2022-12-19 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <17402.1671424...@jacaranda.noi.kre.to>:
 |Date:Mon, 19 Dec 2022 00:17:25 +0100
 |From:"Vincent Lefevre via austin-group-l at The Open Group" \
 |
 |Message-ID:  <20221218231725.ga104...@zira.vinc17.org>
 |
 || Well, so it is not forbidden to bind it to "exit with a core dump"
 || (e.g. abort()), which is what a SIGQUIT does by default. :-)
 |
 |No, you can bind ctrl-\ to any action your shell allows, definitely
 |not forbidden.   Note, that's not SIGQUIT, it is just a character,
 |not a signal.   You only get one or the other (or neither sometimes)
 |never both.

My mailer explicitly supports the mle-raise-quit command (as it
does mle-raise-int and mle-raise-tstp, to accommodate the same
set of signal raisers people expect from stty(1), or at
least highly advanced such).  It is unbound by default.
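Robert Elz's point, that ^\ is either a character or SIGQUIT but never both, follows from the termios ISIG flag: while it is set, the terminal driver turns the QUIT character (c_cc[VQUIT], usually ^\) into SIGQUIT; while it is clear, the byte arrives as ordinary input and a line editor may bind it to whatever it likes.  A minimal sketch, with quit_char_as_input() as a hypothetical helper:

```c
#include <errno.h>
#include <termios.h>
#include <unistd.h>

/* Make the QUIT character arrive on FD as an ordinary input byte instead
 * of making the terminal driver send SIGQUIT (INTR and SUSP are affected
 * the same way, since ISIG covers all three).  The previous settings are
 * stored in *SAVED for a later tcsetattr(fd, TCSANOW, saved).
 * Returns 0 on success, or -1 with errno set (ENOTTY if FD is not a
 * terminal). */
int quit_char_as_input(int fd, struct termios *saved)
{
	struct termios t;

	if (tcgetattr(fd, &t) == -1)
		return -1;
	*saved = t;
	t.c_lflag &= ~(tcflag_t)ISIG;
	return tcsetattr(fd, TCSANOW, &t);
}
```

A binding such as mle-raise-quit could then read the byte and call raise(SIGQUIT) itself; restoring *saved afterwards gives the usual stty(1) behaviour back.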

--steffen

Re: Add -print0 to "find"

2022-12-08 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 :
 |Stephane Chazelas wrote, on 08 Dec 2022:
 |> 2022-12-08 15:39:32 +, Austin Group Bug Tracker via austin-group-l \
 |> at The Open Group:
 |> [...]
 |>> It is looking like the group might decide to add find -print0 and \
 |>> related
 |>> xargs and read features (for reasons I won't go into here).
 |> [...]
 ...
 |> Is there any plan of adding the accompanying [.]
 |> [.] awk -v RS='\0' -v ORS='\0' that some awk
 |> implementations still don't support (that support currently not
 |> mandated by POSIX)).
 |
 |There are no plans for that. And given the short time available
 |before we produce draft 3, I doubt it would be feasible for Issue 8.
 |(New features need to be in draft 3 if they are going to make it
 |into Issue 8 rather than waiting for Issue 9.)

Just to add that i opened an issue for nawk (Kernighan's awk) [1]
where i said "I do not know how portable / desired, but" regarding

  printf 'a\0b\0c\0' |
awk 'BEGIN{FS="\0"} {for(i=0; i < NF; ++i) print i, $i}'

which works in GNU awk and mawk (Dickey's mawk; i no longer test
against the broken 1990s mawk Debian used until not too long
ago), but outputs "0 a" for nawk and

  0 a
  0 b
  0 c

for busybox awk (git current some ~one month ago).
The issue was then closed by Arnold "Aharon" Robbins (also of GNU
awk) with the words[2]

  The One True Awk uses C strings, which are zero terminated, for
  just about everything. Thus a record of "a\0\b\0c\0" looks the
  same as if all it had was "a". Gawk uses pointer +
  length for all strings, so it can handle something like FS
  = "\0". In any case, putting NUL bytes into data isn't portable
  and is also outside the scope of POSIX, which expects data
  to be text, and not binary.

So i would expect this to be a major effort for one of the most
widely used awk implementations.  (And Kernighan seems to be
experimenting with Unicode support on a feature branch, without
any thought given to a NUL FS, as a short glance suggests.)

  [1] https://github.com/onetrueawk/awk/issues/165
  [2] https://github.com/onetrueawk/awk/issues/165#issuecomment-1306699359
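Robbins' explanation is the crux: C string functions stop at the first null byte, so "a\0b\0c\0" looks like "a", whereas a pointer + length representation treats null bytes as ordinary data.  A minimal sketch of the latter, with count_nul_records() as a hypothetical helper (this is not code from any awk):

```c
#include <stddef.h>
#include <string.h>

/* Count the NUL-terminated records in the LEN bytes at BUF, as produced
 * by find -print0 or printf 'a\0b\0c\0'.  The buffer is treated as
 * pointer + length, not as a C string, so embedded null bytes are data.
 * A final record without a terminating null byte is counted as well. */
size_t count_nul_records(const char *buf, size_t len)
{
	const char *p = buf, *end = buf + len;
	size_t n = 0;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		n++;
		if (nul == NULL)
			break;
		p = nul + 1;
	}
	return n;
}
```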

--steffen

Re: "null terminator" v. "NUL terminator" (was: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets)

2022-12-02 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 :
 |Steffen Nurpmeso wrote, on 01 Dec 2022:
 |> Being here, i note an increasing number of "null terminators" for
 |> strings, which surely is wrong as NULL==(void*)0 (or similar aka
 |> 0x0, 0, __null, whatever they are doing now and have complained or
 |> even failed about (?) in the past if misused), whereas i also see
 |> NUL terminator being used in C181 (once), which surely is correct,
 |> especially given the number of NUL and NUL character anywhere.
 |> Overall it surely should be worth an entry in Definitions.
 |> 
 |> "null terminator" cannot be it.
 |> I am inclined to open an issue.
 |
 |NULL and null are not the same thing.  The "null" in "null terminator"
 |is a null byte, not a null pointer.  There is a definition in XBD
 |chapter 3 for "null byte".

Ah.  I see, including references to string and null byte
termination.

 |There are over 30 uses in C181 of "null byte", some within "terminating
 |null byte" or "null byte terminator".  So "null terminator" is just
 |shorthand for those phrases.

Ok.

 |There are nine uses of "null terminator".  The one use of "NUL terminator"
 |is in getdelim() which was newly added in Issue 7.  I consider its use
 |there to have been an editorial error and for consistency it should be
 |changed to "null terminator".

I only ever used NUL myself.

 |If you open an issue, please ask for getdelim() to be changed.
 |Adding a definition of "null terminator" would also be worthwhile.

All-in-one

 https://austingroupbugs.net/view.php?id=1621

Thanks --- and apologies, it is a large standard that has grown
over decades, and i never "lived" in it, but only ever used some
needed parts under a custom interface encapsulation.

--steffen

Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets

2022-12-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Unless i am mistaken, it was the GNU C library info manual around Y2K
that mentioned what SUN_LEN() has to look like if it is not available.

In the accepted text in
https://austingroupbugs.net/view.php?id=561, for bind(2), a space
is missing between "where" and "address_len":

  For AF_UNIX sockets, some implementations support an extension whereaddress_len

Ditto for sendto(): "wheredest_len".

While at it, i note an increasing number of "null terminators" for
strings, which surely is wrong as NULL==(void*)0 (or similar, aka
0x0, 0, __null, whatever they are doing now, and have complained or
even failed about (?) in the past if misused), whereas i also see
"NUL terminator" being used in C181 (once), which surely is correct,
especially given the many uses of NUL and NUL character elsewhere.
Overall it surely deserves an entry in Definitions.

"null terminator" cannot be it.
I am inclined to open an issue.

--steffen

Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets

2022-11-30 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Olivier Certner wrote in
 <9027911.U91TZCKOhC@ravel>:
 |> Having written that, I did test that 'sizeof(((struct
 |> sockaddr_un*)0)->sun_path)' compiles with gcc, although I'm less
 |> certain of whether the C standard permits that (or even if that
 |> permission has changed over time) - the expression argument to sizeof
 |> is unevaluated, which counters the argument that you can't normally
 |> evaluate a dereference of a NULL pointer.
 |
 |According to ISO/IEC 9899, versions 1999 to 2017 at least, the answer \
 |seems 
 |clear from section 6.5.3.4 ("The sizeof operator" or "The sizeof and \
 |_Alignof 
 |operators"), paragraph 2:
 |
 |"""
 |The sizeof operator yields the size (in bytes) of its operand, which \
 |may be an
 |expression or the parenthesized name of a type. The size is determined \
 |from 
 |the type of the operand. The result is an integer. If the type of the \
 |operand 
 |is a variable length array type, the operand is evaluated; otherwise, the 
 |operand is not evaluated and the result is an integer constant.
 |"""
 |
 |Other relevant sections, especially 6.5.2.3 and in particular its paragr\
 |aphs 2 
 |and 4, are not in contradiction with the above text.
 |
 |So in short, yes, IMHO, the standard allows something like:
 |'sizeof(((struct sockaddr_un*)0)->sun_path)'
 |where in fact 0 could be replaced by any other invalid pointer.

That is great.  I heard they fixed static_assert to work like the
working one C++ has.  "Be liberal in what you accept" would be
nice by standard means, if information is naturally available or
has to be computable (in other contexts) anyhow.  The above
differentiation with parentheses is such a thing, whereas computed
gotos and ({ }) statement expressions have still not been introduced.

Yes, sorry, not me, i am (much too) German, but i honour
Article 7844 of comp.lang.c, by Dennis Ritchie, and
advance in peace alongside

  The fundamental problem is that it is not possible to write real
  programs using the X3J11 definition of C.  The committee has
  created an unreal language that no one can or will actually use.

(He is wrong at the tail end, of course, as nit-pickers will note.)

--steffen

Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets

2022-11-30 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Eric Blake wrote in
 <20221130150909.pei323lktieb4...@redhat.com>:
 |On Wed, Nov 30, 2022 at 08:54:03AM -0600, Eric Blake via austin-group-l \
 |at The Open Group wrote:
 |>>  ...
 |>>|https://austingroupbugs.net/view.php?id=561 
 |> 
 |> First, I chose that wording because 'sizeof(struct
 |> sockaddr_un.sun_path)' doesn't compile.  You are right that 'sizeof
 |> NAME.sun_path' does compile, if NAME is an expression of type struct
 |> sockaddr_un, but the sentence becomes longer to introduce some object
 |> named NAME of the correct type just to get to the shorter sizeof
 |> expression.  However, we can make that edit if it makes sense.
 |
 |Having written that, I did test that 'sizeof(((struct
 |sockaddr_un*)0)->sun_path)' compiles with gcc, although I'm less
 |certain of whether the C standard permits that (or even if that

I would assume a lot of software will break if not;  i use

  #define su_FIELD_SIZEOF(T,F) sizeof(su_S(T *,su_NIL)->F)

(in varying forms) ever since i started programming in C and C++.

 |permission has changed over time) - the expression argument to sizeof
 |is unevaluated, which counters the argument that you can't normally
 |evaluate a dereference of a NULL pointer.

Now ... i grep(1)ped the Linux kernel 5.15, and i see a lot of
matches for "grep -Fr '0)-", among which is, in
Documentation/process/coding-style.rst

  Similarly, if you need to calculate the size of some structure member, use

  .. code-block:: c

  #define sizeof_field(t, f) (sizeof(((t*)0)->f))

I would expect the C standard to have to be changed if this ever
became invalid.  (Unless they start to standardize the useful things
that everybody has needed and used for decades, and which are still
missing.)

--steffen

Re: [1003.1(2008)/Issue 7 0000561]: NUL-termination of sun_path in Unix sockets

2022-11-28 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 :
 ...
 |https://austingroupbugs.net/view.php?id=561 
 ...
 |-- 
 | (0006085) geoffclare (manager) - 2022-11-28 16:24
 | https://austingroupbugs.net/view.php?id=561#c6085 
 |-- 
 ...
 |char sun_path[size]   Socket pathname
 |storage.
 ...
 |[.] However, because sun_path is required to be the
 |last member of the struct, an application can deduce the size by using
 |sizeof(struct sockaddr_un) - offsetof(struct sockaddr_un,
 |sun_path).

I am glued to old habits, but given that it is the last field and
of a known, fixed size, sizeof(NAME.sun_path) should be all that is
necessary.  (It definitely is in practice.)
(And all this is different from SUN_LEN(), of course.)
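A small comparison of the spellings discussed in this thread, assuming a POSIX system with <sys/un.h>; the function names are hypothetical. The first two yield the size of the member itself. The offsetof() form from the bug note additionally counts any trailing padding after sun_path, so it can only be equal or larger.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* sizeof applied to the member of an actual object: sizeof(NAME.sun_path). */
size_t sun_path_size_member(void)
{
    struct sockaddr_un addr;
    return sizeof addr.sun_path;
}

/* The null-pointer form; the operand of sizeof is not evaluated. */
size_t sun_path_size_nullptr(void)
{
    return sizeof(((struct sockaddr_un *)0)->sun_path);
}

/* Relies on sun_path being the last member; includes trailing padding. */
size_t sun_path_size_offsetof(void)
{
    return sizeof(struct sockaddr_un) - offsetof(struct sockaddr_un, sun_path);
}
```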

--steffen

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2022-10-19 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 :
 |Steffen Nurpmeso wrote, on 18 Oct 2022:
 |> Austin Group Bug Tracker wrote in
 |>  <2969d655ede7498ce22799a53d077...@austingroupbugs.net>:
 |>  ...
 |>|https://austingroupbugs.net/view.php?id=249 
 |>  ...
 |>| https://austingroupbugs.net/view.php?id=249#c5995 
 |>  ...
 |>|If a \e or \cX escape sequence specifies a character that does not \
 |>|have an
 |>|encoding in the locale in effect when these backslash escape sequences \
 |>|are
 |> 
 |> \e only yields escape U+1B?
 |> Since "this standard requires support for all of the control
 |> characters except NULL (matching what is done in the stty
 |> utility)" \e is always supported.  It is in (US-)ASCII and thus
 |> ISO-8859-1 and thus in the lower 256 codepoints of Unicode.
 |> (It is also in that EBCDIC thing.)
 |
 |"This standard requires support for all of the control characters except
 |NULL" just means that the shell is required to recognise $'\c[' as
 |specifying <ESC>, it doesn't mean that <ESC> has to have an encoding
 |in all locales. See XBD 6.2:
 |
 |The POSIX locale [...]. Other locales shall contain the characters
 |in Table 6-1 (on page 105) and may contain any or all of the
 |control characters identified in Table 6-2 (on page 110)
 |
 |<ESC> is in Table 6-2.

You are right, i see: U+001B is not in the portable character set,
it is only an optional part of character sets.
This is so far removed from daily life that i would never have
realized it on my own.  ISO 6429, ECMA-48, and even ECMA-35 from
December 1971 include it.  I downloaded a version; it is typewritten.

Thank you.

--steffen

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2022-10-19 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <28905.1666177...@jacaranda.noi.kre.to>:
 |Date:Wed, 19 Oct 2022 08:26:46 +0100
 |From:"Geoff Clare via austin-group-l at The Open Group" \
 |
 |Message-ID:  
 |
 || I can't see anything "a few lines earlier" that implies quotation-mark
 || needs to be escaped.  Please give the exact wording change you would
 || like to see.
 |
 |I think Steffen is referring to:
 |
 |   \" yields a <quotation-mark> (double-quote) character.
 |
 |the first bullet point in the (new) section 2.2.4, and that all he
 |means to change would be to add to that sentence something like:
 |
 |, but note that the double-quote character is not required to be
 |escaped to be included
 |
 |(just before the '.' that ends the existing sentence).

Yes, thank you.
I find it noteworthy in the cryptic context of shell expansion:
an unescaped quotation-mark does no harm here.

--steffen

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2022-10-18 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 <2969d655ede7498ce22799a53d077...@austingroupbugs.net>:
 ...
 |https://austingroupbugs.net/view.php?id=249 
 ...
 | https://austingroupbugs.net/view.php?id=249#c5995 
 ...
 |If a \e or \cX escape sequence specifies a character that does not have an
 |encoding in the locale in effect when these backslash escape sequences are

\e only yields escape U+1B?
Since "this standard requires support for all of the control
characters except NULL (matching what is done in the stty
utility)" \e is always supported.  It is in (US-)ASCII and thus
ISO-8859-1 and thus in the lower 256 codepoints of Unicode.
(It is also in that EBCDIC thing.)

"This standard makes the results implementation-defined if \e or
\cX specifies a character that is not present in the current
locale" cannot be true for \e then, either.

And likewise in "the unsupported character might be replaced with
multiple characters, shell-special or regular (e.g. if <ESC> is
not supported $'\e' may be replaced by "???", "XXX" or "")", so
\e seems a particularly bad example.

(Also, quotation-mark does not _need_ to be escaped, it merely can
be.  It might be worthwhile to point this out a few lines earlier?)
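To make the \e, \cX and quotation-mark points concrete, here is a minimal decoder for a handful of $'...' escapes. It is a sketch under stated assumptions, not any shell's implementation: it assumes an ASCII-based character set (so <ESC> is 0x1B and \cX keeps only the low five bits of the upper-cased X), handles only \e, \cX, \c\\, \c?, \", \' and \\, copies all other escapes through unchanged, and the name decode_dollar_quote is hypothetical. An unescaped " is simply copied, as in $'say "hi"'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical decoder for a subset of $'...' escapes (ASCII assumed).
 * src is the text between the quotes; dst needs strlen(src)+1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t decode_dollar_quote(char *dst, const char *src)
{
    char *d = dst;

    while (*src != '\0') {
        if (*src != '\\' || src[1] == '\0') {
            *d++ = *src++;              /* includes an unescaped '"' */
            continue;
        }
        src++;                          /* skip the backslash */
        switch (*src) {
        case 'e':  *d++ = 0x1B; src++; break;   /* <ESC> in ASCII */
        case '"':  *d++ = '"';  src++; break;
        case '\'': *d++ = '\''; src++; break;
        case '\\': *d++ = '\\'; src++; break;
        case 'c':
            src++;
            if (src[0] == '\\' && src[1] == '\\') {
                *d++ = 0x1C;            /* \c\\ yields <FS> */
                src += 2;
            } else if (*src == '?') {
                *d++ = 0x7F;            /* \c? yields <DEL> */
                src++;
            } else if (*src != '\0') {
                /* \c[ -> 0x1B, \cA and \ca -> 0x01, ... */
                *d++ = (char)(toupper((unsigned char)*src) & 0x1F);
                src++;
            }
            break;
        default:                        /* not handled here: keep as-is */
            *d++ = '\\';
            *d++ = *src++;
            break;
        }
    }
    *d = '\0';
    return (size_t)(d - dst);
}
```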

--steffen

Re: Thread queue position after unlocking PRIO_PROTECT mutex

2022-10-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

shwaresyst wrote in
 <299142152.189695.1665484864...@mail.yahoo.com>:

My idea was

   |What do implementations actually do?

  Hopefully they do not give up their timeslice.

but did not send it.

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2022-09-13 Thread Steffen Nurpmeso via austin-group-l at The Open Group

psm...@gnu.org wrote in
 <81a04e6224bf59f387aa8bb0a68bf89152b1cd29.ca...@gnu.org>:
 |On Tue, 2022-09-13 at 14:53 +0100, Geoff Clare via austin-group-l at
 |The Open Group wrote:
 |> If I'm honest those cases never occurred to me when we were working
 |> on the wording.  It seems unlikely a makefile author would
 |> intentionally use .WAIT in those ways, but obviously they could arise
 |> through expansion of an empty macro.
 |> 
 |> I think we should tweak the text at the next opportunity to add
 |> "(if any)" in the appropriate places.  I'll make a note to submit
 |> a bug against draft 3 to request that change.
 |
 |Thanks.
 |
 |For everyone's info, I implemented .WAIT in GNU make (pushed to Git
 |yesterday) and it will be available in the next release.
 |
 |I did review Steffen's patch but decided against that approach; but the
 |effort is much appreciated Steffen!

Sure.  It was only a quick attempt by someone without insight into
the gmake code base, and if the real implementation can keep
parallelizing across a .WAIT instead of hard-synchronizing at it,
then that approach is much superior.

Good to hear .WAIT is in GNU make!
(Poor Alexey Neyman, who waited 17 years to see his approach merged.)

--steffen

bind(2) AF_UNIX .. path reusage

2022-09-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

If i .. and bind(2) an AF_UNIX socket, and the server process dies,
then the socket file still "exists" in the filesystem, and any
further bind(2) to that path fails with EADDRINUSE on Linux
5.15.64, even though no socket is actively bound to it anymore.
Is there no wording about this situation in the standard?
Shall i open a clarification request for this?
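A common application-side workaround, sketched under assumptions (Linux-like behaviour as described above; the helper name bind_unix_reuse is hypothetical): on EADDRINUSE, try to connect() to the path; ECONNREFUSED suggests nobody is listening, so the stale file is unlinked and bind() is retried once. This is inherently racy, and a server that has bound but not yet called listen() would also look stale.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind an AF_UNIX stream socket to path, removing
 * a stale socket file left behind by a dead server.  Returns the fd,
 * or -1 with errno set.  Not race-free. */
int bind_unix_reuse(const char *path)
{
    struct sockaddr_un addr;
    int fd, probe, rv;

    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return fd;
    if (errno != EADDRINUSE)
        goto fail;

    /* Is anybody still accepting connections on the existing file? */
    if ((probe = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
        goto fail;
    rv = connect(probe, (struct sockaddr *)&addr, sizeof addr);
    if (rv == -1 && errno == ECONNREFUSED) {
        close(probe);
        /* Nobody home: remove the stale file and retry once. */
        if (unlink(path) == 0 &&
            bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            return fd;
        goto fail;
    }
    close(probe);
    errno = EADDRINUSE;     /* a live server owns it, or the probe failed */
fail:
    {
        int save = errno;
        close(fd);
        errno = save;
    }
    return -1;
}
```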

Thank you.

--steffen

Re: stty default output/-a _POSIX_VDISABLE character "undef", contrasts with "<undef>" on all known implementations? Formatting loss->overinterpretation since Issue 6.

2022-09-10 Thread Steffen Nurpmeso via austin-group-l at The Open Group

наб wrote in
 <20220910120939.iujppzbqbw4al...@tarta.nabijaczleweli.xyz>:
 ...
 |This is very curious! /I/ was very curious, at least.
 ...
 |My naive interpretation of this is that, after loss of monospacing from
 |POSIX.2 to SUSv1, at some point in Issue 6's creation, "<undef>" was
 |taken to mean literal undef, i.e. italic undef, which is wrong,
 |but makes sense since use of <>s is very common to mean
 |"enclosed literal" or "literal symbol".
 |
 |The fix would be to simply change italic undef on line 108235 (D2.1)
 |to monospace <undef> or bold <undef>.

Impressive sleuthing.
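For comparison with the "<undef>" spelling, a sketch of how a c_cc[] entry might be rendered, loosely modelled on what GNU and BSD stty print. The name format_cc is hypothetical, and the "M-" prefix for bytes with the high bit set follows GNU stty's style as an assumption.

```c
#include <stdio.h>
#include <unistd.h>   /* _POSIX_VDISABLE, where the system defines it */

/* Hypothetical renderer for one c_cc[] value: a disabled entry prints
 * as "<undef>", control characters in caret notation, DEL as "^?".
 * Returns buf. */
const char *format_cc(unsigned char c, char *buf, size_t bufsz)
{
    const char *meta = "";

#ifdef _POSIX_VDISABLE
    if (c == (unsigned char)_POSIX_VDISABLE) {
        snprintf(buf, bufsz, "<undef>");
        return buf;
    }
#endif
    if (c >= 0x80) {               /* assumed GNU-style "M-" prefix */
        meta = "M-";
        c &= 0x7F;
    }
    if (c == 0x7F)
        snprintf(buf, bufsz, "%s^?", meta);
    else if (c < 0x20)
        snprintf(buf, bufsz, "%s^%c", meta, c + '@');
    else
        snprintf(buf, bufsz, "%s%c", meta, c);
    return buf;
}
```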

--steffen

Re: [1003.1(2008)/Issue 7 0000514]: Enhance internal macros in make

2022-09-08 Thread Steffen Nurpmeso via austin-group-l at The Open Group

psm...@gnu.org wrote in
 <2f4c249e139c3391ae56f0027cde10df76292bfa.ca...@gnu.org>:
 |On Thu, 2022-09-08 at 15:53 +, Austin Group Bug Tracker via austin-
 |group-l at The Open Group wrote:
 |>  (0005962) geoffclare (manager) - 2022-09-08 15:53
 |>  https://austingroupbugs.net/view.php?id=514#c5962 
 |> -
 |> - 
 |> On D2.1 page 2947 line 98895, after applying bug 1520,
 |> change:The $^ macro shall evaluate to the list of
 |> prerequisites for the current target.to:The
 |> $^ macro shall evaluate to the list of prerequisites for the current
 |> target, with any duplicates (except the first) removed.
 |> On D2.1 page 2947 after line 98895 add:$+The
 |> $+ macro shall be equivalent to $^, except that duplicates shall not
 |> be removed; all prerequisites shall appear in the order they were
 |> listed in the makefile 
 |
 |This was closed before I had a chance to comment on the wording but a
 |few things:
 |
 |First, this text doesn't mention the .WAIT prerequisites that were
 |added as optional features; do we need to add text for how these are

Optional?  .WAIT:?  Only to be victorious over Borisorious!

--steffen

Re: [1003.1(2008)/Issue 7 0000767]: Add built-in "local"

2022-08-08 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Christoph Anton Mitterer wrote in
 <708410359c03bc0cfb89bfc29baaa9000b0d00b1.ca...@scientia.org>:
 |On Mon, 2022-08-08 at 15:15 +, Austin Group Bug Tracker via austin-
 |group-l at The Open Group wrote:
 |> This was discussed during the 2022-08-08 conference call.  Since
 |> there is
 |> clear disagreement about the scope of local variables,  it is not
 |> clear
 |> that consensus can be reached.
 |
 |Just wondered, whether it was ever considered to "simply" specify a new
 |keyword (e.g. "loc" or something more generic similar to bash's
 |declare),.. which would then allow all shells to keep their current
 |"local"’s behaviour and yet provide a new one, which would then be
 |unified amongst all shells?

Around that time there was a mailing list thread, or maybe another
issue the discussion spilled over into, and as the discussion got
tight, i suggested the "my" that perl uses.

 |Or would that something, that shell authors would be willing to do?

Nothing but silence.

--steffen

Re: 202x/D2.1 cksum format specifier ambiguous/wrong?

2022-07-25 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Sorry for the late reply.

наб wrote in
 <20220723210935.jwgfb2izb5owu...@tarta.nabijaczleweli.xyz>:
 |On Sat, Jul 23, 2022 at 10:32:35PM +0200, Steffen Nurpmeso wrote:
 |> наб wrote in
 |>  <20220723193024.d7nv7lj43rhnl...@tarta.nabijaczleweli.xyz>:
 |>|Is the standard's intent to require
 |>|  "%u∆%d∆%s\n"
 |>|or should the section read something like
 |>|  STDOUT
 |>|For each file processed successfully,
 |>|the cksum utility shall write in the following format,
 |>|if any file operands were specified:
 |>|"%u %d %s\n", <checksum>, <# of octets>, <pathname>
 |>|or if no file operand was specified:
 |>|"%u %d\n", <checksum>, <# of octets>
 |>|
 |>|Line numbers from 202x/D2.1, also affects Issue 7.
 |> 
 |> cksum(1) implementations differ in the wild.
 |> It was the dear Jörg Schilling who nudged me to the understanding
 |> that Sun's cksum(1) indeed works correctly, it is just the output
 |> that differs and needs normlization (via "cat -vet|grep cksum"):
 |
 |I'm largely asking this from an implementer's standpoint ‒
 |i.e. if I'm allowed to output tabs in the output
 |(or not and the intent was to sped ∆s).
 |Thanks for your example of existing practice,
 |this points to the "'' should've been 's'" interpretation.
 |
 |>   csum="`${cksum} < "${f}" | ${sed} -e 's/[ ^I]\{1,\}/ /g'`"$

 |Out of morbid curiosity: any reason this couldn't be tr -s '\t' ' '?

No, i don't think so, except that my personal experience with sed
is better.  I have even forgotten where the need to use [ ^I]
instead of [ \t] comes from; i do not currently seem to have
access to a box that would require it.  (I would understand it
for awk, but even the old broken Debian mawk from the 90s is going
away with the next release, i have heard.)
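On the [ ^I] question: the quoted script line is cat -vet output, so ^I stands for a literal <tab> and the trailing $ marks the end of the line. Inside a POSIX bracket expression a backslash loses its special meaning, so [ \t] would portably match <space>, <backslash> and t rather than <tab>; that may well be where the habit comes from. The same normalization in C, as a minimal sketch with a hypothetical function name:

```c
/* Collapse every run of <space> and <tab> characters in s into a single
 * <space>, in place, like sed -e 's/[ <TAB>]\{1,\}/ /g'.  Returns s. */
char *squeeze_blanks(char *s)
{
    char *r = s, *w = s;
    int in_run = 0;

    for (; *r != '\0'; r++) {
        if (*r == ' ' || *r == '\t') {
            if (!in_run)
                *w++ = ' ';     /* first blank of a run becomes one space */
            in_run = 1;
        } else {
            *w++ = *r;
            in_run = 0;
        }
    }
    *w = '\0';
    return s;
}
```

Applied to cksum output, both tab- and space-separated fields end up as "checksum size name".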

--steffen

Re: 202x/D2.1 cksum format specifier ambiguous/wrong?

2022-07-23 Thread Steffen Nurpmeso via austin-group-l at The Open Group

наб wrote in
 <20220723193024.d7nv7lj43rhnl...@tarta.nabijaczleweli.xyz>:
 |According to XBD, 5 ("File Format Notation"), L3071:
 |  ’ ’ (An empty character position.)
 |  Represents one or more <blank> characters
 |and XRAT, A.5 ("File Format Notation") agrees in L117453-117455,117456-1\
 |17457:
 |  Note that an empty character position in format represents
 |  one or more <blank> characters on the output (not white space,
 |  which can include <newline> characters).
 |  The '∆' character is used when exactly one <space> is output.
 |
 |So I think this is as-expected.
 |
 |However, XCU, 3, cksum, L83395-83398 says:
 |  STDOUT
 |For each file processed successfully,
 |the cksum utility shall write in the following format:
 |"%u %d %s\n", <checksum>, <# of octets>, <pathname>
 |If no file operand was specified,
 |the pathname and its leading <space> shall be omitted.
 |
 |So: huh? What if you wanted to output:
 |  %zu\t%zu\t%s\n
 |A strict reading would mean that the no-operand output should be
 |  %zu\t%zu\t\n
 |(or, indeed %zu\t%zu\t %s\n -> %zu\t%zu\t\n, )
 |but that's obviously wrong?
 |
 |Is the standard's intent to require
 |  "%u∆%d∆%s\n"
 |or should the section read something like
 |  STDOUT
 |For each file processed successfully,
 |the cksum utility shall write in the following format,
 |if any file operands were specified:
 |"%u %d %s\n", <checksum>, <# of octets>, <pathname>
 |or if no file operand was specified:
 |"%u %d\n", <checksum>, <# of octets>
 |
 |Line numbers from 202x/D2.1, also affects Issue 7.
 |
 |(Also, if touching this, I don't really see
 | how the octet count could be negative?
 | So maybe the second format should be %u?
 | That's minor though.)

cksum(1) implementations differ in the wild.
It was the dear Jörg Schilling who nudged me to the understanding
that Sun's cksum(1) does work correctly; it is just the output
that differs and needs normalization (shown via "cat -vet|grep cksum"):

  csum="`${cksum} < "${f}" | ${sed} -e 's/[ ^I]\{1,\}/ /g'`"$

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-26 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20220526085434.GA19184@localhost>:
 |Steffen Nurpmeso wrote, on 24 May 2022:
 |>
 |>   I find that "setlocale() may invalidate the string" painful,
 |>   because many functions of the C library do not have _l() variants
 |>   that could work with a uselocale() object.  Just think about the
 |>   scanf() that is used so often, or strtol(): you cannot even
 |>   convert a number by standard means.
 |
 |You are mixing up uselocale() and newlocale().
 |
 |The _l() functions and uselocale() are different ways to make use
 |of a locale object obtained from newlocale().
 |
 |If there is no _l() function, you can pass the locale object to
 |uselocale() to set a thread-local current locale which must then
 |be used by functions that use the current locale, such as scanf()
 |and strtol().  These functions only use the "global locale" (set
 |by setlocale()) if there is no thread-local current locale set.

That is true.
(But i think this is one more occasion where Stroustrup's "C++ may
even be faster, because problems can be solved differently" (quoted
more or less correctly, from the C++98 era) turns out to be right.)

 |-- 
 |Geoff Clare 
 |The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
 --End of <20220526085434.GA19184@localhost>

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-24 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20220524091849.GC25920@localhost>:
 |Bruno Haible wrote, on 12 May 2022:
 |>
 |> https://posix.rhansen.org/p/gettext_draft
 |> Line 573
 |> 
 |> "The application shall ensure that the codeset argument, if non-empty, \
 |> is a
 |>  valid codeset name that can be used as the tocode argument of the \
 |>  iconv_open()
 |>  function."
 |> 
 |> This is not the only requirement. We also need the requirement that \
 |> the NUL
 |> character of ASCII maps to a single NUL byte in the codeset. Otherwise \
 |> the
 |> iconv() processing inside gettext() is likely to malfunction.
 |> 
 |> Suggestion: Change
 |> "... iconv_open() function."
 |> to
 |> "... iconv_open() function, and that the NUL character corresponds to a
 |>  single NUL byte in codeset. So, the codeset may not be, for example,
 |>  "UCS-2", "UTF-16", "UTF-16BE", "UTF-16LE", "UCS-4", "UTF-32", "UTF-32BE"\
 |>  ,
 |>  "UTF-32LE", "UTF-7"."
 |
 |In today's call we made changes along the lines you suggest. Please
 |check the updated etherpad to see if they achieve what you wanted.

But can it be any more generic than

  that in the codeset it specifies, the NUL character corresponds
  to a single NUL byte.

that is the question.
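The requirement can also be probed at run time with iconv() itself. A sketch, with nul_is_single_byte as a hypothetical name: convert one NUL from UTF-8, flush any shift state with a null inbuf, and check that exactly one zero byte came out. The answer depends on the iconv implementation at hand, as the UTF-7 and UTF-16 discussion on this list shows; with GNU libc, "UTF-16" for example yields a byte order mark plus two bytes.

```c
#include <iconv.h>
#include <stddef.h>

/* Returns 1 if codeset encodes NUL as exactly one NUL byte, 0 if not,
 * -1 if the conversion is unsupported or fails. */
int nul_is_single_byte(const char *codeset)
{
    iconv_t cd = iconv_open(codeset, "UTF-8");
    char in[1] = { '\0' }, out[16];
    char *ip = in, *op = out;
    size_t inleft = sizeof in, outleft = sizeof out;
    int ok;

    if (cd == (iconv_t)-1)
        return -1;
    /* Convert the NUL, then flush: a null inbuf emits any final
     * shift sequence (relevant for stateful encodings). */
    ok = iconv(cd, &ip, &inleft, &op, &outleft) != (size_t)-1 &&
         iconv(cd, NULL, NULL, &op, &outleft) != (size_t)-1;
    iconv_close(cd);
    if (!ok)
        return -1;
    return (op - out == 1 && out[0] == '\0');
}
```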

  I personally never liked gettext().  I simply used a dictionary,
  and block-injecting C preprocessor macros for the calls, because
  the ({ static size_t gen_cnt;.. }) statement-expression extension
  never made it into a standard, and it is wasteful to call
  functions for nothing, especially when gen_cnt is set only once
  and never changes in "real life".

  I find that "setlocale() may invalidate the string" painful,
  because many functions of the C library do not have _l() variants
  that could work with a uselocale() object.  Just think about the
  scanf() that is used so often, or strtol(): you cannot even
  convert a number by standard means.
  If i were to design this, i would center on bindtextdomain(),
  and just keep it going from there.
  That is of course easier said than done, since standardization
  only streamlines existing behaviour.

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-13 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello!

Steffen Nurpmeso wrote in
 <20220513135904.hhnsw%stef...@sdaoden.eu>:
 |Steffen Nurpmeso wrote in
 | <20220513132857.xzhqq%stef...@sdaoden.eu>:
 ||Harald van Dijk wrote in
 || <9aa0b43f-c5de-1698-9f34-c725a40e6...@gigawatt.nl>:
 |||On 12/05/2022 23:10, Steffen Nurpmeso wrote:
 |||> Harald van Dijk wrote in
 |||>   :
 |||>|On 12/05/2022 18:19, Steffen Nurpmeso via austin-group-l at The Open
 |||>|Group wrote:
 |||>|> Bruno Haible wrote in
 |||>|>   <4298913.vrqWZg68TM@omega>:
 | ...
 |||>   LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
 |||>   000  \0  \0   a  \0   b  \0  \0  \0
 |||> 
 |||> Two leading NULs?
 |||
 |||This is not what GNU iconv prints at all, at least not on my system, 
 |||which just uses the GNU version unmodified. Rather, it prints
 ||
 ||Interesting.  Unmodified here too.  Bruno Haible contacted me in
 ||private, i gave him all i have.
 |
 |Looking at the code (iconvdata/utf-16.c) i admit i fail to see how
 |this can happen, except maybe due to gcc 11.2.0 miscompilation
 |(CFLAGS="-O2 -march=x86-64 -pipe", shall that be honoured).  The
 |above is however surely what i see here, reproducably.

Bruno Haible had the fantastic idea of checking od(1), and that
was it!  When i use hexdump -C, or the GNU version of od(1), the
BOM is back.
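For reference, the 376 377 in the SunOS 5.10 od(1) output from the 2022-05-12 message are octal for FE FF, a big-endian byte order mark; the little-endian mark is FF FE. A tiny check on raw bytes, rather than on a dump tool's character rendering, as a sketch (utf16_bom_order is a hypothetical name):

```c
#include <stddef.h>

/* Name the UTF-16 byte order indicated by a leading byte order mark
 * (U+FEFF), or return NULL if the buffer does not start with one. */
const char *utf16_bom_order(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return NULL;
}
```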

--steffen

Re: When can shells remove "known" process IDs from the list?

2022-05-13 Thread Steffen Nurpmeso via austin-group-l at The Open Group

chet.ra...@case.edu wrote in
 <217874a6-64d5-184b-68e8-0bedb322f...@case.edu>:
 |On 5/13/22 10:27 AM, Geoff Clare via austin-group-l at The Open Group \
 |wrote:
 |> Chet Ramey wrote, on 13 May 2022:
 |>> On 5/13/22 5:20 AM, Geoff Clare via austin-group-l at The Open Group \
 |>> wrote:
 |>>> The definition of "Job" is:
 ...
 |>> Why not? This is what allows jobs/kill/wait to use job control notation
 |>> in operands even when job control is not currently enabled. I'd argue
 |>> that that was intended.
 |> 
 |> My reading is that all the standard requires here is that if one or
 |> more jobs are created with job control enabled, and job control is
 |> subsequently disabled, you can still use "jobs" to list those jobs,
 |> and %n etc. with "kill" to refer to those jobs.
 |
 |Of course; it relies on your assertion that the standard requires job
 |control to be enabled to create a job and put it in the jobs list. I've
 |already said what I think about that, and most, if not all, shells behave
 |differently.

Not to mention the ones where "set -m" is broken somewhere deep
within.

After hitting the wall of reliable asynchronous process
interaction from within a sh(1)ell script some years ago, i had to
rewrite it all a bit differently, and one core part now is

   [ -n "${JOBMON}" ] && set -m >/dev/null 2>&1
   (  # Place the job in its own directory to ease file management
  trap '' EXIT HUP INT QUIT TERM USR1 USR2
  ${mkdir} t.${JOBS}.d && cd t.${JOBS}.d &&
 eval t_${1} ${JOBS} ${1} &&
 ${rm} -f ../t.${JOBS}.id
   ) > t.${JOBS}.io &1 /dev/null 2>&1
   JOBLIST="${JOBLIST} ${i}"
   printf '%s\n%s\n' ${i} ${1} > t.${JOBS}.id

   # ..until we should sync or reach the maximum concurrent number
   [ ${JOBS} -lt ${JOBNO} ] && return

This works reliably on all tested systems (*BSD, Linux of several
kinds, SunOS 5.{9,10,11}) with all tested (installed) shells.
(Except for the one with an actually broken set -m, for which i have to say

 printf >&2 '%s! $JOBMON: $SHELL %s incapable, disabled!%s\n' \
"${COLOR_ERR_ON}" "${SHELL}" "${COLOR_ERR_OFF}"
 printf >&2 '%s!  No process groups available, killed tests may '\
'leave process "zombies"!%s\n' \
"${COLOR_ERR_ON}" "${COLOR_ERR_OFF}"

but that just cannot be helped.)

Of course it is still a mess that requires synchronization files
etc., but without this it just will not do.  It is still racy:

  jtimeout() {
 i=0
 while [ ${i} -lt ${JOBS} ]; do
i=`add ${i} 1`
if [ -f t.${i}.id ] &&
  read pid < t.${i}.id >/dev/null 2>&1 &&
  kill -0 ${pid} >/dev/null 2>&1; then
   j=${pid}
   [ -n "${JOBMON}" ] && j=-${j}
   kill -KILL ${j} >/dev/null 2>&1
else
   ${rm} -f t.${i}.id
fi
 done
  }

But only a bit.
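The negative-pid trick in jtimeout() has a direct C counterpart. A sketch, not the script's actual mechanics (spawn_and_kill_group is hypothetical): the child is put into its own process group, as set -m does for each job, by both parent and child to avoid a race; kill(pid, 0) plays the role of kill -0; and kill(-pid, SIGKILL) signals the whole group.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child in its own process group, kill the whole group, and
 * return the signal that terminated the child, or -1. */
int spawn_and_kill_group(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* child: become process group leader */
        pause();                /* stand-in for a hanging test job */
        _exit(0);
    }
    setpgid(pid, pid);          /* parent does it too, avoiding the race */
    if (kill(pid, 0) == 0)      /* like "kill -0": still there? */
        kill(-pid, SIGKILL);    /* negative pid: the entire group */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```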

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-13 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Harald van Dijk wrote in
 <9aa0b43f-c5de-1698-9f34-c725a40e6...@gigawatt.nl>:
 |On 12/05/2022 23:10, Steffen Nurpmeso wrote:
 |> Harald van Dijk wrote in
 |>   :
 |>|On 12/05/2022 18:19, Steffen Nurpmeso via austin-group-l at The Open
 |>|Group wrote:
 |>|> Bruno Haible wrote in
 |>|>   <4298913.vrqWZg68TM@omega>:
 |>|>|Steffen Nurpmeso wrote:
 |>|>|>  ...
 |>|>|>| [.] "UTF-7"."
 |>|>|>
 |>|>|> That is overshoot.
 |>|>|
 |>|>|No. UTF-7 is invalid here because it produces output that is not NUL
 |>|>|terminated. See:
 |>|>|
 |>|>|$ printf 'ab\0' | iconv -t UTF-7 | od -t c
 |>|>|000   a   b   +   A   A   A   -
 |>|>|007
 |>|>|
 |>|>|strlen() on such a return value makes invalid memory accesses.
 |>|>|You can convince yourself by running
 |>|>|$ OUTPUT_CHARSET=UTF-7 valgrind ls --help
 |>|>
 |>|> This is then surely bogus?  UTF-7 is a normal single byte
 |>|> character set and is to be terminated like anything else.  Nothing
 |>|> in RFC 2152 nor RFC 3501 if you want makes me think something
 |>|> else.
 |>|
 |>|RFC 2152's rules 1 and 3 only allow specifying the listed characters as
 |>|their ASCII form. All other characters, including U+, must be
 |>|encoded using rule 2. GNU iconv is doing what the RFC specifies here.
 |> 
 |> No really, please.  And please do not strip important content,
 |
 |I didn't think I did. You didn't read the RFC properly, I replied to 

You again strip content of follow-up RFCs.
I have implemented UTF-7, and i definitely terminate C-style
strings.

  ...
 |>   LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
 |>   000  \0  \0   a  \0   b  \0  \0  \0
 |> 
 |> Two leading NULs?
 |
 |This is not what GNU iconv prints at all, at least not on my system, 
 |which just uses the GNU version unmodified. Rather, it prints

Interesting.  Unmodified here too.  Bruno Haible contacted me in
private, i gave him all i have.

  ...
 |you may want to report this, including steps on how to get a GNU iconv 

I have given up on reporting bugs on the sourceware bug tracker.
The reason is on this list, i think.

I skip the rest.

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Harald van Dijk wrote in
 :
 |On 12/05/2022 18:19, Steffen Nurpmeso via austin-group-l at The Open 
 |Group wrote:
 |> Bruno Haible wrote in
 |>   <4298913.vrqWZg68TM@omega>:
 |>|Steffen Nurpmeso wrote:
 |>|>  ...
 |>|>| [.] "UTF-7"."
 |>|>
 |>|> That is overshoot.
 |>|
 |>|No. UTF-7 is invalid here because it produces output that is not NUL
 |>|terminated. See:
 |>|
 |>|$ printf 'ab\0' | iconv -t UTF-7 | od -t c
 |>|000   a   b   +   A   A   A   -
 |>|007
 |>|
 |>|strlen() on such a return value makes invalid memory accesses.
 |>|You can convince yourself by running
 |>|$ OUTPUT_CHARSET=UTF-7 valgrind ls --help
 |> 
 |> This is then surely bogus?  UTF-7 is a normal single byte
 |> character set and is to be terminated like anything else.  Nothing
 |> in RFC 2152 nor RFC 3501 if you want makes me think something
 |> else.
 |
 |RFC 2152's rules 1 and 3 only allow specifying the listed characters as 
 |their ASCII form. All other characters, including U+, must be 
 |encoded using rule 2. GNU iconv is doing what the RFC specifies here.

No really, please.  And please do not strip important content,
i am neither Chinese nor Russian, and especially not one of the
other 7 billion that do not count.
(I said surely bogus because i alone see the shiny light of having
found give-me-five GNU iconv errors.  Or even beyond that.)
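For reference, RFC 2152 rule 2 applied to a single UTF-16 code unit, as a minimal sketch (utf7_encode_unit is hypothetical, and it always emits the explicit closing '-'): sixteen bits become three modified-base64 characters, the last one zero-padded. For U+0000 this yields the +AAA- that GNU iconv emits, and for U+263A the RFC's own +Jjo- example; whether NUL must take this path at all is exactly what is disputed here.

```c
#include <stddef.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode one BMP code unit u as '+', three base64 sextets (no '='
 * padding), and '-'.  out needs 6 bytes; returns the length, 5. */
size_t utf7_encode_unit(unsigned int u, char out[6])
{
    out[0] = '+';
    out[1] = b64[(u >> 10) & 0x3F];   /* bits 15..10 */
    out[2] = b64[(u >> 4) & 0x3F];    /* bits 9..4 */
    out[3] = b64[(u << 2) & 0x3F];    /* bits 3..0, two zero bits */
    out[4] = '-';
    out[5] = '\0';
    return 5;
}
```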

--steffen

Re: ML reconfigured?

2022-05-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20220512173651.yl-pn%stef...@sdaoden.eu>:
 |Steffen Nurpmeso wrote in
 | <20220512173033.jp_28%stef...@sdaoden.eu>:
 ||Just wondering, i no longer receive my own messages to the ML?
 ||I would henceforth save away my sent copy if this is desired?
 |
 |So that i got.  It thus seems selective values are selectively
 |applied just as currently desired.

(Looking at this, it was likely because i was in Cc:, and i was
there because another ML was addressed.  Sorry for the noise.)

--steffen

Re: ML reconfigured?

2022-05-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20220512173033.jp_28%stef...@sdaoden.eu>:
 |Just wondering, i no longer receive my own messages to the ML?
 |I would henceforth save away my sent copy if this is desired?

So that i got.  It thus seems selective values are selectively
applied just as currently desired.

--steffen

ML reconfigured?

2022-05-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Just wondering, i no longer receive my own messages to the ML?
I would henceforth save away my sent copy if this is desired?

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-12 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Bruno Haible wrote in
 <4298913.vrqWZg68TM@omega>:
 |Steffen Nurpmeso wrote:
 |>  ...
 |>| [.] "UTF-7"."
 |> 
 |> That is overshoot.
 |
 |No. UTF-7 is invalid here because it produces output that is not NUL
 |terminated. See:
 |
 |$ printf 'ab\0' | iconv -t UTF-7 | od -t c
 |000   a   b   +   A   A   A   -
 |007
 |
 |strlen() on such a return value makes invalid memory accesses.
 |You can convince yourself by running
 |$ OUTPUT_CHARSET=UTF-7 valgrind ls --help

This is then surely bogus?  UTF-7 is a normal single-byte
character set and is to be terminated like anything else.  Nothing
in RFC 2152, nor in RFC 3501 if you will, makes me think otherwise.
(RFC 5092 "IMAP URL Scheme", which invents the
sane-enough-to-think-yourself
"UTF-7 -> UTF-16 -> UCS-4 -> UTF-8 -> HEX"
conversion scheme, and its reverse, even implies the opposite:
both example functions NUL-terminate the string.)
Except that Mark Davis said something like "UTF-7 was a failure"
once on the Unicode ML, if i recall correctly, and i surely added
"sadly", given the Punycode mess with domain names.
But that is one more ship that has sailed.  A pity it is.
Why should NUL be treated differently??  No.  No, i think it is
a bug in GNU iconv that no one stumbled upon because no one uses
UTF-7.  Heck, how about this, for example:

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t c
  000  \0  \0   a  \0   b  \0  \0  \0

Two leading NULs?

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t ucs-2 | od -t c
  000   a  \0   b  \0  \0  \0

That yes.

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-8 | od -t c
  000   a   b  \0

Yes.

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-7 | od -t c
  000   a   b   +   A   A   A   -

No.  Somehow they are all bogus; take SunOS 5.10:

  LC_ALL=C printf 'ab\0' |  iconv -f iso-8859-1 -t utf-16 | od -t
  000 376 377  \0   a  \0   b  \0  \0

Ooh, now it gets scary!!  Interestingly, OpenBSD 7.1 behaves the
same, so this is likely an old instance of GNU iconv: there it
says "GNU libiconv 1.16", here it says "iconv (GNU libc) 2.35".

So unless someone convinces me otherwise, you are arguing based on
buggy software.  UTF-7 is just another 7-bit single byte character
set, and should be treated as such.

--steffen

Re: POSIX bind_textdomain_codeset(): some invalid codeset arguments

2022-05-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Bruno Haible wrote in
 <24562059.ssLaC8jLEa@omega>:
 ...
 | [.] "UTF-7"."

That goes too far.
(Though I wish they had used it for internationalized
domain names.)

--steffen

Re: When can shells remove "known" process IDs from the list?

2022-05-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

chet.ra...@case.edu wrote in
 <195c7c59-8328-4ddc-b936-345f34ab1...@case.edu>:
 |On 5/10/22 12:03 PM, Geoff Clare via austin-group-l at The Open Group \
 |wrote:
 ...
 |So for the known IDs list, it's pretty much `wait' and `jobs', right?

Great words spoken easily.

--steffen

Re: When can shells remove "known" process IDs from the list?

2022-05-07 Thread Steffen Nurpmeso via austin-group-l at The Open Group

chet.ra...@case.edu wrote in
 <88762e56-0276-f936-cf4c-d48c8ddc2...@case.edu>:
 |On 4/29/22 4:23 PM, Robert Elz via austin-group-l at The Open Group wrote:
 ...
 |>  true & X=$!
 ...
 |They're not jobs! A pid is a pid. It doesn't matter whether it's the pid of
 |the job's controlling process (or whatever we want to call it). The
 |Asynchronous Lists text says you have to be able to wait for it. This is
 |how bash works, too.
 |
 |This is what happens when you have a jobs list and a list of terminated
 |asynchronous lists that are `known in the current shell environment'.
 ...

A bit off-topic, but it would be nice if scripts were given
a way to signal children safely.  I can wait(1) on a PID,
maybe, but timeout(1) is not standardized (nor can it be yet,
I think, though maybe I should simply open an issue?), and so
there is no safe way to collect multiple PIDs while also being
able to kill(1) them when they exceed a time limit.  This can only
be done by synchronizing via some stamp file or similar, but
if I kill(1) a PID, I do not know whether it is still my PID or an
already reused PID that belongs to another program.  Yet sh(1)
does know whether that PID is still its child or not.
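The reason the shell "knows" can be made concrete in C: as long as the parent has not reaped a child with waitpid(), the child's PID stays reserved (running, or as a zombie) and cannot be handed to another process, so a kill() sent before reaping cannot hit a stranger. The following is a minimal sketch under that assumption; `run_with_timeout` is a hypothetical helper, not a standard interface, and it polls with WNOHANG instead of handling SIGCHLD, purely for simplicity.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched in PATH) as a child and
 * SIGKILL it if it is still running after `secs` seconds.
 * Returns the child's wait status. */
static int run_with_timeout(char *const argv[], unsigned secs)
{
    int status;
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                         /* exec failed */
    }

    time_t deadline = time(NULL) + (time_t)secs;
    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;                  /* child finished on its own */
        if (r == -1) {
            perror("waitpid");
            exit(EXIT_FAILURE);
        }
        if (time(NULL) >= deadline)
            break;
        struct timespec ts = { 0, 10 * 1000 * 1000 };   /* poll every 10 ms */
        nanosleep(&ts, NULL);
    }

    /* Not yet reaped, so this PID still names our child: no reuse race. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);               /* only now may the PID be reused */
    return status;
}
```

For example, `run_with_timeout((char *[]){ "sleep", "5", NULL }, 1)` returns a status for which WIFSIGNALED() is true and WTERMSIG() is SIGKILL.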

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001533]: struct tm: add tm_gmtoff (and tm_zone) field(s)

2022-03-15 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 <942dac4371c72bf106541bbfe42be...@austingroupbugs.net>:
 ...
 |-- 
 | (0005745) steffen (reporter) - 2022-03-14 00:31
 | https://austingroupbugs.net/view.php?id=1533#c5745 
 |-- 

I have edited this note a bit; nothing "substantial", only
notational changes, and I removed the documentation P.S. now that
I have opened a regular issue for it.

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001544]: uudecode: standardise or at least reserve - as another special symbol for decoding to stdout

2022-03-02 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <21233.1646233...@jinx.noi.kre.to>:
 ||  (0005725) steffen (reporter) - 2022-03-02 12:50
 ||  https://austingroupbugs.net/view.php?id=1544#c5725 
 || -- 
 || FreeBSD uuencode only supports -o /dev/stdout not in-stream /dev/stdout,
 |
 |I assume you mean uudecode, since that doesn't make much sense
 |applied to uuencode, but that doesn't look correct to ne.

That was a typo, yes.
I had to change "uudecode" to "uudecode -o /dev/stdout" because
the embedded target was not recognized; the encoded data was
produced via

  < "${i}" uuencode -m /dev/stdout | sed 's/^/X/'

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility

2022-02-25 Thread Steffen Nurpmeso via austin-group-l at The Open Group

enh wrote in
 :
 |in terms of "what's actually used in the wild", Android uses toybox (0BSD
 |licensed, so anyone can look :-) ) for both on-device *and* for the OS
 |build itself on the host.
 |
 |toybox readlink (
 |https://github.com/landley/toybox/blob/master/toys/other/readlink.c)
 |currently supports:
 |
 |usage: readlink FILE...
 |
 |With no options, show what symlink points to, return error if not
 |symlink.
 |
 |Options for producing canonical paths (all symlinks/./.. resolved):
 |-e Canonical path to existing entry (fail if missing)
 |-f Full path (fail if directory missing)
 |-m Ignore missing entries, show where it would be
 |-n No trailing newline
 |-q Quiet (no output, just error code)
 |
 |since toybox tends to add things _as they're needed_, rather than "because
 |coreutils has them", that's probably "solid anecdata" about what gets used
 |in the wild.
 |
 |one thing i haven't seen mentioned so far (but which i added to toybox
 |myself, so i know it's definitely in use) is that existing realpath
 |implementations support *multiple* file arguments on the command line, not
 |just one.

To extend this with the widely used busybox:

  #?127|kent:toolbox.git$ busybox.static readlink --help
  BusyBox v1.34.0 (2022-01-03 21:34:20 CET) multi-call binary.

  Usage: readlink [-fnv] FILE

  Display the value of a symlink

  -f  Canonicalize by following all symlinks
  -n  Don't add newline
  -v  Verbose
  #?0|kent:toolbox.git$ busybox.static realpath --help
  BusyBox v1.34.0 (2022-01-03 21:34:20 CET) multi-call binary.

  Usage: realpath FILE...

  Print absolute pathnames of FILEs

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001457]: Add readlink(1) utility

2022-02-25 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20220225152553.GA4559@localhost>:
 |Robert Elz wrote, on 25 Feb 2022:
 |>
 |> OK.  I have looked at the coreutils realpath man page (gnu licensing
 |> stupidity means I cannot look at their code), and I can see the
 |> possibility (subject to community agreement) of implementing
 |> some of the options it has.   Not all.
 |> 
 |> I'm not sure a -E option is needed, if the whole path exists it
 |> makes no difference, if just the last component is missing, I
 |> can't really imagine and BSD usage requiring an error in that case
 |> (and anyone who needs tgat could use -e if I implement this).
 |
 |The point is it's a difference in behaviour between the two
 |implementations. Rather than just making it unspecified whether
 |the last component has to exist, it seemed to me that it would
 |be more useful to have -e and -E options so that users have a
 |way to ensure they get the same behaviour on both.
 |
 |So my preferences are (in descending order):
 |
 |1. POSIX adds realpath with -e and -E, and readlink without -f.

Why adjust a closed issue if all known implementations of
readlink(1) do support an identical -f?

 |Unspecified which of -e or -E is the default.
 |GNU adds a no-op -E to realpath.
 |NetBSD/FreeBSD adds -E and a no-op -e to realpath.
 |
 |2. POSIX adds readlink with -f (whose behaviour is the same for
 |both implementations).  No realpath.

POSIX could also mention the possibility of handling these two
commands via "argv[0] tricks" ("realpath is like readlink -f")?
If portability is not an issue, that is.
I looked through my own things: I have two use cases for
realpath(1), and quite a few more for readlink(1), which I even
"fake" in my ~/.profile as necessary:

  # UnixWare plus does not have readlink(1)
  if command -v readlink >/dev/null 2>&1; then
    :
  else
    readlink() {
      echo "${*}"
    }
  fi

What a mess.  POSIX has readlink(2) and realpath(3); coming from
there, I would assume that many programmers who "live" in a modern
*x environment simply take these for granted?
OK, the manuals say

  readlink - print resolved symbolic links or canonical file names
  realpath - print the resolved path

but this is GNU only; on OpenBSD one can read

  .Nd display target of symbolic link on standard output
  .Nd print the canonicalized absolute pathname

and on FreeBSD (in /bin even!)

  .Nd return resolved physical path

leaving readlink aside for now.
The latter is what I personally would "naturally" expect, as it
mirrors readlink(2) and realpath(3).

 |3. POSIX adds realpath without -e and -E, and readlink without -f.
 |Unspecified whether realpath needs last component to exist.
 |
 |I wasn't proposing any other options be included for realpath
 |(except perhaps -q, depending on whether it behaves the same in
 |both implementations).

--steffen

Re: Bug 1562 and other locale issues

2022-02-14 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20220214152249.embet%stef...@sdaoden.eu>:
 ...
 |I mean the w* series can be used to work it at least.
 |Yes, it is wrong, because it does not work on grapheme
 |clusters, towupper(wint_t) for example is thus broken.

Hah!  I forgot to add the German idiom

  Über die Wupper gehen

which translates to

  Moving over the [river] Wupper

It was terribly poisoned, muddy water, and whoever went in to get
across came out as a skeleton, one hears.  So it means

  Dying

--steffen

Re: Bug 1562 and other locale issues

2022-02-14 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <22573.1644732...@jinx.noi.kre.to>:
 |In general I don't comment much on anything related to i18n,
 |as it is (way) outside my area.
 |
 |However, what is clear is that much of what we have related to
 |locales is a total botch - thrown together in the 80's/90's
 |when it was clear that unix systems needed to be able to
 |operate in the non-English speaking world.
 |
 |But that all happened before the net made i18n a whole
 |different problem - originally one could mostly assume
 |that the system, or at least a user of the system, was
 |operating within one consistent locale - that might not
 |be ascii/English (American) but it would be something
 |stable (and would include English as at least some kind
 |of subset).   The current "solution" has that general
 |philosophy embedded throughout.
 |
 |All this is simply inadequate now.  The printf issue illustrates
 |some of the issues ... it isn't sufficient to distinguish between
 |bytes and characters, for the latter we also need to know the locale,
 |and the LC_* stuff isn't enough, as the locale of the format string
 |(that part written by the script writer who wrote the printf invocation)
 |the data (both strings and numerics) that is to be printed controlled
 |by that format, and the invoking user's desired output locale (that
 |supported by the display device - or desired as the format of the file)
 |might all be different, and the LC_* stuff just cannot cope with that.

I mean, the w* series can at least be used to work with it.
Yes, it is wrong, because it does not work on grapheme
clusters; towupper(wint_t), for example, is thus broken.
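A minimal sketch of why the w* series falls short, assuming nothing beyond <wctype.h>: towupper() maps exactly one wint_t to exactly one wint_t, so a per-character loop can never turn German sharp s into "SS" (the case Keld raises in another message in this archive), nor handle grapheme clusters made of several code points. `upcase_wcs` is a hypothetical helper; a real program would first select a locale with setlocale(LC_CTYPE, "").

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercase a wide string in place, one code
 * point at a time.  towupper() returns a single wint_t, so the string
 * can never grow: sharp s (U+00DF) has no single-character uppercase
 * and is left unchanged instead of becoming "SS". */
static void upcase_wcs(wchar_t *s)
{
    for (; *s != L'\0'; ++s)
        *s = (wchar_t)towupper((wint_t)*s);
}
```

With "straße" as input the result is "STRAßE", still six characters, where the full Unicode uppercase mapping would be the seven-character "STRASSE".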

You know, I think everybody has banged their head against this,
ending in despair, and you did not even add per-filesystem text
encodings on top of it!

But then again, things are crystal clear if you write scripts
using only the portable character set.
And shell variable assignments cause several actions that involve
expansions, and those work on characters.

The good old wonderful perl(1) improved the situation a bit with
its "utf8" pragma, as in utf8(3pm):

   The "use utf8" pragma tells the Perl parser to allow UTF-8 in the
   program text in the current lexical scope.  The "no utf8" pragma tells
   Perl to switch back to treating the source text as literal bytes in the
   current lexical scope.  (On EBCDIC platforms, technically it is
   allowing UTF-EBCDIC, and not UTF-8, but this distinction is academic,
   so in this document the term UTF-8 is used to mean both).

   Do not use this pragma for anything else than telling Perl that your
   script is written in UTF-8. The utility functions described below are
   directly usable without "use utf8;".

   Because it is not possible to reliably tell UTF-8 from native 8 bit
   encodings, you need either a Byte Order Mark at the beginning of your
   source code, or "use utf8;", to instruct perl.

   When UTF-8 becomes the standard source format, this pragma will
   effectively become a no-op.

For my mailer I reserved a "u" command modifier long ago, but
unfortunately I find only very little time for development, one or
two hours a day at most, so all this lingers and lingers.  It is
terrible.

It would have been wise of shell developers to add a similar
internal command, or at least to allow it as a no-op; this pragma
has existed for many, many years.  Maybe even two decades or more?

  ...
 |There is, it seems to me, much, much work left to be done in this area,
 |so wasting our time trying to shoehorn the world into what is currently
 |defined seems pointless, as it patently cannot be done with what we have
 |to work with, and inventing a solution based upon what is thought might
 |be adequate is what got us into the current situation in the first place,
 |so we should definitely avoid repeating that mistake.

But do not leave it like this!

--steffen

Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

k...@keldix.com wrote in
 <20220110211651.ga2...@www5.open-std.org>:
 |
 |please also consider iso 30112, which is a full iso standard,
 |and written in posix style. it was taken from glibc, after glibc took \
 |it from the iso 14652,
 |it has a number of categories in eccess of the posix oncs, including \
 |lc_paper.
 |also note that for turkish a 'small i with dot' uppercases to a 'capital \
 |i with dot', not to 'capital i'.

Does it also have the new lira glyph?

 |and there are uthers, eg. german sharp s uppercases to double s - two \
 |characters.
 |thoi is regardles of utf8

Good heavens.

--steffen

Re: Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-07 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

shwaresyst wrote in
 <1494661216.220561.1641574109...@mail.yahoo.com>:
[I have reordered the quotes a bit]
 |  On Thu, Jan 6, 2022 at 3:40 PM, Steffen Nurpmeso via austin-group-l \
 |  at The Open Group wrote:   Hello!
 |
 |I wonder about POSIX.utf-?8, i tried to remember any statement
 |i had read, and Mantis did not show up results.
 |
 |In particular i am interested in whether LC_CTYPE results will
 |bring true Unicode support or not, the reason i am asking is that
 |the upcoming version of my work-box GNU LibC-based (2.34) Linux
 |distribution will provide it like
 |
 |  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null \
 ||| true
 |
 |and then this thing is detected as an UTF-8 locale, but causes
 |three test failures of the MUA i maintain because character set
 |conversion behaves differently.
 |
 |My personal opinion was that POSIX.utf8 will bring the complete
 |range of Unicode characters to at least LC_CTYPE, i wonder about
 |LC_COLLATE, as language matching is, hm, very language specific.
 |The rest not (maybe LC_MESSAGES going for UTF-8 though).
 |
 |Is that approximately correct?

 |The first Issue 8 draft is focusing, afaik, on adding the C1x changes \
 |and Mantis Issue 8 tagged items. The changes to XBD 6, 7, etc., that \
 |will formally add a POSIX UTF8 locale are to be part of the second, \
 |maybe third, draft. This is why you don't see them yet.
 |For maximum compatibility with existing practice the required base \
 |repertoire for this will likely be some subset of UCS-2, plus ISO-6429 \

I do not see 16-bit characters in POSIX: going that route would
rule out implementations that use specific bit patterns in
wchar_t, which, if I recall correctly from 2014 or whenever I was
looking into the issue, is done by at least the Citrus
implementation of the mb* and w* series, for at least some Asian
languages.  And more, but that is not the issue I am concerned
about at the moment anyhow.  I personally would assume 8-bit,
i.e., UTF-8, character strings to be predominant on Unix-based
systems; they surely are on the predominant ones.  (Even though,
I have to say, UTF-16, i.e., 16-bit characters, does have its
value for the majority of the massively declining number of human
languages, and the older I get, the more I think that using it as
a base is a good decision.)

 |in full, not the complete range. I've hopes this will be significantly \
 |more than the minimal repertoire of C2x, but it may not as a matter \

That made me look for and download a 2020 draft of ISO C2X; I had
not looked at it until now.

 |of deferral to the C standard. It should be left up to implementations \
 |still, in my opinion, how much of the range beyond this base they want \
 |to support as extensions, including UTF16 as an encoding. How the LC_* \
 |categories will be extended to fully support that base repertoire accord\
 |ing to the Unicode requirements hasn't been determined yet either, \
 |but this is the nominal goal. 

And at a glance, I do not see anything about Unicode-enabled
locales.  I do not see UTF-16 specifically, either: you will have
to convert on input and on output in order to use it in your
program, and then you can just as well convert to the transparent
wchar_t, or use the wide I/O series, which gives it to you.
Minimizing the tremendous deficiency that many traditional Unix
programs face, because the historic string interfaces do not
provide proper functionality for dealing with human languages, is
out of scope, is it?
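As a sketch of the "convert to wchar_t" route using only standard interfaces, the hypothetical `count_chars` below decodes a multibyte string with mbrtowc() according to the current LC_CTYPE locale and counts characters rather than bytes, which strlen() cannot do for UTF-8 text. It assumes the program has already selected a suitable locale with setlocale().

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in the
 * multibyte string s, decoded per the current LC_CTYPE locale.
 * Returns (size_t)-1 on an invalid or incomplete sequence. */
static size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t n = 0, len = strlen(s);

    memset(&st, 0, sizeof st);
    while (len > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, len, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;
        /* r cannot be 0 here: the first len bytes contain no NUL */
        s += r;
        len -= r;
        ++n;
    }
    return n;
}
```

In a UTF-8 locale, the 7-byte string "stra\xC3\x9F" "e" ("straße") counts as 6 characters, and a truncated sequence such as "\xC3" yields (size_t)-1.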

At least it seems as if ISO C2X introduces support for UTF-8 as
a native string representation.  In practice, it seems Unix people
use GNU libunistring (which explicitly supports UTF-32, UTF-16 and
UTF-8, I think) as well as ICU (which I think used UTF-16
internally but by then offered improved UTF-8 interface
performance).  So the ISO standard people were able to simply
ignore their responsibility and focus on mysterious s..t
decisions, POSIX has to follow ISO C's lead, for one, and then
simply did not have the resources to define an entire Unicode
string interface by itself, and so practice has created its own
Genesis.

Thank you.  And ciao from Germany,

--steffen

Future of locale, will there be POSIX.utf-8, what will it bring?

2022-01-06 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello!

I wonder about POSIX.utf-?8; I tried to remember any statement
I had read, and a Mantis search did not turn up anything.

In particular, I am interested in whether LC_CTYPE will bring true
Unicode support or not.  The reason I am asking is that the
upcoming version of my work-box Linux distribution, which is based
on GNU libc (2.34), will provide it like this:

  localedef -i POSIX -f UTF-8 $PKG/usr/lib/locale/C.UTF-8 2> /dev/null || true

and then this thing is detected as a UTF-8 locale, but it causes
three test failures in the MUA I maintain, because character set
conversion behaves differently.

My personal opinion was that POSIX.utf8 would bring the complete
range of Unicode characters to at least LC_CTYPE; I wonder about
LC_COLLATE, as language matching is, hm, very language specific.
Not the rest (maybe LC_MESSAGES going for UTF-8, though).

Is that approximately correct?

Thanks and Ciao! from Germany,

--steffen

Re: [Issue 8 drafts 0001505]: Make doesn't seem to specify unset macro expansion behaviour

2021-12-21 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20211221103814.GB12295@localhost>:
 |Steffen Nurpmeso wrote, on 20 Dec 2021:
 |>
 |> For example CRUX-Linux has a /etc/pkgmk.conf where people can
 |> define $CFLAGS, $CXXFLAGS, etc., also things like
 |> 
 |>   export JOBS=$(nproc)
 |>   export MAKEFLAGS="-j $JOBS"
 ...
 |So use:
 |
 |CFLAGS?=
 |CFLAGS+=-Weven-more-noise
 |
 |to get the same behaviour as before, if you are willing to keep taking
 |the risk of it misbehaving.

I fail to find a real-life makefile example that would be bitten
by your proposal.  Most programs use makefile generators, for
which it does not matter, and those that still support plain
makefiles usually do something similar to git(1) here:

  # Guard against environment variables
  ...
  SCRIPT_SH =

  ...
  SCRIPT_SH += git-bisect.sh

Only BusyBox uses

  CFLAGS  := $(CFLAGS)

but that is OK, since CFLAGS is made available via make(1) anyway.
While I am looking around: users of GNU make(1) seem to make heavy
use of (un)?export statements in makefiles, which addresses
a deficiency in POSIX make(1), one may say.  I usually generate
shell variable assignment files that are then included before the
sub-make is executed, that is

  build:
  	@$(_prestop); LC_ALL=C $${MAKE} -f mk-config.mk all
  [.]
  _prestop = $(__prestop); cd "$(OBJDIR)" && . ./mk-config.env

--steffen

Re: [Issue 8 drafts 0001505]: Make doesn't seem to specify unset macro expansion behaviour

2021-12-20 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Geoff Clare wrote in
 <20211220133200.GA25606@localhost>:
 |Robert Elz wrote, on 18 Dec 2021:
 |> bsd make (bmake) has a couple of macros (the doc calls what are
 |> macros here variables, but they are the same thing), which default to
 |> being unset, which can be set in the environment or command line,
 |> but which cannot be set in any Makefile, as they alter the interpretation
 |> of makefiles (slightly).  That is, if they are going to be defined, that
 |> needs to have happened before the first makefile line is read.
 |> They are used (expanded) in makefiles quite commonly.
 |> 
 |> For that to work, no rule requiring variables to be always set in the
 |> Makefile before being expanded is possible 
 |
 |No such rule is being proposed.  The error is only allowed if
 |the macro is expanded while unset.  If it is set, it makes no
 |difference how it was set.
 |
 |> The += is effectively shorthand for (not possible in make I think)
 |>  MACRO= ${MACRO} string
 |> an so is effectively expanding the otherwise potentially unset MACRO
 |> before altering it.   Any implementation which generated an error on
 |> an expansion of an unset macro would need to generate an error for
 |> this usage (assuming MACRO was unset previously) as well.
 |
 |Thank you for pointing that out.  We should revisit the proposed
 |changes to allow this error as well.

Now that it starts to affect me... :)
I do not think the proposal reflects practice.

For example, CRUX Linux has an /etc/pkgmk.conf where people can
define $CFLAGS, $CXXFLAGS, etc., also things like

  export JOBS=$(nproc)
  export MAKEFLAGS="-j $JOBS"

It is processed by the shell before the actual package is built.
I personally set things like $CFLAGS in my environment.
Most build processes have picked these up automatically for as
long as I have consciously looked at that.

  all:
  	echo CFLAGS=$(CFLAGS)
  ->
    echo CFLAGS=-O1 -g
    CFLAGS=-O1 -g

or even

  CFLAGS+=-Weven-more-noise
  all:
  	echo CFLAGS=$(CFLAGS)
  ->
    echo CFLAGS=-O1 -g -Weven-more-noise
    CFLAGS=-O1 -g -Weven-more-noise

This would vanish with the proposal:

  CFLAGS=
  CFLAGS+=-Weven-more-noise
  all:
  	echo CFLAGS=$(CFLAGS)
  ->
    echo CFLAGS=-Weven-more-noise
    CFLAGS=-Weven-more-noise

It even becomes impossible to get the effect that is used in the
wild: as Robert Elz already said, specifying
'make CFLAGS="$CFLAGS"' on the command line hard-sets CFLAGS in the
makefile, and the += never comes into play.

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001531]: time: follow-up to issue #1440

2021-10-30 Thread Steffen Nurpmeso via austin-group-l at The Open Group

...I have omitted all those "sh -c" invocations without an
expansion of command_string that could be misinterpreted.
Hopefully I caught them all.

Ciao, and I wish you a nice weekend from Germany,

--steffen

Re: Interpretation starting for a 30 day review (1440)

2021-10-30 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <21180.1635554...@jinx.noi.kre.to>:
 |Date:Sat, 30 Oct 2021 02:23:40 +0200
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20211030002340.gtkvv%stef...@sdaoden.eu>
 |
 || Dear Robert Elz, on the other hand
 ||
 ||   #?127|unstable9s:$ /usr/xpg4/bin/sh  -c -- "echo Robert, let's sally"
 ||   Robert, lets sally
 ||   #?0|unstable9s:$ ll /usr/xpg4/bin/sh
 ||   -r-xr-xr-x   1 root bin   202164 Mar 19  2012 /usr/xpg4/bin/sh*
 |
 |It isn't a question of sh support for -c --, we have to take that
 |as given (or we couldn't possibly want system() and popen() to use it).
 |What matters is libc support for use of the "--" in system() and popen()
 |so that all of those applications that are failing today because when
 |they try system("-some-tool") it fails currently don't need to be altered
 |to change that to be system(" -some-tool");

OK, sure.  Even though it leads to nothing but syntax errors,
I think it is better to be explicit and add the -- instead of
documenting a workaround, as is done below by and for the Linux
man-pages.  It can only get better in the future, and maybe some
code paths here and there can stop checking for a leading
hyphen-minus and adding spaces some time from now on.

BUGS
   If  the command name starts with a hyphen, sh(1) interprets the command
   name as an option, and the behavior is undefined.  (See the  -c  option
   to  sh(1).)   To  work  around this problem, prepend the command with a
   space as in the following call:

   system(" -unfortunate-command-name");
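For comparison, a minimal sketch of the explicit approach, loosely along the lines of the "--" change for system() and popen() discussed in this thread: a hypothetical `run_sh_command` that starts `sh -c -- command` via posix_spawn(). It assumes the shell lives at /bin/sh (POSIX does not guarantee that path) and, unlike a real system(), it does not block SIGCHLD or ignore SIGINT and SIGQUIT.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical system()-like helper: "--" ends sh's option parsing,
 * so a command string starting with '-' (such as "-some-tool") is
 * taken as the command string rather than as options.
 * Returns the wait status, or -1 if the shell could not be run. */
static int run_sh_command(const char *command)
{
    char *argv[] = { "sh", "-c", "--", (char *)command, NULL };
    pid_t pid;
    int status;

    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

With the "--" in place, run_sh_command("-some-tool") makes sh look up a command named -some-tool (exit status 127 if there is none) instead of parsing it as options, so no leading space is needed.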

I did not mention it in my commit message, but I was surprised
that I had never noticed that it is "c", not "c:", i.e., that -c
does not take an argument, which is why I implemented it right
away after I consciously saw Geoff Clare's "--" proposal; he got
the corresponding credit.  This was at 02:30 am, and therefore
many hours before your message to the NetBSD mailing list.

--steffen

Re: Interpretation starting for a 30 day review (1440)

2021-10-29 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <9905.1635549...@jinx.noi.kre.to>:
 |Date:Fri, 29 Oct 2021 16:42:48 -0500
 |From:Eric Blake 
 |Message-ID:  <20211029214248.5rsezyh4wvgl2...@redhat.com>
 |
 |
 || Another thing to consider: if enough implementations fix things NOW to
 || use "--" in system() and popen(), then by the time we actually DO
 || release Issue 8, it will already be common enough practice to
 || standardize it.
 |
 |I think you'd need to delay Issue8 by quite a long time, perhaps even
 |to the extent of making it be 203x rather than 202x, for that to be
 |practical.
 |
 |If the only issue was whether implementations can claim posix conformance,
 |it would be less of an issue, but what is more important is whether users
 |reading what posix says works, find that's correct when they try it on the
 |system they're actually running.   Old systems hang around for a long time,
 |it takes ages for them to gradually phase out to the extent where no-one
 |really cares about them any more.

Dear Robert Elz, on the other hand

  #?127|unstable9s:$ /usr/xpg4/bin/sh  -c -- "echo Robert, let's sally"
  Robert, lets sally
  #?0|unstable9s:$ ll /usr/xpg4/bin/sh
  -r-xr-xr-x   1 root bin   202164 Mar 19  2012 /usr/xpg4/bin/sh*

--steffen

Re: Interpretation starting for a 30 day review (1440)

2021-10-29 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <8552.1635508...@jinx.noi.kre.to>:
 |Date:Fri, 29 Oct 2021 09:51:09 +0100
 |From:"Andrew Josey via austin-group-l at The Open Group" \
 |
 |Message-ID:  <5bf8909a-6cc2-4089-87c1-5fac762fa...@opengroup.org>
 ...
 || 0001440: System Interfaces Calling `system("-some-tool")` fails (althoug\
 || h it is a valid `sh` command)   
 ...
 |I object to this one.
 ...
 |ps: as it happens, I am (or should be if I was not wasting time replying
 |to this) testing a change to NetBSD that adds the "--" in both system()
 |and popen() ... but that we will (I expect) have an implementation that
 |would conform with the proposed text does not mean that it is the correct
 |thing for POSIX to specify.

I also added this right away to all code paths that go via
$SHELL -c in the mailer I maintain.  This does not have much to do
with system(3), of course, but I was disappointed that I had
overlooked the fact that the command string is not an argument to
-c, and that undesired side effects are possible in all released
versions of this MUA!

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-28 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Paul Smith, all.

Paul Smith wrote in
 :
 |On Sat, 2021-08-28 at 02:38 +0200, Steffen Nurpmeso wrote:
 |> So to get to the gory details was a bit more complicated than i first
 |> thought, but i can offer a public domain patch for GNU make (git
 |> [012918bf11])
 |
 |Hi Steffen.  Thanks for working on these changes.

Well you wanted to have it in December.  I was only a little late.

 |Please start a discussion of this on the bug-m...@gnu.org mailing list;

No, no, sorry, I do not want to start a discussion on a GNU list.
I really have no more spare time nor blood to lose;
I implemented the patch because it is needed to bring this issue
forward.

 |that's where all patches and discussion of new features should happen,
 |so that all users of and contributors to GNU make can participate.

The final version of the public domain patch, which should even
apply with patch(1) to the version of make that you maintain, can be
downloaded from Mantis; I think

  https://www.austingroupbugs.net/file_download.php?file_id=57=bug

should do it (tested here, but I do not trust such trackers).
It also survives -fsanitize=address.

 |I haven't looked at this in detail so I can't comment on the approach
 |taken here.

I hope the comments are self-explanatory.
I personally declare prototypes first, which I find nicer than
having a recursive workhorse before the function that uses it;
the latter makes the code a bit difficult to read in order.

In summary, the implementation works as follows:

 - When we see .WAIT, define automatic dependencies that
   explicitly relate anything after .WAIT to anything before
   .WAIT.  This way the normal dependency tracking can kick in.

 - Since not all of the mentioned targets may have been defined yet
   at the time this automatic dependency creation is performed,
   wait until the makefile is "fixated" (you may call it
   "snapped"), shortly before actual execution starts.

   At that time we know that all targets exist, so recurse into all
   the corresponding prerequisites and define automatic dependencies
   for those as well, so that they are also covered by the normal
   dependency tracking.

 - All targets that must be .WAITed for will block
   until all their commands have been processed.

This effectively establishes strict file-global .WAIT ordering.

A nice Sunday i wish from Germany,
Ciao,

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-27 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Dear Paul Smith, all.

Getting down to the gory details was a bit more complicated than
I first thought, but I can offer a public domain patch for GNU
make (git [012918bf11]) that for

  .WAIT:
  all: lib ham .WAIT bin bin2 ;@echo all command
  lib: lib/.stamp; @echo lib command
  ham: ;@echo ham command
  lib/.stamp: ;cd lib && $(MAKE)
  bin: bin/.stamp; @echo bin command
  bin/.stamp: ;cd bin && $(MAKE)
  bin2: ; @echo bin2 command

makes "~/src/.gmake.git/make -j4" print

  cd lib && /home/steffen/src/.gmake.git/make
  ham command
  make[1]: Entering directory '/tmp/z/lib'
  lib-x command
  lib-x after sleep
  lib/all command
  make[1]: Leaving directory '/tmp/z/lib'
  lib command
  cd bin && /home/steffen/src/.gmake.git/make
  bin2 command
  make[1]: Entering directory '/tmp/z/bin'
  bin-x command
  bin/all command
  make[1]: Leaving directory '/tmp/z/bin'
  bin command
  all command

instead of ("make -j4")

  cd lib && make
  cd bin && make
  ham command
  bin2 command
  make[1]: Entering directory '/tmp/z/lib'
  lib-x command
  make[1]: Entering directory '/tmp/z/bin'
  bin-x command
  bin/all command
  make[1]: Leaving directory '/tmp/z/bin'
  bin command
  lib-x after sleep
  lib/all command
  make[1]: Leaving directory '/tmp/z/lib'
  lib command
  all command

For your example, Paul Smith:

  .WAIT:
  all: bar foo
  bar: two one
  foo: one .WAIT two
  one two foo bar: ; @echo $@; sleep 3

it does

  $ time ~/src/.gmake.git/make -j4
  one
  two
  bar
  foo
  real    0m9.017s

instead of

  $ time make -j4
  two
  one
  bar
  foo
  real    0m6.015s

which is also what bmake does:

  $ time bmake -j4
  --- two ---
  --- one ---
  --- two ---
  two
  --- one ---
  one
  --- bar ---
  --- foo ---
  --- bar ---
  bar
  --- foo ---
  foo
  real    0m6.038s

So this is something the core developers then need to think about.
The patch for GNU make below ensures that a .WAIT directive causes
a file-wide prioritization, across all targets, to take place.

It of course needs a fresh pair of eyes and some polishing, but I
think it looks good.  It also fixes problems of the first patch (a
hash_table for "file" names that was not case-insensitive, a possible
off-by-one in the all_dep code, and missing \n line separation for
the eval_buffer() case that already existed without late
dependencies).  I hope I followed the style guidelines.

It has one problem left, for which I would appreciate input.
I think that for complicated dependency trees we could still
end up in endless loops on cyclic dependencies when doing

  static void
  a_file_wait__late_recur (..
  ...
  fp = lookup_file (dep);
  assert (fp != NULL);

  for (dp = fp->deps; dp != NULL; dp = dp->next)
    if (dp->file != NULL && !file_wait_is_needed (dp->file))
      a_file_wait__late_recur (wlp, dp->file->name, 0);

Cyclic or repeated dependencies should be cut off there, but that
is not done yet.  I was thinking about adding a temporary hash_table
here.

Other than that, I really think this looks good.
It would be tremendous to be able to use .WAIT in GNU make too:
with the patch below I can comment out .NOTPARALLEL: and .WAIT: in
the MUA I maintain, and it still synchronizes the right way. :)

Ciao, and a nice weekend.  I will do an audit and add a hash_table
to the snippet above, to avoid calling __late_recur() for a name
we have already seen; that should do it, then?


diff --git a/src/file.c b/src/file.c
index 765037507a..221a887c96 100644
--- a/src/file.c
+++ b/src/file.c
@@ -36,6 +36,113 @@ this program.  If not, see <http://www.gnu.org/licenses/>.  */
    only work on files which have not yet been snapped. */
 int snapped_deps = 0;
 
+/* All file deps in the global namespace for which new_job() needs to wait
+ * until all commands have been processed (as via .NOTPARALLEL:).
+ * In order to be able to create all necessary dependency relationships we need
+ * the latter, it is resolved within snap_deps.
+ * These will be initialized once the first such dep is seen. */
+struct a_file_wait_late
+{
+  struct a_file_wait_late *last;
+  char **targets;
+  char *deps;
+  size_t deps_len;
+};
+
+static struct hash_table *a_file_wait;
+static struct a_file_wait_late *a_file_wait_late;
+
+static unsigned long
+a_file_wait_hash_1 (const void *key)
+{
+  return_ISTRING_HASH_1 ((const char *) key);
+}
+
+static unsigned long
+a_file_wait_hash_2 (const void *key)
+{
+  return_ISTRING_HASH_2 ((const char *) key);
+}
+
+static int
+a_file_wait_hash_cmp (const void *x, const void *y)
+{
+  return_ISTRING_COMPARE ((const char *) x, (const char *) y);
+}
+
+static void
+a_file_wait_add(struct hash_table **htpp, const void *cvp)
+{
+  void **slot;
+
+  if (a_file_wait == NULL)
+    {
+      a_file_wait = xmalloc (sizeof (*a_file_wait));
+      hash_init (a_file_wait, 256,
+                 &a_file_wait_hash_1, &a_file_wait_hash_2, &a_file_wait_hash_cmp);
+    }
+
+  if (*(slot = hash_find_slot (*htpp, cvp)) == NULL)
+    hash_insert_at (*htpp, cvp, slot);
+}
+
+static void
+a_file_wait__late_recur (const struct a_file_wait_late *wlp, const char *dep,
+int

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-26 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20210827004938.ufzjl%stef...@sdaoden.eu>:
 ...
 |+ if (any)
 |+   {
 |+ for (tmp = new; tmp != curr; tmp = tmp->next)
 |+   {
 |+ l = strlen(tmp->name) +1;
 |+ buf = xrealloc(buf, all_len + 1 + l);

This can be optimized as well: we already query all_dep_len at the
beginning, so a single xrealloc() before the loop is enough.
I already have this here, but have to look again tomorrow
anyhow.

Good night from Germany,

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-26 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Steffen Nurpmeso wrote in
 <20210825133359.wvpz4%stef...@sdaoden.eu>:
 |Steffen Nurpmeso wrote in
 | <20210821231356.n0d_i%stef...@sdaoden.eu>:
 ||Paul Smith wrote in
 || :
 |||On Sat, 2021-08-21 at 02:24 +0200, Steffen Nurpmeso via austin-group-l
 |||at The Open Group wrote:
  ...

So here is a two-hour draft.  Sorry it is a bit late, but someone
reported an "error" with the mailer I maintain, and I am a bit
puzzled: OpenSSL with blocking sockets (don't get me started on
blocking network I/O) and SO_RCVTIMEO set enters a loop because
the socket returns "Resource temporarily unavailable", as it
should, but in total that loop runs for more than a quarter of an
hour, although I have /proc/sys/net/ipv4/tcp_keepalive_time==300,
so it seems some timer is restarted there, or I was mistaken.  Sorry.

So this draft is exactly as I said it would be.  Paul Smith's
example

  #?0|kent:y$ cat makefile
  .WAIT:
  all: bar foo
  bar: two one
  foo: one .WAIT two
  one two foo bar: ; @echo $@; sleep 3

runs as follows without the patch (first) and with it (second):

  #?130|kent:y$ time make -j4
  two
  one
  bar
  foo

  real    0m6.012s
  user    0m0.009s
  sys     0m0.007s
  #?0|kent:y$ time ~/src/.gmake.git/make -j4
  one
  two
  bar
  foo

  real    0m9.017s
  user    0m0.013s
  sys     0m0.007s

My own example

   cd /tmp
   mkdir -p z/bin z/lib
   cd z
   cat >makefile <<-'_EOT'
.WAIT:
all: lib .WAIT bin ;@echo all command
clean: ; rm -f bin/.stamp lib/.stamp
lib: lib/.stamp; @echo lib command
lib/.stamp: ;cd lib && $(MAKE)
bin: bin/.stamp; @echo bin command
bin/.stamp: ;cd bin && $(MAKE)
_EOT
   cat >bin/makefile <<-'_EOT'
all: bin-x; @echo bin/all command | tee .stamp
bin-x: ;@echo bin-x command
_EOT
   cat >lib/makefile <<-'_EOT'
all: lib-x; @echo lib/all command | tee .stamp
lib-x: ; @echo lib-x command; sleep 3; echo lib-x after sleep
_EOT

changes like so

   #?0|kent:z$ make clean;make -j4
   rm -f bin/.stamp lib/.stamp
   cd lib && make
   cd bin && make
   make[1]: Entering directory '/tmp/z/lib'
   make[1]: Entering directory '/tmp/z/bin'
   lib-x command
   bin-x command
   bin/all command
   make[1]: Leaving directory '/tmp/z/bin'
   bin command
   lib-x after sleep
   lib/all command
   make[1]: Leaving directory '/tmp/z/lib'
   lib command
   all command

   #?0|kent:z$ make clean;~/src/.gmake.git/make -j4
   rm -f bin/.stamp lib/.stamp
   cd lib && /home/steffen/src/.gmake.git/make
   cd bin && /home/steffen/src/.gmake.git/make
   make[1]: Entering directory '/tmp/z/lib'
   make[1]: Entering directory '/tmp/z/bin'
   lib-x command
   bin-x command
   bin/all command
   make[1]: Leaving directory '/tmp/z/bin'
   lib-x after sleep
   lib/all command
   make[1]: Leaving directory '/tmp/z/lib'
   lib command
   bin command
   all command

So far it is only a partial success, because it does not add
dependencies recursively: bin: itself is synchronized, but
bin/.stamp: is not, even though it is a prerequisite of bin:,
so it executes concurrently.  I will look into this
tomorrow.

The patch itself is really simple so far: a global hashmap and
generated auto-dependencies.  It is a bit longer than it should be,
since C-style string building has to be used.

First try to add support for .WAIT
---
 src/file.c| 120 --
 src/filedef.h |   7 +++-
 src/job.c |   2 +-
 src/read.c|  18 ++---
 4 files changed, 137 insertions(+), 10 deletions(-)

diff --git a/src/file.c b/src/file.c
index 765037507a..5beda25015 100644
--- a/src/file.c
+++ b/src/file.c
@@ -36,6 +36,11 @@ this program.  If not, see <http://www.gnu.org/licenses/>.  */
    only work on files which have not yet been snapped. */
 int snapped_deps = 0;
 
+/* All file deps in the global namespace for which new_job() needs to wait
+ * until all commands have been processed (as via .NOTPARALLEL:).
+ * Will be initialized for the first such dep */
+struct hash_table *file_wait_deps /* = NULL */;
+
 /* Hash table of files the makefile knows how to make.  */
 
 static unsigned long
@@ -442,9 +447,11 @@ remove_intermediates (int sig)
 
 /* Given a string containing prerequisites (fully expanded), break it up into
a struct dep list.  Enter each of these prereqs into the file database.
+   dep_targets_to_eval must be NULL but for first expansion pass, where it will
+   be used to create the necessary dependency relations for .WAIT:.
  */
 struct dep *
-split_prereqs (char *p)
+split_prereqs (char *p, char **dep_targets_to_eval)
 {
   struct dep *new = PARSE_FILE_SEQ (&p, struct dep, MAP_PIPE, NULL,
                                     PARSEFS_NONE);
@@ -472,6 +479,70 @@ split_prereqs (char *p)
 ood->ignore_mtime = 1;
 }
 
+  /* At each occurrence of .WAIT we want to place barriers on first pa

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-25 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Jörg, all,

Joerg Schilling wrote in
 <20210825152559.ctbfz%sch...@schily.net>:
 |"Steffen Nurpmeso via austin-group-l at The Open Group"  wrote:
 |> Now it has to be said, GNU make supports an immense number of
 |> special cases, pattern expansions etc., and this makes me wonder
 |> whether the standard says anything to this.
 |> Because, _if_ the standard would allow
 |> 
 |>   FOO = .WAIT
 |>   BAR: a $(FOO) b
 |> 
 |> then -- i have not yet really looked into that -- it seems GNU
 |> make uses "double expansion" and the above approach would possibly
 |> no longer work because.
 |> Is there wording in the standard that this is allowed?  Is this
 |> desirable?  Shall there be words that forbid such usage of .WAIT,
 |> or any other special target?  For example, in GNU make, i see
 |
 |We discussed this in the teleconference and since I am using that feature in
 |the schily makefile system, I expect this to work and I believe that our 
 |current wording requires it.
 |
 |The background is that make, while parsing
 |
 | BAR: a $(FOO) b
 |
 |immediately expands $(FOO) in the reader, before the rest of the parser can
 |see it.

Well, I hope so *indeed*!!

Tomorrow I have time and will look into GNU make more thoroughly;
if it works *that* way, then I think an even better approach would
be to just use a hash map (there is hash support available!) of all
dependencies <-> targets which require waiting, and not fiddle
around with structures etc. at all!  Under the premise that
file-wide target names are unique, this should work!

Thus the solution to implement .WAIT: for GNU make would be very,
very simple: just adapt split_prereqs(), set up a global hashmap
of dependencies which require waiting, and in new_job() look up
whether the "file" name is a member of that hashmap!  It could be
that easy in the end!!  (And I am hoping for it ;)  (And of course
we would need to ensure that inter-dependencies are created, so
that the usual dependency mechanism springs into action.)

Thanks Jörg.
Ciao,

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-25 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

Steffen Nurpmeso wrote in
 <20210821231356.n0d_i%stef...@sdaoden.eu>:
 |Paul Smith wrote in
 | :
 ||On Sat, 2021-08-21 at 02:24 +0200, Steffen Nurpmeso via austin-group-l
 ||at The Open Group wrote:
 ...
 ||If we really wanted to implement this the easiest possible way, it
 ||would be simpler to implement like this: whenever a .WAIT target is
 ||seen, GNU make pauses ALL new jobs and waits for all existing jobs to
 ||complete before it proceeds.  That would reduce the value of -j when
 ||you use .WAIT but it would meet the requirements of the wording AFAICT.

Looking into the GNU make code, it seems to me that order-only
deps are the wrong way to go.

I *think* I could come up with a draft.  It is a bit early, but
I stumbled over a question for the list, so I am writing this now.
The idea would be roughly the following.

- make(1)'s normal dependency system is sufficient to deal with
  the problem, except that commands are run asynchronously,
  without waiting, unless explicitly requested.  In GNU make
  this decision is made in new_job(), called from
  execute_file_commands(), where that "file" is indeed a target.

  new_job() will get its ticket for running (i.e., the jobserver
  etc. gives the job a "go" if an upper limit on concurrent jobs
  is given), and once it has it, it does

    /* The job is now primed.  Start it running.
       (This will notice if there is in fact no recipe.)  */
    start_waiting_job (c);

    if (job_slots == 1 || not_parallel)
      /* Since there is only one job slot, make things run linearly.
         Wait for the child to die, setting the state to 'cs_finished'. */
      while (file->command_state == cs_running)
        reap_children (1, 0);

  Here not_parallel corresponds to .NOTPARALLEL:.
  The idea would be to change this to

    if (file->needs_wait || job_slots == 1 || not_parallel)

  which seems to do the job in a simple test makefile
  tree, because the call chain then does not return until the
  child has finished, and thus all targets which depend on that
  target have to wait in turn.

  P.S.: it is interesting to see that completion of commands is
  not a necessary precondition for marking a target as processed!
  In fact, _I_ (like I said, I gave up two-plus decades ago, and
  use makefile generators in Perl and now shell/awk, which use
  the simplest approaches) would have expected this to be the
  default.

- Sprinkle some "unsigned int needs_wait:1".

- Change split_prereqs() to look out for .WAIT.
  It takes a new string buffer parameter that is filled in order to
  create the necessary "normal" "target: deps" rules, which are later
  evaluated via eval_buffer() at the end of eval() _if_ the buffer is
  not NULL there.  Set the sprinkled needs_wait.
  Remove .WAIT from the list of targets, so that only split_prereqs()
  knows about it; would this work out?

Now it has to be said that GNU make supports an immense number of
special cases, pattern expansions etc., and this makes me wonder
whether the standard says anything about this.
Because, _if_ the standard allows

  FOO = .WAIT
  BAR: a $(FOO) b

then (I have not really looked into that yet) it seems GNU
make uses "double expansion", and the above approach might
no longer work.
Is there wording in the standard that allows this?  Is this
desirable?  Should there be wording that forbids such usage of .WAIT,
or of any other special target?  For example, in GNU make, I see

  /* Perform second expansion ...
...
  if (second_expansion)
...
  /* Expand .SUFFIXES: its prerequisites are used for $$* calc.  */
  f = lookup_file (".SUFFIXES");

So this cannot be masked via a variable as such, since otherwise it
would not yet be expanded here?

What do you think about the above approach, Paul Smith?
And what does the list say about masking special targets
via variables?

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-23 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Steffen Nurpmeso wrote in
 <20210823213019.wtxaf%stef...@sdaoden.eu>:
 |Steffen Nurpmeso wrote in
 | <20210821231356.n0d_i%stef...@sdaoden.eu>:
 ...
 |That striked me as another worthwile addition to POSIX make(1)
 |maybe, i always forget about this: the .PHONY target like
 ...
 |It seems to be supported by all makes i have at hand, which
 |includes SunPro make now that Jörg Schilling has this also in his
 |schilytools.

It *is* already applied!  As #523!!
Wonderful.

--steffen

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-21 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Paul Smith wrote in
 :
 |On Sat, 2021-08-21 at 02:24 +0200, Steffen Nurpmeso via austin-group-l
 |at The Open Group wrote:
 ...
 |Without knowing what you were trying to do all I can say is that the

My only intent ever was to build the projects I am using, in a way
that keeps the build process portable.  I personally found this
impossible to do except by unrolling it all in Perl (which was needed
anyway because of OpenSSL, even back then; today it is sh(1), awk(1)
etc.) and generating portable makefiles.

 |entire PURPOSE of make is to impose a reliable ordering on the build.
 |
 |I mean, you can just say:
 |
 |  foo: bar
 |
 |and now you have a reliable ordering that bar will be completed before
 |foo can be started, regardless of how high your -j value is or where
 |else these targets appear in the makefile.
 |If that's not sufficient for some particular use-case (and it might not
 |be, as I discussed in my comments) then yes, more effort may be needed.
 |
 |>   .WAIT: is in use in the BSD world in their holistic makefile
 |>   system for decades, and it enables people to write natural
 |>   looking makefiles that can be understood at a glance, even after
 |>   long work hours, and even after coming back to some project that
 |>   has not been looked at in a long time.
 |
 |Yes, I already said they're easy to read.  But, they're easy to misuse
 |and misunderstand.  Maybe that's a tradeoff that works fine in specific
 |situations (for example, BSD makefiles typically use .WAIT inside lists
 |of singleton "top level" targets like subdirectories so they don't run
 |into these issues because the targets only appear in only one target's
 |prerequisite list) but "easy to read and hard to use reliably" does not
 |make a great candidate for POSIX, at least IMO.

Of course you are right; the BSD makefile system has grown and
been written to allow top-level makefiles like, for example, that
of file(1):
  SUBDIR= lib .WAIT bin

  .include 

plus the subdirectory makefiles.  To that extent, some kind of easily
injectable synchronization barrier was needed.

On the other hand, it was the GNU people who invented and pushed
forward (through usage) things like __attribute__((..)), because
#pragma never was anything but a portability mess.
So isn't it strange that I can now use _Pragma even in ISO
C code (and ({ ... }) would be very nice for RHV macro injections,
for example for gettext()-like optimizing injections), and port
C code with __attribute__(()) pretty easily, but have to write
"A: B C\nB: |C\n" etc. for GNU make?
From a purely argumentative point of view, this is a bit
inconsistent.
Of course GNU cc and GNU make are distinct programs, but seen
in context it seems odd.

 |>   Consider for example
 |> 
 |> openssl/Makefile:SUBDIR=   lib .WAIT bin
 |
 |Why not just say:
 |
 |bin: lib
 |
 |?

Because of the BSD makefile framework, which is de facto a
programming environment.  I can confirm that, with a makefile like

  cd /tmp
  mkdir -p z/bin z/lib
  cd z
  cat >makefile <<-'_EOT'
#.WAIT:
#all: lib .WAIT bin ;@echo all command
all: lib bin ;@echo all command
clean: ; rm -f bin/.stamp lib/.stamp
lib: lib/.stamp; @echo lib command
lib/.stamp: ;cd lib && $(MAKE)
bin: bin/.stamp; @echo bin command
bin/.stamp: lib/.stamp;cd bin && $(MAKE)
_EOT
  cat >bin/makefile <<-'_EOT'
all: bin-x; @echo bin/all command | tee .stamp
bin-x: ;@echo bin-x command
_EOT
  cat >lib/makefile <<-'_EOT'
all: lib-x; @echo lib/all command | tee .stamp
lib-x: ; @echo lib-x command
_EOT

Yes, this works.  But bmake does not echo "lib command", with
or without the .WAIT way of doing things; I think we have
found a bug there.  I'll mail Simon.

But this is a very simple makefile, and see how hard it already gets
with stamp files.  In normal projects, even more so in those with
trees, inter-dependencies etc. are very, very hard to track and keep
right, especially over the years, with several people working on
the makefiles and patching individual parts of them.

  $ wc -cl /x/src/git.git-no_reduce/Makefile
  3399 106066 /x/src/git.git-no_reduce/Makefile

But the necessity to have an option to be explicit is not called
into question anyway.

 |> openssl/lib/Makefile:SUBDIR+= .WAIT libssl  # depends on libcrypto
 |
 |Why not just say:
 |
 |  libssl: libcrypto
 |
 |? and now you don't even need the comment: it's self-documenting.

But even with GNU makefiles, where dependencies and such are
created dynamically, things are not as easy as you say.

 |Also this form doesn't depend on the order that the SUBDIR variable was
 |created with: the .WAIT version doesn't actually say "you can only
 |build libssl after libcrypto", it says "everything before this point in
 |the SUBDIRS variable must be completed

Re: [1003.1(2016/18)/Issue7+TC2 0001437]: make: (document .NOTPARALLEL and .WAIT special targets) in RATIONALE

2021-08-20 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 :
 ...
 |https://austingroupbugs.net/view.php?id=1437 
 ...

Unfortunately I cannot edit in Mantis.  I wanted to add:


  I think words like "quick and dirty" and "hack" are really misplaced
  for the nice and elegant solution that .WAIT: is.
  I remember writing GNU makefiles and how I had to mess with stamp
  files in order to get at least a somewhat reliable ordering.
  That may be some time ago, but still.
  It is no surprise anyway that people no longer write
  makefiles but instead use automated generators which do that
  and create correct logic, so all this is a non-issue there.

  .WAIT: has been in use in the BSD world, in their holistic makefile
  system, for decades, and it enables people to write natural-looking
  makefiles that can be understood at a glance, even after
  long work hours, and even after coming back to some project that
  has not been looked at in a long time.
  Consider for example

    openssl/Makefile:SUBDIR=   lib .WAIT bin
    openssl/lib/Makefile:SUBDIR+= .WAIT libssl  # depends on libcrypto

  which scales OpenSSL from 1 to X, or

    xorg/server/xorg-server/Makefile:SUBDIR=  doc include .WAIT
    xorg/server/xorg-server/Makefile:SUBDIR+= damageext composite config .WAIT
    xorg/server/xorg-server/Makefile:SUBDIR+= .WAIT hw
    xorg/server/xorg-server/hw/xfree86/Makefile:SUBDIR+=  .WAIT utils
    xorg/server/xorg-server/hw/xfree86/Makefile:SUBDIR+=  .WAIT Xorg

  See how easily a human maintainer can define and control even very
  complicated build systems, and group them logically just as
  desired.  Or consider bind:

    bind/Makefile:SUBDIR+= lib .WAIT libexec bin
    bind/lib/Makefile:SUBDIR+= libisc .WAIT libdns libisccc .WAIT libisccfg .WAIT libbind9 libirs

  Today it is only because GNU make luckily supports
  .NOTPARALLEL: that portable makefiles which can make use of -j
  can be written at all, _if_ one follows Wheeler's "everything can
  be done with an indirection", as shown in the introductory
  comment.

  It seems to me that GNU make could easily implement .WAIT:, albeit
  with slightly different semantics, by creating a "pipe"
  rule of all prerequisites encountered so far on a line when
  a .WAIT is seen?

--steffen

Re: utilities and write errors

2021-07-01 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20210701104540.GA4023@localhost>:
 |Robert Elz wrote, on 29 Jun 2021:
 ...
 |As above, this is all irrelevant to what the standard requires.
 |
 |As far as implementation detail goes, obviously if pwd uses stdio
 |buffering then in order to conform to the standard it must explicitly
 |fflush(stdout) and check there was no write error before exiting. 
 |I see from later in the thread that mksh has now been patched to do
 |exactly that. (Thanks Thorsten.)

Also the next version of my (POSIX mailx) mailer will do this.
Commit as of 06-28, with credits to you (Geoff Clare, Robert Elz).

--steffen

Re: Fwd: Re: [1003.1(2016/18)/Issue7+TC2 0001436]: make: add "-j max_jobs" option to support simultaneous rule processing

2021-05-24 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Paul Smith wrote in
 :
 |On Sat, 2021-05-22 at 00:59 +0200, Steffen Nurpmeso via austin-group-l
 |at The Open Group wrote:
 |> Then stating something like "i am process X, and my parent is Y"
 |> etc.  And the rest being up to the make(1) implementor as a quality
 |> of implementation (scheduling, fair even more so, seems to reside in
 |> the area of very complicated programming).  I was surprised to see
 |> that rule content matters at all.
 |
 |The problem is that depending on the implementation, the parent make
 |might need to pass actual resources to the child make instance.  Those
 |resources could negatively impact child processes that are NOT make.
 | In that case, make needs to know whether the thing it is invoking is
 |another make or not, so it knows whether to pass those resources.
 |
 |As an example, in GNU make today we use a simple pipe to implement this
 |feature, which means child makes need to have the open file descriptors
 |for the ends of the pipe provided to them when they are started.
 |
 |But, some other programs that are not make, might not work well if they
 |are invoked with extra file descriptors already open, and if they write
 |garbage into the jobserver pipe then it will break things.  So in this
 |implementation make would want to close-on-exec those file descriptors
 |before invoking the sub-process, IF it knows the sub-process is not a
 |make instance.
 |
 |Of course, for an implementation which can assume POSIX most likely
 |this is NOT the method that would be chosen to implement the jobserver
 |as it leads to complications.  GNU make, however, is part of a system
 |bootstrap toolchain (needed to build a compiler for an old system to
 |start to update it for example) so I try to keep it portable to very
 |old systems.
 |
 |Nevertheless, I've been considering switching GNU make's implementation
 |to a named pipe (mkfifo) or a POSIX semaphore.  These implementations
 |would resolve all the above issues (and others, such as blocking/non-
 |blocking FDs etc.) since only a sub-make would access the named pipe.
 |
 |
 |Anyway, that's not really relevant but just FYI as to why (I suspect)
 |the standard was worded like this.

Thanks for the explanation.

--steffen

Fwd: Re: [1003.1(2016/18)/Issue7+TC2 0001436]: make: add "-j max_jobs" option to support simultaneous rule processing

2021-05-24 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello.

It is somewhat redundant, but I am forwarding the message once
again, now that my subscription has been re-established.

--- Forwarded from Steffen Nurpmeso  ---
Date: Fri, 21 May 2021 17:57:20 +0200
From: Steffen Nurpmeso 
Subject: Re: [1003.1(2016/18)/Issue7+TC2 0001436]: make: add "-j max_jobs" option to support simultaneous rule processing

Austin Group Bug Tracker wrote in
 <79557df8f0e0d0b53548449c40247...@austingroupbugs.net>:
 |https://austingroupbugs.net/view.php?id=1436 
 ...
 |-- 
 | (0005362) rhansen (manager) - 2021-05-20 17:08
 | https://austingroupbugs.net/view.php?id=1436#c5362 
 |-- 
 |We think we have achieved consensus on a rewrite of the description of the
 |-j option; see "attempt #3" on line 65 of
 |https://posix.rhansen.org/p/2021-05-20. Feedback would be appreciated. 

I am not an honourable make(1) programmer, but since i opened the
issue i want to state that i liked it when i read it last night.
I was actually surprised to see the issue reopened as such,
because my thinking would have been that a -j parallelized make(1)
enters some kind of jobserver mode that becomes established via
some environment setting (or whatever the programmer chooses that
can easily be found by subprocesses), the existence of which is
checked by make(1) at startup.  Then stating something like "i am
process X, and my parent is Y" etc.  And the rest would be up to the
make(1) implementor as a quality of implementation issue (scheduling,
and fair scheduling even more so, seems to reside in the area of very
complicated programming).  I was surprised to see that rule content
matters at all.
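One way a sub-make could detect such an inherited jobserver at startup is sketched below in C. It assumes the convention of recent GNU make versions, which announce the inherited pipe as a `--jobserver-auth=R,W` word in `MAKEFLAGS`; other make implementations may use a different variable or format, and the function name is hypothetical. The named-pipe form (`fifo:PATH`) is deliberately not handled and simply reports "no jobserver".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: look for "--jobserver-auth=R,W" in the value of
 * MAKEFLAGS (assumed GNU make convention).  Returns 1 and stores the
 * read/write descriptors if found, 0 otherwise. */
static int find_jobserver_fds(const char *makeflags, int *rfd, int *wfd)
{
    static const char key[] = "--jobserver-auth=";
    const char *p;

    if (makeflags == NULL || (p = strstr(makeflags, key)) == NULL)
        return 0;
    /* Only the pipe form "R,W" is accepted; "fifo:PATH" fails here. */
    return sscanf(p + sizeof key - 1, "%d,%d", rfd, wfd) == 2;
}
```

A sub-make would call it as `find_jobserver_fds(getenv("MAKEFLAGS"), &r, &w)` and could fall back to serial execution when it returns 0.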

A nice weekend i wish from Germany,

--steffen


Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2021-02-06 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Robert.

Robert Elz wrote in
 <12854.1612654...@jinx.noi.kre.to>:
 |Date:Sat, 06 Feb 2021 21:55:19 +0100
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20210206205519.43rln%stef...@sdaoden.eu>
 |
 || Fiddling with bytes is something completely different.
 |
 |But how is the shell supposed to know?
 |
 |Consider
 | U1=$'\u021c'
 | U2=$'\u0a47'

The shell has to convert \u.  It checks.

 | X1=$'\310\234'
 | X2=$'\340\251\207'
 |
 |Then  S1="${U1}${U2}"
 | S2="${X1}${X2}"

The shell just takes and concatenates bytes.

 |Or worse, given a script j2 containing just:  printf '%s%s' "$1" "$2"
 |
 | S3="$( j2 "$U1" "$U2" )"
 | S4="$( j2 "$X1" "$X2" )"
 |
 |Then given any (valid, existing, but otherwise unconstrained) unicode
 |code points for U1 and U2 (with the sole exception of \u000A simply because
 |of the way command substitutions eat newlines), and the corresponding
 |encodings not using \u in X1 and X2, which of those lines (the assignments
 |to U1, U2, X1, X2, S1, S2, S3, or S4) should the shell ever generate any
 |kind of error?

But no, that was not what i said.  You have to convert the \u when
you parse it, and can apply the Unicode rules as you go, having
the target character set in mind.  If the target is "Unicode like
POSIX does it" aka UTF-8, then you can perform full UTF-32 to
UTF-8 validity checking.

[I pasted utf32_to_utf8 and vice versa, but then removed it again.]

Otherwise you can only test the overall codepoint range (less than
or equal to 0x10FFFF).
Well, that is at least how i do it.
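A minimal C sketch of the UTF-32 to UTF-8 step described here, assuming UTF-8 is the target: surrogates (U+D800 to U+DFFF) and values above U+10FFFF are replaced by U+FFFD rather than encoded. The function name mirrors the utf32_to_utf8 mentioned above, but this is an illustration, not the MUA's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into buf (room for 4 bytes needed).
 * Code points that are not Unicode scalar values (surrogates, or
 * anything above 0x10FFFF) are replaced by U+FFFD.
 * Returns the number of bytes written (1 to 4). */
static size_t utf32_to_utf8(uint32_t cp, unsigned char *buf)
{
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        cp = 0xFFFD;
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

With Robert's examples quoted above, U+021C yields the bytes `\310\234` and U+0A47 yields `\340\251\207`, matching his X1 and X2.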

What i do not mean is that you retest whether the resulting UTF-8
sequence is valid, but offering the possibility to the user would
also be nice, for example, to validate user input after it has
been cleaned of several constructs.

[It is unfortunately a slow operation.]

 |||the string, it has no idea how the script will interpret it, nothing
 |||requires that a $'\u' value ever be used as "characters" (though
 |||that would be a common use).
 ||
 || I disagree.  Invalid \u \U should either remain unconverted or
 || result in the Unicode replacement character (U+FFFD) to be used
 || instead
 |
 |That's not disagreeing, or not with what I meant.   I have no problem
 |with generating an error (better than silently making a replacement
 |char I think) for invalid \u conversions (\uDEAF for example).

Note that, for Unicode, it is _the_ replacement character (�).

 |What I was referring to is the opinion, sometimes stated, that certain
 |combinations of unicode characters are invalid (as a unicode character
 |sequence).   That is, above, with carefully selected U1 and U2, some
 |people would say that S1 can be invalid.   That I do not think is reasonable
 |to expect of the shell - it is up to the application to get those right.

Well having composition and decomposition aka normalization is far
far away, that much is plain.  Attack vector over attack vector,
sequences that become invalid or join or _do not_ join if such
things happen.  Better to use perl(1) for text processing, it has
tremendously powerful Unicode processing capabilities.

 |But of course, a single \u code point should be converted properly.
 |I thought I said that last time.

Fine.

 || as the starting point of conversion .. to UTF-8 or locale
 || via iconv(3)
 |
 |Ignoring the bit about converting to other replacement chars, here,
 |since I'm concerned with valid codepoints only, I don't think the
 |shell should be converting this kind of thing via iconv() ... utilities
 |might (including built-ins in sh, like echo or printf) but not the
 |shell itself.  In the above (assuming I did the conversions correctly)
 |it should always be the case that $U1 = $X1 and $U2 = $X2, regardless

But if you look around and try $'' sequences in bash for example
you will find that \u sequences just will not do what you want
here.  \u is a Unicode codepoint, and so something purely textual
to the core.

Well i think this all has its roots in informatics coming from the wrong
direction, may "speech synthesis" have been an early point of
interest or not.  The interface was not even truly US-ASCII at
first, on TUHS there was just recently a thread on that.  6 BIT
ALL UPPERCASE, packed sequences with multiple "characters" per
storage unit, and all that.  This never had anything to do with
human communication aka linguistic communication, it was instead
about communicating human desire to computer language.

What became Unicode changed that.  Yes, we now have emojis or
however these are spelled, and cute robotic eyes show us heart.
So the situation has changed somewhat.

 |of any locale settings.  If I cannot assume that when writing a script
 |then I have no idea how I would ever do anything with non-ascii chars
 |reliably.
 |
 || But in my opinion \u \U should not be mutilated but allow the f

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2021-02-06 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <15313.1612563...@jinx.noi.kre.to>:
 |Date:Fri, 05 Feb 2021 21:54:52 +0100
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20210205205452.7tbl2%stef...@sdaoden.eu>
 |
 |
 || Well .. if i recall correctly quoting inside of ${xYz} has been
 || clarified not too long ago
 |
 |Not the way that you seem to think.

For sure.  I really had to look and read it in context.

 |||And last (for now anyway), after "set -- A B C" what's the effect of
 |||$'pfx\${@}sfx' ?
 ||
 || This is interesting.  I would say it is identical to ${*} here.
 |
 |In that case $'' could not be the only quoting mechanism that users use.

Yes.  It can be nailed down to that.

 || My MUA just turns it into UTF-8 (via a utf32_to_utf8 function that
 || uses the Unicode replacement character for erroneous codepoints)
 |
 |The generation of the UTF-8 is not the issue, and the (relatively few)
 |values that are reserved can be handled.

And i will not go nail down in return.

 || You have to be careful a bit with Unicode.  There are guarantees
 || that must be fulfilled, see for example [1].  Since the shell is
 || producing UTF-8 it should ensure that no invalid UTF-8 sequences
 || are exposed to consumers.
 |
 |Of course.
 |
 |But: users are permitted to write $'\xfc\x13' and similar, and no-one
 |suggests that the shell should validate such sequences for valid UTF-8
 |encoding, and nor would anyone (I hope) claim the shell should object
 |to $'\u0207\xfc\x13' just because it happens to have a \u in it.
 |This is all just bits until it gets used somehow, at which point if
 |it is invalid, then so be it.

In a standards context i disagree.  Pacta sunt servanda.  This
stands for "Treu und Glauben" ("Good faith") which is §242 of the
BGB (Bürgerliches Gesetzbuch aka Civil Code of Germany).

Fiddling with bytes is something completely different.  If you
want to create that in the shell you can use \x or \OCTAL or what,
but if you go \u or \U then a valid Unicode codepoint (or whatever
mutilated range ISO standardizes for \u \U escape sequences)
should be expected that successfully passes a conforming
UTF-32-to-X conversion.  That is my opinion.
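To make the distinction concrete, here is a hypothetical C sketch of a $'...' escape expander that treats `\x` and octal escapes as raw bytes without validation, but requires `\u` to name a non-surrogate code point and encodes it as UTF-8. Only a handful of escapes are covered, `\U` is omitted, and failing on `\uDEAF` (rather than substituting U+FFFD) is just one of the two policies discussed in this thread, chosen here for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of hex digit c, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Expand a subset of $'...' escapes; out needs room for strlen(in)
 * bytes (the expansion never grows) and is not NUL-terminated.
 *   \xH[H]        raw byte, no validation
 *   \N[N[N]]      raw byte from 1-3 octal digits, no validation
 *   \uH[H[H[H]]]  code point, must not be a surrogate; emitted as UTF-8
 *   \n \t \\ \'   the usual single characters
 * Returns the output length, or -1 for an invalid or unsupported escape. */
static long expand_escapes(const char *in, unsigned char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (*in != '\\') {
            out[n++] = (unsigned char)*in++;
            continue;
        }
        ++in; /* skip the backslash */
        if (*in == 'x' || *in == 'u') {
            int is_u = (*in++ == 'u');
            int max = is_u ? 4 : 2, i, d;
            uint32_t v = 0;

            for (i = 0; i < max && (d = hexval((unsigned char)*in)) >= 0; ++i, ++in)
                v = v * 16 + (uint32_t)d;
            if (i == 0)
                return -1;                  /* no digits after \x or \u */
            if (!is_u)
                out[n++] = (unsigned char)v;
            else if (v >= 0xD800 && v <= 0xDFFF)
                return -1;                  /* surrogates are not characters */
            else if (v < 0x80)
                out[n++] = (unsigned char)v;
            else if (v < 0x800) {
                out[n++] = (unsigned char)(0xC0 | (v >> 6));
                out[n++] = (unsigned char)(0x80 | (v & 0x3F));
            } else {                        /* at most 0xFFFF with 4 digits */
                out[n++] = (unsigned char)(0xE0 | (v >> 12));
                out[n++] = (unsigned char)(0x80 | ((v >> 6) & 0x3F));
                out[n++] = (unsigned char)(0x80 | (v & 0x3F));
            }
        } else if (*in >= '0' && *in <= '7') {
            unsigned v = 0;
            for (int i = 0; i < 3 && *in >= '0' && *in <= '7'; ++i, ++in)
                v = v * 8 + (unsigned)(*in - '0');
            out[n++] = (unsigned char)v;
        } else if (*in == 'n' || *in == 't' || *in == '\\' || *in == '\'') {
            out[n++] = (unsigned char)(*in == 'n' ? '\n' : *in == 't' ? '\t' : *in);
            ++in;
        } else {
            return -1;                      /* escapes left out of this sketch */
        }
    }
    return (long)n;
}
```

With this split, `$'\u021c'` and `$'\310\234'` produce the same two bytes, while `$'\xfc\x13'` passes through unchecked, as Robert describes.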

 ||   When a process interprets a code unit sequence which purports to
 ||   be in a Unicode character encoding form, it shall treat
 ||   ill-formed code unit sequences as an error condition and shall
 ||   not interpret such sequences as characters.
 |
 |That has to be a requirement on the application, not upon the programming
 |language implementation (the shell here) - when the shell is converting

Yes, you can always use \x or \OCTAL to break constraints if you
want to.  This is the flexibility of the programming language
POSIX shell, that is very much text-bound, however.  But you could
create explicit binary strings with $'' and \x / \OCTAL as well as
\uU, which is much better than what we have, where often the bytes
as such are embedded in strings.  Or uuencoded, or base64 encoded,
in order to be decoded once needed.

 |the string, it has no idea how the script will interpret it, nothing
 |requires that a $'\u' value ever be used as "characters" (though
 |that would be a common use).

I disagree.  Invalid \u \U should either remain unconverted or
result in the Unicode replacement character (U+FFFD) to be used
instead as the starting point of conversion .. to UTF-8 or locale
via iconv(3) (then likely resulting in other replacement
character(s) for non-buggy implementations).

But in my opinion \u \U should not be mutilated but allow the full
range of Unicode aka ISO 10646; if i recall correctly (i have not
reread the thread nor re-looked at ISO), artificial restrictions
were imposed on the range of allowed characters by ISO.

A nice weekend i wish.

--steffen

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2021-02-05 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Robert.

Robert Elz wrote in
 <14199.1612519...@jinx.noi.kre.to>:
 |Date:Thu, 04 Feb 2021 21:59:52 +0100
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20210204205952.fw6wv%stef...@sdaoden.eu>
 |
 || Ok, of course, but let me disagree with the latter.  Bizarre rules
 || and Bourne/Korn shell etc ... just look at ${aXb} and quoting
 || rules within.
 |
 |Two things .. first, I agree, the quoting rules that exist now are
 |bizarre, and weird, and just a royal pain to deal with (both for
 |users and implementors) - which is one reason I'm loath to add yet
 |another difference.
 |
 |And second, I meant bizarre in a different way, it was probably
 |the wrong word (there are reasons, many of them, why I write code, and
 |not novels, nor, or at least very rarely, even academic papers),
 |what I meant was that inside the shell, we have to deal with single
 |quoted strings (which are very easy, as they're very simple, and which
 |includes both ' and \ quoting), and double quoted strings, which are
 |messy and cause problems, but which we have generally managed to conquer.
 |Adding a third, somewhat in between form, where most of the text is
 |literal, but where $ expansions (but I am assuming not ` expansions)
 |happen, when doing so adds no new functionality, just perhaps a slightly
 |simpler syntax for the user, just seems like the wrong thing to do.

If, and only with this if, it would become standardized it could
replace the other quoting mechanisms, not in the shell, but from
the user point of view.

The good thing about $'' is that nothing happens, just like in
a single-quoted string, unless you see a reverse solidus.  No
fancy rules unless you get triggered to do so.

And i have not implemented it yet, but i already document \`{} as
a future extension that will allow command evaluation, then.
Note this is Plan9 rc syntax (`{command}), which should make
nesting easier to detect, just like $() does.  I do not expect that to be
implemented by a POSIX shell.  It is a MUA in the end :)
That one documents

   '\$NAME'
   Non-standard extension: expand the given variable name,
   as above.  Brace enclosing the name is supported.
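The `\$NAME` extension documented above could be sketched in C as follows. This is a hypothetical illustration, not the MUA's implementation: names are looked up with getenv(), unset names expand to nothing, names consist of letters, digits and underscores, and no other escapes or nested quoting are handled, so it sidesteps the nesting questions Robert raises.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Append len bytes of s to out, always keeping room for a final NUL. */
static int append(char *out, size_t outsize, size_t *n, const char *s, size_t len)
{
    if (len >= outsize - *n)
        return -1;
    memcpy(out + *n, s, len);
    *n += len;
    return 0;
}

/* Copy in to out literally, but replace \$NAME and \${NAME} with the
 * value of environment variable NAME.  All other characters, including
 * other backslashes, are copied unchanged.  Returns 0 on success, -1 on
 * overflow, an empty or overlong name, or an unterminated \${. */
static int expand_dollar(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '$') {
            const char *p = in + 2, *val;
            int braced = (*p == '{');
            char name[64];
            size_t len = 0;

            p += braced;
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                ++len;
            if (len == 0 || len >= sizeof name || (braced && p[len] != '}'))
                return -1;
            memcpy(name, p, len);
            name[len] = '\0';
            val = getenv(name);
            if (val != NULL && append(out, outsize, &n, val, strlen(val)) != 0)
                return -1;
            in = p + len + braced;      /* resume literal copying */
        } else if (append(out, outsize, &n, in++, 1) != 0) {
            return -1;
        }
    }
    out[n] = '\0';
    return 0;
}
```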

 |That, and while you can do whatever you like in your MUA, we have to
 |deal with the rest of sh syntax ... eg: what happens to a ' that occurs
 |inside a \$ expansion in your scheme (that is, as part of its text, not its
 |result)?  Does that terminate the $' string, and perhaps lead to an
 |invalid $ expansion, or do things nest?   Does that include inside ${var:=foo}
 |(etc) type expansions where currently (if inside quotes) quoting in the foo
 |word doesn't work (except some \ quoting) - if so, then we have a whole new
 |expansion syntax to deal with, and if not, then what do we make of a ' that
 |occurs there?  Or what of a \' there?  Do $' expressions nest?

Well .. if i recall correctly quoting inside of ${xYz} has been
clarified not too long ago -- i would expect the entire $''
context to be yielded and resumed once the ${xYz} construct has
been handled.  I *think* that is what has to happen with them
inside of "", so it should be just the same.  Except that it was
triggered by \$.. not by $.. as it would in double-quoted strings.
I think that would be the most natural take.

 |First in the simple cases, like
 | $'whatever \$( cmd $'arg' ) and more'
 |where I assume that answer would be yes, and similarly in
 | $'xxx \${var%$'\n'} yyy'
 |but also as a simple insertion
 | $'abc \$'\t' def'
 |where doing so makes no sense at all, and so the answer is probably
 |"not allowed", but that is then the one $ "expansion" which isn't
 |allowed inside $' strings, which is yet another special case.
 |
 |Also, if a command substitution were embedded using \$( ) inside a $'
 |string, what conversions (if any) are performed upon the stdout of the
 |command before being embedded in the string, are \ escapes there expected
 |to work?  (Same question for a variable expansion).
 |
 |Similarly, what does $'\${var-"two words"}' generate, and
 |$'\${var-\"two words\"}'  (assuming var is unset naturally).  Or using '
 |instead of " in both of those?

All of that, to me, means: yield the $'' context, and resume once
the construct has been handled.

 |And last (for now anyway), after "set -- A B C" what's the effect of
 |$'pfx\${@}sfx' ?

This is interesting.  I would say it is identical to ${*} here.

 |At least once we either drop \u, or properly define how it is supposed
 |to work (if anyone actually has an idea what that is), $' is entirely the
 |same as ' once the internal expansions are done (as part of lexical analysis)
 |so is trivial to add, makes it easier to encode some strings (just easier,
 |nothing that cannot already be done) and is trivial to implement.  Adding
 |\$ to that would (I think, I haven't tried to actually do it) complicate
 |everything.   Of course, since $' is properly spec

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2021-02-04 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Robert.

Robert Elz wrote in
 <9394.1612390...@jinx.noi.kre.to>:
 |Date:Wed, 03 Feb 2021 18:35:01 +0100
 |From:    Steffen Nurpmeso 
 |Message-ID:  <20210203173501.srcqv%stef...@sdaoden.eu>
 |
 || What else.  Having \$ would be nice,
 |
 |It doesn't exist in shells, so it cannot be in the standard.
 |
 || that i do not understand reluctance of you all.
 |
 |For me, it is unnecessary (sure, it might make user input fractionally
 |cleaner, but adds nothing that cannot already be done) - and it turns
 |$'' from being essentially a single quoted string (once the escapes are
 |processed, which is entirely a parse time activity) into a bizarre form
 |of double quoted string, which needs expansion at execution time.   That
 |complicates the implementation, and for the minor benefit it offers,
 |it just isn't worth it.

Ok, of course, but let me disagree with the latter.  Bizarre rules
and Bourne/Korn shell etc ... just look at ${aXb} and quoting
rules within.  (What i mean is: this does not come naturally, at
all.)  And in general, how long would it take to re-understand the
tests you have committed for the NetBSD shell a few years ago, without
any comments describing what they do -- what a mess, at least to
my brain!  Temporarily suspend and expand until the \$ aka \${}
construct is fully expanded, then continue "single quote reading
with \XY expansion"; that sounds easy.

But hey, i do not want to block progress.  There are minorities
who may not know about the number 0, but still coding standards
put them under the digit system umbrella.  If my words are
blocking issue 249, then i take them back, and in the end making
\U and \u compatible with some ISO standard is state of the art.

--steffen

Re: [1003.1(2008)/Issue 7 0000249]: Add standard support for $'...' in shell

2021-02-03 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Robert Elz wrote in
 <11054.1612115...@jinx.noi.kre.to>:
 |Date:Sun, 31 Jan 2021 06:48:25 +
 |From:"Austin Group Bug Tracker via austin-group-l at The Open Group" 
 |Message-ID:  <79086278e43eeebd97f64b7f45613...@www.austingroupbugs.net>
 |
 || A NOTE has been added to this issue.
 |
 |This comment isn't worthy of a note, but
 |
 || As most of the remaining issues are with $'\u' and $'\U', I
 || would suggest that it be dropped for issue8 for now.
 |
 |what is the "it" you're suggesting dropping (or deferring)?
 |
 |The whole of $'...' or just the (two) \u escapes inside $''?
 |I'd like to see $'' included, but if the only way to do that is to
 |omit the \u (both) escape sequences, I could live with that, particularly
 |as exactly how the shell should use unicode chars is still very much
 |uncertain (eg: if I want to write a case statement that would match
 |various currency symbols, just how do I encode that?  Does it depend
 |upon the user's current locale, if so, how do I write a portable
 |script (do I need to iconv constant strings?), and if not, how is the
 |user's input supposed to match, particularly if they're not using
 |a UTF-8 locale?).
 |
 |There's lots more work needed (initially by implementers, not here)

I will leave aside the \u stuff, which currently goes via iconv(3)
(and thus likely causes replacement to occur in case the locale
character set cannot handle a character), though not without
reiterating that the real future-proof approach would be to require
iconv(3) to handle Unicode grapheme boundaries, which in turn means
that multiple \u must be interpreted in sequence, because Unicode is
not about single codepoints but, at least potentially, about
graphemes aka real characters, which are formed of multiple adjacent
individual codepoints.

I am not standing in your way; i only want to note that quoted
ranges should extend to the maximum length possible in order to
allow all languages of the world to benefit from
internationalization efforts (sic).

What else.  Having \$ would be nice, i have it for the little MUA
i maintain.  If you just look at this simple shell snippet, and
i could have quoted other things, though admittedly

 chown '"${user}"':'"${group}"' '"${user}"' || exit 6
 echo 0 > '"${user}"'/"'"${datfile}"'"
 chmod 0600 '"${user}"'/"'"${datfile}"'"

could be quoted as unities, hmm.
Anyhow with $'' in its best epiphany, so to say, there would be
a single flow of progression, and so much nicer to the human eye

  -c $'
  ...
 chown \${user}:\${group} \${user} || exit 6
 echo 0 > \${user}/"\${datfile}"
 chmod 0600 \${user}/"\${datfile}"
  '

that i do not understand the reluctance of you all.
Ciao,

--steffen

Re: clarification needed: shell 'exec' + function (builtin, ???)

2020-12-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Jörg.

Joerg Schilling wrote in
 <2020124507.zccfs%sch...@schily.net>:
 |"Steffen Nurpmeso via austin-group-l at The Open Group"  wrote:
 |> Joerg Schilling wrote in
 |>  <20201210004945.i3n8e%sch...@schily.net>:
 |>|Steffen Nurpmeso  wrote:
 |>|> this is an iconv(3)-related error that was fixed in later version
 |>|> of the mailer you use.  The very error came up on the ML this
 ..
 |>|You are correct,
 |> 
 |> Yep -- unfortunately.
 |
 |I meanwhile discovered (from a hint in another mail in this thread that I cannot
 |answer) that the trigger may be an embedded nul character.
 |
 |The s-nail error message was:
 |
 | "Failed to prepare composed message"

So you could not fully read it .. and tried to respond to the
"half-seen" message nonetheless?
Regardless, it is fixed in the version that will become obsolete
tomorrow.

 |and the mail display (before I tried to answer) stopped before the line that
 |contained that nul character. Again saving the mail to a file and using iconv(1)
 |did result in "useful" converted output.
 |
 |There seem to be two things that need to be handled in a way that never causes
 |a mail (regardless of the content) to prevent reading or answering.
 |
 |- EILSEQ should not result in shortened mails or errors that abort
 | work completely for that mail.
 |
 |- Characters that cause EILSEQ should be transformed into "something"
 | in the output that at least is a hint for that problem.

Ah, it is not that alone; people are playing games and inject
invalid bytes into base64 streams and such.  We have handled these
cases well for a remarkably long time, when viewed in context.
It is a pity that the software is not yet at a point where we can
simply log such occurrences though, even though we can join
successive error messages in the style known from syslog.  (At the
beginning we did, but it could have caused hundreds of log
messages.)
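The two requirements quoted above (do not abort on EILSEQ, and leave a visible hint in the output) can be sketched in C with plain iconv(3). This assumes an ASCII-compatible target codeset, since '?' is written directly, and the function name is hypothetical. Skipping a single byte per error means that, with glibc, one unconvertible UTF-8 character may show up as several '?' characters; a real MUA would rather skip a whole input character.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes from codeset `from` to codeset `to`, writing at
 * most outsize-1 bytes plus a terminating NUL.  On EILSEQ (or EINVAL
 * for a truncated sequence) emit '?' and skip one input byte instead of
 * aborting.  Returns bytes written (without the NUL), or (size_t)-1 if
 * iconv_open() fails or the output buffer is too small. */
static size_t convert_lossy(const char *to, const char *from,
                            const char *in, size_t inlen,
                            char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in; /* iconv() takes char ** but does not write input */
    char *outp = out;
    size_t inl = inlen, outl;

    if (outsize == 0 || (cd = iconv_open(to, from)) == (iconv_t)-1)
        return (size_t)-1;
    outl = outsize - 1; /* reserve room for the NUL */
    while (inl > 0) {
        if (iconv(cd, &inp, &inl, &outp, &outl) != (size_t)-1)
            break; /* all input consumed */
        if ((errno == EILSEQ || errno == EINVAL) && outl > 0) {
            *outp++ = '?'; /* visible hint that something was lost */
            --outl;
            ++inp;         /* skip one offending input byte */
            --inl;
        } else {
            iconv_close(cd);
            return (size_t)-1;
        }
    }
    (void)iconv(cd, NULL, NULL, &outp, &outl); /* flush shift state */
    iconv_close(cd);
    *outp = '\0';
    return (size_t)(outp - out);
}
```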

 |I am not sure whether that helps, but I remember from my experiences from 
 |mkisofs that iconv() from glibc has a bug and ignores *outbytesleft. This
 |frequently results in reading or writing too much data.

In the software i maintain i found references to problems
regarding character sets and iconv(3).  I have seen _very_ strange
names like (this one is made up) western-subset-european-10646 or
something like that (that was on UnixWare).  How could anyone deal
with that portably?

Decades ago i was, hm, foolishly proud of my
way of generating character set names, because we normalized them
by splitting at boundaries which included numbers (so utf-8 and utf8
would end up as one entry "utf 8" to test).  One decade ago the
Python people did not like it (and also separate(d) with
hyphen-minus, not whitespace, which i found strange).

Today (what a ridiculous regress!) i have to write code like

 static char const * const names[] = {"csASCII", "cp367", "IBM367", "us",
   "ISO646-US", "ISO_646.irv:1991", "ANSI_X3.4-1986", "iso-ir-6",
   "ANSI_X3.4-1968", "ASCII", "US-ASCII"};

to find out whether anything is actually defined in US-ASCII,
because no official interfaces exist which give someone a hint.
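Wrapped in a function, the alias check could look like this minimal C sketch. The list is the one from the message; matching with strcasecmp() assumes the names are compared case-insensitively, as IANA charset names are, and implementation-specific spellings outside the list are not recognized. The function name is hypothetical.

```c
#include <stddef.h>
#include <strings.h>

/* Return 1 if name is one of the US-ASCII aliases listed above
 * (compared case-insensitively), 0 otherwise. */
static int is_us_ascii_name(const char *name)
{
    static char const * const names[] = {"csASCII", "cp367", "IBM367", "us",
        "ISO646-US", "ISO_646.irv:1991", "ANSI_X3.4-1986", "iso-ir-6",
        "ANSI_X3.4-1968", "ASCII", "US-ASCII"};

    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        if (strcasecmp(name, names[i]) == 0)
            return 1;
    return 0;
}
```

A typical caller would pass nl_langinfo(CODESET) after setlocale().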

Ciao,

--steffen

Re: iconv() EILSEQ (was: clarification needed: shell 'exec' + function)

2020-12-11 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Geoff Clare wrote in
 <20201211100245.GA1627@localhost>:
 |Steffen Nurpmeso wrote, on 10 Dec 2020:
 |>
 |> While talking about iconv, i got closed glibc bug[1] as "resolved
 |> invalid", but wouldn't you all agree that in the following
 |> 
 |>   #include <errno.h>
 |>   #include <iconv.h>
 |>   #include <stdio.h>
 |>   #include <string.h>
 |>   int main(void){
 |>  char inb[16], oub[16], *inbp, *oubp;
 |>  iconv_t id;
 |>  size_t inl, oul;
 |> 
 |>  memcpy(inbp = inb, "a\303\244c", sizeof("a\303\244c"));
 |>  inl = sizeof("a\303\244c") -1;
 |>  oul = sizeof oub;
 |>  oubp = oub;
 |> 
 |>  if((id = iconv_open("ascii", "utf8")) == (iconv_t)-1)
 |>return 1;
 |>  fprintf(stderr, "Converting %lu <%s>\n",(unsigned long)inl, inbp);
 |>  if(iconv(id, &inbp, &inl, &oubp, &oul) == (size_t)-1){
 |> fprintf(stderr, "Fail <%s>\n", strerror(errno));
 |> return 2;
 |>}  
 |>  fprintf(stderr, "GOT <%s>\n", oub);
 |>  iconv_close(id);
 |>  return 0;
 |>}
 |> 
 |> you should get replacement characters out of the box?
 |
 |That depends entirely on how the implementation defines the codeset
 |it calls "ascii" (if it has one at all).
 |
 |If the "ascii" codeset is defined as having 0-127 as the only valid
 |characters, the standard requires that iconv() fails with EILSEQ.
 |If it is defined as having 0-255 as valid characters, the standard
 |requires that all of the input characters are converted.

I think you are mistaken here, Geoff: EILSEQ is only defined for
the source/input character set.  I will open an issue on Monday
after i have released this MUA; maybe it would be better to say on
p. 1123, lines 38001 ff.

 If a sequence of input bytes does not form a valid character in
 the specified [input] codeset

where [input] would be new.
I am firmly convinced that the above should be covered by

  38014 If iconv( ) encounters a character in the input buffer that is valid, but for which an identical
  38015 character does not exist in the target codeset, iconv( ) shall perform an implementation-defined
  38016 conversion on this character.

I recall issues revolving around iconv flying by; i will have a look
on Monday, maybe .. i can do something.  I personally find it
disturbing that the above behaviour has been in the world for at
least two and a half years without any audible user echoes.

 |If you change your program to use the codeset name returned by

Oh!  That was a reproducer only, Geoff!

 |nl_langinfo(CODESET) instead of hard-coded "ascii" then you would
 |have a stronger case that iconv() should not give an EILSEQ error,
 |since the standard requires the POSIX locale to have 256 valid
 |single-byte characters.

You have read the bugreport and refer to the setlocale(3) call
that i did not use despite having been pointed to it.
Your remark made me remember and reread issue 663.
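Following Geoff's suggestion, a converter to the locale's codeset could be opened like this (a minimal C sketch; the function name is hypothetical). setlocale() has to run first, otherwise nl_langinfo(CODESET) describes the POSIX locale; whether a given codeset name is accepted by iconv_open() remains implementation-specific.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

/* Open a UTF-8 -> current-locale converter using the codeset name the
 * implementation itself reports, instead of a hard-coded "ascii".
 * Returns (iconv_t)-1 on failure, like iconv_open(). */
static iconv_t open_utf8_to_locale(void)
{
    /* If the environment names an unknown locale this fails and the
     * program simply stays in the POSIX locale. */
    (void)setlocale(LC_ALL, "");
    return iconv_open(nl_langinfo(CODESET), "UTF-8");
}
```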

A nice weekend i wish from Germany,

--steffen

Re: clarification needed: shell 'exec' + function (builtin, ???)

2020-12-10 Thread Steffen Nurpmeso via austin-group-l at The Open Group

[I bring back austin-group-l, ok?]

Thorsten Glaser wrote in
 :
 |Steffen Nurpmeso dixit:
 |
 |>  #include <errno.h>
 |>  #include <iconv.h>
 |>  #include <stdio.h>
 |>  #include <string.h>
 |>  int main(void){
 |> char inb[16], oub[16], *inbp, *oubp;
 |> iconv_t id;
 |> size_t inl, oul;
 |>
 |> memcpy(inbp = inb, "a\303\244c", sizeof("a\303\244c"));
 |> inl = sizeof("a\303\244c") -1;
 |
 |Not -1 otherwise oub will not be NUL-terminated and end with junk:
 |
 |$ ./a.out
 |Converting 4 
 |GOT 

Sure thing.  Just like below.  Normally stack pages are cow forked
from zero if i understand that right.  But maybe i do not.

 |Without the trailing NUL, stateful conversion may also be
 |incomplete…
 |
 |> oul = sizeof oub;
 |> oubp = oub;
 |>
 |> if((id = iconv_open("ascii", "utf8")) == (iconv_t)-1)
 |>   return 1;
 |
 |Throws 1 because you need "utf-8", but with it, see above.

Well names and iconv are a thing.  Especially regarding Unicode
(and nl_langinfo(CODESET), if i remember UnixWare right).

 |> fprintf(stderr, "Converting %lu <%s>\n",(unsigned long)inl, inbp);
 |> if(iconv(id, &inbp, &inl, &oubp, &oul) == (size_t)-1){
 |>fprintf(stderr, "Fail <%s>\n", strerror(errno));
 |>return 2;
 |>}
 |> fprintf(stderr, "GOT <%s>\n", oub);
 |> iconv_close(id);
 |> return 0;
 |>}
 |>
 |>you should get replacement characters out of the box?
 |
 |Citrus iconv agrees. Its manpage says:
 |
 | If the string pointed to by *src contains a character which is valid
 | under the source codeset but can not be converted to the destination
 | codeset, the character is replaced by an "invalid character" which
 | depends on the destination codeset, e.g., '?', and the conversion is
 | continued. iconv() returns the number of such "invalid conversions".

That was my thinking.
Thanks for confirming this.

 --End of 

--steffen

Re: mail encoding not-fun (was Re: clarification needed: shell 'exec' + function (builtin, ???))

2020-12-10 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Thorsten Glaser via austin-group-l at The Open Group wrote in
 :
 |Steffen Nurpmeso via austin-group-l at The Open Group dixit:
 |
 |>|This is because m4.opengroup.org runs qmail, the arsehole under the MTAs,
 |>|which auto-converted the mail from quoted-printable to 8bit, sending it
 |>|as 8bit even to MTAs that don't offer 8BITMIME (I configured my sendmail
 |>|not to do that as well, so I got the same truncated mail back :( other
 |>|than qmail, exim is known to break the MIME and SMTP standards like that).
 |>
 |>Naaah, not true Thorsten.  At least this time.
 |
 |This one *is* correct, as I got the broken message back as well.
 |It contains an embedded NUL.
 |
 |But apparently, this was not the cause of Jörg’s problem ☻

Evil, you.  Hey, for about 1.5 years i have been on IRC, for the
first time ever, and in the channel of this Linux distribution
(which released 3.6 just two days ago) there is someone active
from Düsseldorf, and i now sometimes listen to "La Düsseldorf" by
La Düsseldorf.  (Of course Neu! and Lilo Engel are less populist.)
Lots of longing for the 70s here.  Oh well.

All he would need to do is upgrade to a newer version; i think i
innocently prodded him about that twice.

 |>Related to my MUA.
 |[…]
 |>I have been able to save the mail as file and to run iconv(1) on the content.
 |
 |oic
 |
 |>Maybe a problem is that the first missing line is a line with a character that
 |>is not part of ISO-8859-1
 |
 |Yes, of course, I have been writing in UTF-8 for a while.

Not so here: even though i have ~/.kent.xmodmaprc keycode
adjustments for some German and French quotation marks, i find it
hard to go "random Unicode".  I usually use altgr/g-a in vim and
then look in UnicodeData.txt to find the codepoint ;)

 |[00:02]  gecko: do you use emacs?
 |[00:03]  nope  [00:03]  just a normal mac

Graphical selection may be a winner here, indeed.

 |[00:04]  argh   [00:04]  no, the editor
 | -- Vutral and gecko2 in #deutsch (NB: Editor? Operating system.)
 --End of 

--steffen

Re: clarification needed: shell 'exec' + function (builtin, ???)

2020-12-10 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Jörg, all,

Joerg Schilling wrote in
 <20201210004945.i3n8e%sch...@schily.net>:
 |Steffen Nurpmeso  wrote:
 |> this is an iconv(3)-related error that was fixed in later version
 |> of the mailer you use.  The very error came up on the ML this
 |> year[1], basically you use LATIN1 on your box, as could be
 |> expected, but Thorsten is known to be a Unicode character
 |> "junkie", so to say.
 |
 |You are correct,

Yep -- unfortunately.

 |I have been able to save the mail as file and to run iconv(1) on the content.

Yes, for a while we did not restart the conversion after an ILSEQ
error; had your prompt included "set prompt='\${^ERRNAME}'", for
example, you would have seen that an error happened.
But of course, since we are tolerant of weird base64, we should be
tolerant of weird iconv, too, so i "restored the original
behaviour", so to say.
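Restarting after ILSEQ can be sketched as a loop that writes one replacement character, skips the offending input and calls iconv() again. This is an illustration only, not the mailer's actual code: `convert_tolerant` is a hypothetical name, '?' is an assumed replacement character, and skipping exactly one byte is the simplest possible policy. With byte-wise skipping, a multibyte character that the target cannot represent may come out as more than one '?'.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical tolerant converter: on EILSEQ, write '?' and skip one input
 * byte, then restart.  Returns the number of bytes written to OUT, or
 * (size_t)-1 on any other error (E2BIG, EINVAL for a truncated sequence). */
static size_t
convert_tolerant(iconv_t cd, char *in, size_t inlen,
                 char *out, size_t outsize, size_t *nreplaced)
{
    char *inp = in, *outp = out;
    size_t inleft = inlen, outleft = outsize;

    *nreplaced = 0;
    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;                      /* all remaining input converted */
        if (errno != EILSEQ || outleft == 0)
            return (size_t)-1;          /* not restartable, or no room for '?' */
        *outp++ = '?';                  /* one replacement per skipped byte */
        --outleft;
        ++inp;                          /* skip the offending byte */
        --inleft;
        ++*nreplaced;
    }
    /* Flush any shift state (a no-op for stateless encodings). */
    if (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1)
        return (size_t)-1;
    return outsize - outleft;
}
```

The caller can then decide whether *nreplaced > 0 deserves a diagnostic, rather than silently presenting a partial message.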

That reminds me of the iconv weirdness regarding replacement
characters, which makes testing really hard.  Wasn't there an
issue about that going on?  Being able to specify the replacement
character explicitly, and whether it stands for an entire
character or for each byte of an invalid sequence, would be a
great improvement.
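For the per-character variant, a restart loop could skip a whole UTF-8 sequence instead of a single byte, so that one replacement stands for one character. A sketch of such a length helper, assuming UTF-8 input; `utf8_skip_len` is a hypothetical name, and the check is deliberately loose: it looks only at the lead byte and the continuation bytes (10xxxxxx) that follow, and does not reject overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes to skip so that one replacement
 * covers one (possibly malformed) UTF-8 character rather than one byte.
 * P points at the offending byte, LEFT is the number of bytes available. */
static size_t
utf8_skip_len(const unsigned char *p, size_t left)
{
    size_t want, i;

    if (left == 0)
        return 0;
    if (p[0] >= 0xC2 && p[0] <= 0xDF)
        want = 2;
    else if (p[0] >= 0xE0 && p[0] <= 0xEF)
        want = 3;
    else if (p[0] >= 0xF0 && p[0] <= 0xF4)
        want = 4;
    else
        return 1;       /* ASCII, stray continuation byte, or invalid lead */
    /* Stop early at the first byte that is not a continuation byte. */
    for (i = 1; i < want && i < left; ++i)
        if ((p[i] & 0xC0) != 0x80)
            break;
    return i;
}
```

In a loop like the byte-wise one, `inp += n; inleft -= n;` with `n = utf8_skip_len((unsigned char *)inp, inleft)` would replace the single-byte skip.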

While talking about iconv: my glibc bug[1] was closed as "resolved
invalid", but wouldn't you all agree that with the following

  #include <errno.h>
  #include <iconv.h>
  #include <stdio.h>
  #include <string.h>

  int main(void){
     char inb[16], oub[16], *inbp, *oubp;
     iconv_t id;
     size_t inl, oul;

     /* "a", U+00E4 (a with diaeresis) as UTF-8, "c" */
     memcpy(inbp = inb, "a\303\244c", sizeof("a\303\244c"));
     inl = sizeof("a\303\244c") -1;
     oul = sizeof oub - 1; /* keep room for the terminating NUL */
     oubp = oub;

     if((id = iconv_open("ascii", "utf8")) == (iconv_t)-1)
       return 1;
     fprintf(stderr, "Converting %lu <%s>\n",(unsigned long)inl, inbp);
     if(iconv(id, &inbp, &inl, &oubp, &oul) == (size_t)-1){
        fprintf(stderr, "Fail <%s>\n", strerror(errno));
        return 2;
     }
     *oubp = '\0'; /* iconv(3) does not NUL-terminate its output */
     fprintf(stderr, "GOT <%s>\n", oub);
     iconv_close(id);
     return 0;
  }

you should get replacement characters out of the box?
Back then I reported

   $ /tmp/zt
   Converting 4 
   Fail 

  whereas musl gives

   $ ./zt
   Converting 4 
   GOT 

and i still think musl is totally right (also in giving only one
replacement character).

  [1] https://sourceware.org/bugzilla/show_bug.cgi?id=22908

 |Maybe a problem is that the first missing line is a line with a character \
 |that
 |is not part of ISO-8859-1

Yes, transliteration should perhaps be possible.
On the other hand, if i change the above to

   if((id = iconv_open("ascii//TRANSLIT", "utf8")) == (iconv_t)-1)

i get

  Converting 4 
  GOT 

and with

   if((id = iconv_open("ascii//TRANSLIT", "utf8")) == (iconv_t)-1)

we are back at the error.
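Because not every iconv implementation accepts the "//TRANSLIT" suffix (glibc and GNU libiconv do; others may fail in iconv_open()), portable code might request it and fall back to the plain target name. A sketch under that assumption; `open_translit` is a hypothetical helper, and the 64-byte name buffer is an arbitrary limit.

```c
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: try "TOCODE//TRANSLIT" first and fall back to plain
 * TOCODE if iconv_open(3) rejects it.  *TRANSLIT tells which one was used. */
static iconv_t
open_translit(const char *tocode, const char *fromcode, int *translit)
{
    char name[64];
    iconv_t cd = (iconv_t)-1;
    int n;

    *translit = 0;
    n = snprintf(name, sizeof name, "%s//TRANSLIT", tocode);
    if (n >= 0 && (size_t)n < sizeof name) {
        cd = iconv_open(name, fromcode);
        if (cd != (iconv_t)-1)
            *translit = 1;
    }
    if (cd == (iconv_t)-1)
        cd = iconv_open(tocode, fromcode);  /* no transliteration available */
    return cd;
}
```

Either way the caller still has to handle EILSEQ, since transliteration (where supported) may not cover every character.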

Ciao,

--steffen

Re: clarification needed: shell 'exec' + function (builtin, ???)

2020-12-09 Thread Steffen Nurpmeso via austin-group-l at The Open Group

austin-group-l@opengroup.org wrote in
 :
 |Joerg Schilling via austin-group-l at The Open Group dixit:
 |
 |>here is where the original mail ended for me. Interesting that you did get
 |
 |This is because m4.opengroup.org runs qmail, the arsehole under the MTAs,
 |which auto-converted the mail from quoted-printable to 8bit, sending it
 |as 8bit even to MTAs that don't offer 8BITMIME (I configured my sendmail
 |not to do that as well, so I got the same truncated mail back :( other
 |than qmail, exim is known to break the MIME and SMTP standards like that).

Naaah, not true, Thorsten.  At least not this time:
it is related to my MUA.

--steffen

Re: clarification needed: shell 'exec' + function (builtin, ???)

2020-12-09 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Hello Jörg,

Joerg Schilling wrote in
 <2020120933.yyo5w%sch...@schily.net>:
 |"shwaresyst via austin-group-l at The Open Group"  wrote:
 ...
 |> Hi *,
 |
 |Hi,
 |
 |here is where the original mail ended for me. Interesting that you did get
 |more content. Is there any idea, why I received only the first line from the
 |original mail?

this is an iconv(3)-related error that was fixed in a later version
of the mailer you use.  The very error came up on the ML this
year[1]: basically, you use LATIN1 on your box, as could be
expected, but Thorsten is known to be a Unicode character
"junkie", so to say.

commit a9ec20d6
Author:     Steffen Nurpmeso 
AuthorDate: 2020-04-23 17:29:57 +0200

Fix: "revert" [ab0cd3b8] from 2017-10-20.. (Claus Assmann)..

(FIX iconv for main body part (since EVER!) (Doug McIlroy,
Random832)..) was nonsense insofar as we now generated ILSEQ
errors, but without giving any error message, so that users
(without nice *prompt*) normally had no indication of what
happened, but looked at a partial message.  This is unacceptable,
so instead simply use replacement until we possibly have a better
way out at some later time.

  This changeset is in v14.9.19.

  [1] https://www.mail-archive.com/s-mailx@lists.sdaoden.eu/msg01013.html
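Whether a UTF-8 text can be shown on a LATIN1 box without loss can be checked without iconv at all: every code point up to U+00FF is encoded either as a single ASCII byte or as a two-byte sequence whose lead byte is 0xC2 or 0xC3. A sketch assuming NUL-terminated UTF-8 input; `utf8_fits_latin1` is a hypothetical name, and it treats malformed UTF-8 as not representable.

```c
/* Hypothetical check: nonzero if the NUL-terminated UTF-8 string S uses
 * only code points U+0000..U+00FF, i.e. is representable in ISO-8859-1. */
static int
utf8_fits_latin1(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p != '\0') {
        if (*p < 0x80)
            ++p;                                /* ASCII */
        else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80)
            p += 2;                             /* U+0080..U+00FF */
        else
            return 0;   /* beyond U+00FF, or malformed (incl. truncated) */
    }
    return 1;
}
```

A line containing, say, U+2019 (a typographic apostrophe, bytes E2 80 99) fails this check, which is the kind of line that a LATIN1 setup cannot display as-is.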

P.S.: i will release v14.9.20 before Christian Christmas, and that
should just work neatly on SchilliX, too, including a readily
prepared catman manual (to be downloaded as an extra).

--steffen

Re: [1003.1(2004)/Issue 6 0001419]: Missing newline / indentation in .SCCS_GET default rule definition

2020-11-07 Thread Steffen Nurpmeso via austin-group-l at The Open Group

Austin Group Bug Tracker wrote in
 <8d658048f7dbed7f9cd0b38c954c2...@austingroupbugs.net>:
 |The following issue has been SUBMITTED. 
 |== 
 |https://austingroupbugs.net/view.php?id=1419 
 ...
 |Apologies for not including the page and line number: I'm not sure where to
 |get a copy of the standard with this information included.

Interesting!  I convert the PDF with pdftotext from "poppler", which
i install only for that purpose.  (But it is not that large.)

--steffen
