Re: ps and AIX field descriptors

2023-02-22 Thread David Wright
On Tue 21 Feb 2023 at 16:06:48 (+0100), Andreas Leha wrote:
> David Wright  writes:
> > On Mon 20 Feb 2023 at 10:39:21 (+0100), Andreas Leha wrote:
> >> Greg Wooledge  writes:
> >> > On Sun, Feb 19, 2023 at 12:04:22PM -0600, David Wright wrote:
> >> >> But even that's not enough
> >> >> because the field width is somewhat variable: try   ps -eo '%c  |  %z  
> >> >> |  %a'
> >> >> (We can still use | to make the problem somewhat more obvious.)
> >> >
> >> > Oh wow.  Yeah, OK, that's not really solvable.
> >> >
> >> > For those who don't want to try to reverse engineer David's conclusion,
> >> > or who don't just happen to stumble upon it with their current process
> >> > list, here's what I'm seeing:
> >> >
> >> > COMMAND  | VSZ  |  COMMAND
> >> > systemd  |  164140  |  /sbin/init
> >> > kthreadd |   0  |  [kthreadd]
> >> > rcu_gp   |   0  |  [rcu_gp]
> >> > rcu_par_gp   |   0  |  [rcu_par_gp]
> >> > [...]
> >> > steamwebhelper   |  4631064  |  
> >> > /home/greg/.steam/debian-installation/[...]
> >> > [...]
> >> > chrome_crashpad  |  33567792  |  
> >> > /opt/google/chrome/chrome_crashpad_handler[...]
> >> > [...]
> >> > kworker/3:0-eve  |   0  |  [kworker/3:0-events]
> >> >
> >> > ps appears to guess an initial maximum width for the VSZ field, but
> >> > when a value comes along that exceeds the guessed maximum, it simply
> >> > shoves the field barrier over.  It doesn't even become the new maximum,
> >> > with all of the fields aligning after that.  It's just a one-time shove,
> >> > breaking the current line only.
> >> >
> >> > Therefore, parsing the header line cannot give us enough information to
> >> > insert field separators correctly in body lines after the fact.
> >> 
> >> 
> >> Dear all,
> >> 
> >> Thanks for chiming in.  The example was indeed simplified and I am using
> >> %a which can contain internal whitespace.
> >> 
> >> This is the command I was using previously:
> >> 
> >>   ps -eo '%p|%c|%C' -o "%mem" -o '|%a' --sort=-%cpu

 

> >> I now replaced it with
> >> 
> >>   ps -eo '%p %c %C' -o "%mem" -o ' %a' --sort=-%cpu  | sed -E 's/([0-9]+) 
> >> (.+) ([0-9]+.?[0-9]?) ([0-9]+.?[0-9]?) (.+)/\1|\2|\3|\4|\5/'
> >>  
> >> This works, but is of course cumbersome to maintain.
> >> 
> >> Again, thanks for all the comments!
> >
> > I think there are a few too many assumptions in there;
> > in particular, numbers in %a will match patterns designed
> > to match cpu and mem, because you can't prevent sed from
> > being greedy (except with the [^ … … ]+ construction, to
> > restrict what it matches).
> >
> > This version makes a few assumptions as well:
> > . that the new format matches the old one (mine) if the
> >   delimiters given are a single space (like '%p %c %C'),
> >   or stripped (like "%mem" and '%a', but not ' %a').
> > . the short command is always 15 chars wide even if all
> >   the commands in the table are shorter, eg with ps -o.
> > . I don't have any of those new-fangled extra-long PIDs
> >   yet today.
> >
> > It might well break if a CPU or MEM is running at 100%.
> > That's not easily tested here.
> >
> > I've reordered the columns on the first pass, so that the
> > numeric ones (with their limited character set) come first,
> > which means I can use an auxiliary character for
> > correcting the spacing. (The spaces between the columns get
> > comingled with the leading spaces of numbers.) The second
> > pass sorts that out and processes the heading.
> >
> > $ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu | sed -E 's/( *[0-9]+) 
> > (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1~\3~\2\4/;' | sed -E 's/([^~]+)~ 
> > ([^~]+)~(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) 
> > /\1|\2|/;s/%MEM COMMAND/%MEM|COMMAND/;' | less
> > $ 
> >
> > This is the same, except I deliberately chose _ for the auxiliary
> > character, knowing that short commands are stuffed with underscores:
> >
> > $ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu | sed -E 's/( *[0-9]+) 
> > (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1_\3_\2\4/;' | sed -E 's/([^_]+)_ 
> > ([^_]+)_(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) 
> > /\1|\2|/;s/%MEM COMMAND/%MEM|COMMAND/;' | less
> > $ 
> >
> > Examples:
> >
> > PID|COMMAND|%CPU %MEM|COMMAND
> >9798|firefox-esr| 2.5  5.8|firefox-esr
> >   16143|Isolated Web Co| 1.8  2.2|/usr/lib/firefox-esr/firefox-esr 
> > -contentproc -childID 11 -isForBrowser -prefsLen 47676 -prefMapSize 232307 
> > -jsInitLen 277276 -parentBuildID 20230214011352 -appDir 
> > /usr/lib/firefox-esr/browser 9798 true tab
> >1242|Xorg   | 1.0  1.4|/usr/lib/xorg/Xorg -nolisten tcp :0 vt1 
> > -keeptty -auth /tmp/serverauth.FxvBp8B7Qn
> > [ … ]
> >   8|mm_percpu_wq   | 0.0  0.0|[mm_percpu_wq]
> >   9|rcu_tasks_rude_| 0.0  0.0|[rcu_tasks_rude_]
> >  10|rcu_tasks_trace| 0.0  0.0|[rcu_tasks_trace]
> >
> > An incestuous one, with -o rather -eo:
> >
> > PID|COMMAND|%CPU 

Re: ps and AIX field descriptors

2023-02-21 Thread Andreas Leha
David Wright  writes:

> On Mon 20 Feb 2023 at 10:39:21 (+0100), Andreas Leha wrote:
>> Greg Wooledge  writes:
>> > On Sun, Feb 19, 2023 at 12:04:22PM -0600, David Wright wrote:
>> >> But even that's not enough
>> >> because the field width is somewhat variable: try   ps -eo '%c  |  %z  |  
>> >> %a'
>> >> (We can still use | to make the problem somewhat more obvious.)
>> >
>> > Oh wow.  Yeah, OK, that's not really solvable.
>> >
>> > For those who don't want to try to reverse engineer David's conclusion,
>> > or who don't just happen to stumble upon it with their current process
>> > list, here's what I'm seeing:
>> >
>> > COMMAND  | VSZ  |  COMMAND
>> > systemd  |  164140  |  /sbin/init
>> > kthreadd |   0  |  [kthreadd]
>> > rcu_gp   |   0  |  [rcu_gp]
>> > rcu_par_gp   |   0  |  [rcu_par_gp]
>> > [...]
>> > steamwebhelper   |  4631064  |  /home/greg/.steam/debian-installation/[...]
>> > [...]
>> > chrome_crashpad  |  33567792  |  
>> > /opt/google/chrome/chrome_crashpad_handler[...]
>> > [...]
>> > kworker/3:0-eve  |   0  |  [kworker/3:0-events]
>> >
>> > ps appears to guess an initial maximum width for the VSZ field, but
>> > when a value comes along that exceeds the guessed maximum, it simply
>> > shoves the field barrier over.  It doesn't even become the new maximum,
>> > with all of the fields aligning after that.  It's just a one-time shove,
>> > breaking the current line only.
>> >
>> > Therefore, parsing the header line cannot give us enough information to
>> > insert field separators correctly in body lines after the fact.
>> 
>> 
>> Dear all,
>> 
>> Thanks for chiming in.  The example was indeed simplified and I am using
>> %a which can contain internal whitespace.
>> 
>> This is the command I was using previously:
>> 
>>   ps -eo '%p|%c|%C' -o "%mem" -o '|%a' --sort=-%cpu
>> 
>> I now replaced it with
>> 
>>   ps -eo '%p %c %C' -o "%mem" -o ' %a' --sort=-%cpu  | sed -E 's/([0-9]+) 
>> (.+) ([0-9]+.?[0-9]?) ([0-9]+.?[0-9]?) (.+)/\1|\2|\3|\4|\5/'
>>  
>> This works, but is of course cumbersome to maintain.
>> 
>> Again, thanks for all the comments!
>
> I think there are a few too many assumptions in there;
> in particular, numbers in %a will match patterns designed
> to match cpu and mem, because you can't prevent sed from
> being greedy (except with the [^ … … ]+ construction, to
> restrict what it matches).
>
> This version makes a few assumptions as well:
> . that the new format matches the old one (mine) if the
>   delimiters given are a single space (like '%p %c %C'),
>   or stripped (like "%mem" and '%a', but not ' %a').
> . the short command is always 15 chars wide even if all
>   the commands in the table are shorter, eg with ps -o.
> . I don't have any of those new-fangled extra-long PIDs
>   yet today.
>
> It might well break if a CPU or MEM is running at 100%.
> That's not easily tested here.
>
> I've reordered the columns on the first pass, so that the
> numeric ones (with their limited character set) come first,
> which means I can use an auxiliary character for
> correcting the spacing. (The spaces between the columns get
> comingled with the leading spaces of numbers.) The second
> pass sorts that out and processes the heading.
>
> $ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu | sed -E 's/( *[0-9]+) 
> (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1~\3~\2\4/;' | sed -E 's/([^~]+)~ 
> ([^~]+)~(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) /\1|\2|/;s/%MEM 
> COMMAND/%MEM|COMMAND/;' | less
> $ 
>
> This is the same, except I deliberately chose _ for the auxiliary
> character, knowing that short commands are stuffed with underscores:
>
> $ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu | sed -E 's/( *[0-9]+) 
> (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1_\3_\2\4/;' | sed -E 's/([^_]+)_ 
> ([^_]+)_(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) /\1|\2|/;s/%MEM 
> COMMAND/%MEM|COMMAND/;' | less
> $ 
>
> Examples:
>
> PID|COMMAND|%CPU %MEM|COMMAND
>9798|firefox-esr| 2.5  5.8|firefox-esr
>   16143|Isolated Web Co| 1.8  2.2|/usr/lib/firefox-esr/firefox-esr 
> -contentproc -childID 11 -isForBrowser -prefsLen 47676 -prefMapSize 232307 
> -jsInitLen 277276 -parentBuildID 20230214011352 -appDir 
> /usr/lib/firefox-esr/browser 9798 true tab
>1242|Xorg   | 1.0  1.4|/usr/lib/xorg/Xorg -nolisten tcp :0 vt1 
> -keeptty -auth /tmp/serverauth.FxvBp8B7Qn
> [ … ]
>   8|mm_percpu_wq   | 0.0  0.0|[mm_percpu_wq]
>   9|rcu_tasks_rude_| 0.0  0.0|[rcu_tasks_rude_]
>  10|rcu_tasks_trace| 0.0  0.0|[rcu_tasks_trace]
>
> An incestuous one, with -o rather -eo:
>
> PID|COMMAND|%CPU %MEM|COMMAND
>1694|bash   | 0.0  0.1|bash
>   23486|ps | 0.0  0.0|ps -o %p %c %C -o %mem -o %a --sort=-%cpu
>   23487|sed| 0.0  0.0|sed -E s/( *[0-9]+) (.{15})( +[0-9.]+ 
> +[0-9.]+) (.*$)/\1~\3~\2\4/;
>   23488|sed| 0.0  0.0|sed -E s/([^~]+)~ 
> 

Re: ps and AIX field descriptors

2023-02-20 Thread David Wright
On Mon 20 Feb 2023 at 10:39:21 (+0100), Andreas Leha wrote:
> Greg Wooledge  writes:
> > On Sun, Feb 19, 2023 at 12:04:22PM -0600, David Wright wrote:
> >> But even that's not enough
> >> because the field width is somewhat variable: try   ps -eo '%c  |  %z  |  
> >> %a'
> >> (We can still use | to make the problem somewhat more obvious.)
> >
> > Oh wow.  Yeah, OK, that's not really solvable.
> >
> > For those who don't want to try to reverse engineer David's conclusion,
> > or who don't just happen to stumble upon it with their current process
> > list, here's what I'm seeing:
> >
> > COMMAND  | VSZ  |  COMMAND
> > systemd  |  164140  |  /sbin/init
> > kthreadd |   0  |  [kthreadd]
> > rcu_gp   |   0  |  [rcu_gp]
> > rcu_par_gp   |   0  |  [rcu_par_gp]
> > [...]
> > steamwebhelper   |  4631064  |  /home/greg/.steam/debian-installation/[...]
> > [...]
> > chrome_crashpad  |  33567792  |  
> > /opt/google/chrome/chrome_crashpad_handler[...]
> > [...]
> > kworker/3:0-eve  |   0  |  [kworker/3:0-events]
> >
> > ps appears to guess an initial maximum width for the VSZ field, but
> > when a value comes along that exceeds the guessed maximum, it simply
> > shoves the field barrier over.  It doesn't even become the new maximum,
> > with all of the fields aligning after that.  It's just a one-time shove,
> > breaking the current line only.
> >
> > Therefore, parsing the header line cannot give us enough information to
> > insert field separators correctly in body lines after the fact.
> 
> 
> Dear all,
> 
> Thanks for chiming in.  The example was indeed simplified and I am using
> %a which can contain internal whitespace.
> 
> This is the command I was using previously:
> 
>   ps -eo '%p|%c|%C' -o "%mem" -o '|%a' --sort=-%cpu
> 
> I now replaced it with
> 
>   ps -eo '%p %c %C' -o "%mem" -o ' %a' --sort=-%cpu  | sed -E 's/([0-9]+) 
> (.+) ([0-9]+.?[0-9]?) ([0-9]+.?[0-9]?) (.+)/\1|\2|\3|\4|\5/'
>  
> This works, but is of course cumbersome to maintain.
> 
> Again, thanks for all the comments!

I think there are a few too many assumptions in there;
in particular, numbers in %a will match patterns designed
to match cpu and mem, because you can't prevent sed from
being greedy (except with the [^ … … ]+ construction, to
restrict what it matches).
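
To make the greediness concrete, here is a small sketch that feeds one invented line (not real ps output) through the sed expression quoted above. The wrapper name andreas_split and the sample line are made up for illustration only.

```sh
#!/bin/sh
# andreas_split is a hypothetical wrapper around the sed expression quoted above.
andreas_split() {
    sed -E 's/([0-9]+) (.+) ([0-9]+.?[0-9]?) ([0-9]+.?[0-9]?) (.+)/\1|\2|\3|\4|\5/'
}

# Invented line: PID, short command, %CPU, %MEM, then args that contain numbers.
printf '%s\n' '1 cmd 1.0 2.0 python 3 4 x' | andreas_split
```

With GNU sed this should print 1|cmd 1.0 2.0 python|3|4|x, that is, the greedy (.+) swallows the real %CPU and %MEM values and the numbers inside the args are taken for them instead.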

This version makes a few assumptions as well:
. that the new format matches the old one (mine) if the
  delimiters given are a single space (like '%p %c %C'),
  or stripped (like "%mem" and '%a', but not ' %a').
. the short command is always 15 chars wide even if all
  the commands in the table are shorter, eg with ps -o.
. I don't have any of those new-fangled extra-long PIDs
  yet today.

It might well break if a CPU or MEM is running at 100%.
That's not easily tested here.

I've reordered the columns on the first pass, so that the
numeric ones (with their limited character set) come first,
which means I can use an auxiliary character for
correcting the spacing. (The spaces between the columns get
commingled with the leading spaces of numbers.) The second
pass sorts that out and processes the heading.

$ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu |
    sed -E 's/( *[0-9]+) (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1~\3~\2\4/;' |
    sed -E 's/([^~]+)~ ([^~]+)~(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) /\1|\2|/;s/%MEM COMMAND/%MEM|COMMAND/;' |
    less
$ 
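
For anyone who wants to see what the two passes do without depending on a live process list, here is a minimal sketch that runs the same two substitutions on two canned rows. The helper names (sample_rows, pass1, pass2) and the column widths used to build the rows (7, 15, 4, 4) are assumptions for the sketch, not something ps guarantees.

```sh
#!/bin/sh
# Stand-in for ps output: PID (7 wide), short command (15 wide),
# %CPU and %MEM (4 wide each), then the full command line.
# These widths are assumptions for the sketch.
sample_rows() {
    printf '%7s %-15s %4s %4s %s\n' \
        9798 firefox-esr 2.5 5.8 firefox-esr \
        1242 Xorg 1.0 1.4 '/usr/lib/xorg/Xorg -nolisten tcp :0 vt1'
}

# First pass: move the numeric columns in front of the 15-character
# command column and mark the boundaries with "~".
pass1() {
    sed -E 's/( *[0-9]+) (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1~\3~\2\4/'
}

# Second pass: restore the column order and turn the markers into "|".
pass2() {
    sed -E 's/([^~]+)~ ([^~]+)~(.{15})(.*)/\1|\3|\2|\4/'
}

sample_rows | pass1          # intermediate form with "~" markers
sample_rows | pass1 | pass2  # final "|"-separated form
```

The header substitutions are left out here because the canned rows have no header line.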

This is the same, except I deliberately chose _ for the auxiliary
character, knowing that short commands are stuffed with underscores:

$ ps -eo '%p %c %C' -o "%mem" -o '%a' --sort=-%cpu |
    sed -E 's/( *[0-9]+) (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1_\3_\2\4/;' |
    sed -E 's/([^_]+)_ ([^_]+)_(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) /\1|\2|/;s/%MEM COMMAND/%MEM|COMMAND/;' |
    less
$ 

Examples:

PID|COMMAND|%CPU %MEM|COMMAND
   9798|firefox-esr| 2.5  5.8|firefox-esr
  16143|Isolated Web Co| 1.8  2.2|/usr/lib/firefox-esr/firefox-esr -contentproc -childID 11 -isForBrowser -prefsLen 47676 -prefMapSize 232307 -jsInitLen 277276 -parentBuildID 20230214011352 -appDir /usr/lib/firefox-esr/browser 9798 true tab
   1242|Xorg   | 1.0  1.4|/usr/lib/xorg/Xorg -nolisten tcp :0 vt1 -keeptty -auth /tmp/serverauth.FxvBp8B7Qn
[ … ]
  8|mm_percpu_wq   | 0.0  0.0|[mm_percpu_wq]
  9|rcu_tasks_rude_| 0.0  0.0|[rcu_tasks_rude_]
 10|rcu_tasks_trace| 0.0  0.0|[rcu_tasks_trace]

An incestuous one, with -o rather than -eo:

PID|COMMAND|%CPU %MEM|COMMAND
   1694|bash   | 0.0  0.1|bash
  23486|ps | 0.0  0.0|ps -o %p %c %C -o %mem -o %a --sort=-%cpu
  23487|sed| 0.0  0.0|sed -E s/( *[0-9]+) (.{15})( +[0-9.]+ +[0-9.]+) (.*$)/\1~\3~\2\4/;
  23488|sed| 0.0  0.0|sed -E s/([^~]+)~ ([^~]+)~(.{15})(.*)/\1|\3|\2|\4/;s/^( *PID) (COMMAND) /\1|\2|/;s/%MEM|COMMAND/%MEM|COMMAND/;
  23489|less   | 0.0  0.0|less

Cheers,
David.



Re: ps and AIX field descriptors

2023-02-20 Thread Andreas Leha
Greg Wooledge  writes:

> On Sun, Feb 19, 2023 at 12:04:22PM -0600, David Wright wrote:
>> But even that's not enough
>> because the field width is somewhat variable: try   ps -eo '%c  |  %z  |  %a'
>> (We can still use | to make the problem somewhat more obvious.)
>
> Oh wow.  Yeah, OK, that's not really solvable.
>
> For those who don't want to try to reverse engineer David's conclusion,
> or who don't just happen to stumble upon it with their current process
> list, here's what I'm seeing:
>
> COMMAND  | VSZ  |  COMMAND
> systemd  |  164140  |  /sbin/init
> kthreadd |   0  |  [kthreadd]
> rcu_gp   |   0  |  [rcu_gp]
> rcu_par_gp   |   0  |  [rcu_par_gp]
> [...]
> steamwebhelper   |  4631064  |  /home/greg/.steam/debian-installation/[...]
> [...]
> chrome_crashpad  |  33567792  |  
> /opt/google/chrome/chrome_crashpad_handler[...]
> [...]
> kworker/3:0-eve  |   0  |  [kworker/3:0-events]
>
> ps appears to guess an initial maximum width for the VSZ field, but
> when a value comes along that exceeds the guessed maximum, it simply
> shoves the field barrier over.  It doesn't even become the new maximum,
> with all of the fields aligning after that.  It's just a one-time shove,
> breaking the current line only.
>
> Therefore, parsing the header line cannot give us enough information to
> insert field separators correctly in body lines after the fact.


Dear all,

Thanks for chiming in.  The example was indeed simplified and I am using
%a which can contain internal whitespace.

This is the command I was using previously:

  ps -eo '%p|%c|%C' -o "%mem" -o '|%a' --sort=-%cpu

I have now replaced it with

  ps -eo '%p %c %C' -o "%mem" -o ' %a' --sort=-%cpu |
    sed -E 's/([0-9]+) (.+) ([0-9]+.?[0-9]?) ([0-9]+.?[0-9]?) (.+)/\1|\2|\3|\4|\5/'
 
This works, but is of course cumbersome to maintain.

Again, thanks for all the comments!

Best,
Andreas



Re: ps and AIX field descriptors

2023-02-20 Thread Andreas Leha
Reco  writes:

>   Hi.
>
> On Fri, Feb 17, 2023 at 07:46:23AM +0100, Andreas Leha wrote:
>> Now my question: How can I restore the previous behaviour that allowed
>> other than whitespace separators between fields?
>
> diff -purw procps-3.3.17/ps/sortformat.c procps-4.0.2/src/ps/sortformat.c
> shows me that:
>
> @@ -128,22 +127,24 @@ static const char *aix_format_parse(sf_n
>items = 0;
>walk = sfn->sf;
>/* state machine */ {
> -  int c;
> +  int c = *walk++;
>initial:
> -c = *walk++;
>  if(c=='%')goto get_desc;
>  if(!c)goto looks_ok;
>/* get_text: */
>  items++;
> -  get_more_text:
> +  get_more:
>  c = *walk++;
>  if(c=='%')goto get_desc;
> -if(c) goto get_more_text;
> +if(c==' ')goto get_more;
> +if(c) goto aix_oops;
>  goto looks_ok;
>get_desc:
>  items++;
>  c = *walk++;
> -if(c) goto initial;
> +if(c&&c!=' ') goto initial;
> +return _("missing AIX field descriptor");
> +  aix_oops:
>  return _("improper AIX field descriptor");
>looks_ok:
>  ;
>
> If you look at "get_more" label, you'll notice that "old" version of
> procps (bullseye's) checked for any character after "%" block.
> "New" one (bookworm's) explicitly checks for space, and goes to
> "aix_oops" in any other case.
>
> And there is no #ifdefs, no environment variable checks, no options
> etc.
>
>
> So, to answer your question - currently the only way to restore the
> behaviour you want is to patch procps and rebuild it.
>
> Reco


Dear Reco,

Thanks for the fast and accurate answer!  What a shame for this change...

Best,
Andreas



Re: ps and AIX field descriptors

2023-02-19 Thread Greg Wooledge
On Sun, Feb 19, 2023 at 12:04:22PM -0600, David Wright wrote:
> But even that's not enough
> because the field width is somewhat variable: try   ps -eo '%c  |  %z  |  %a'
> (We can still use | to make the problem somewhat more obvious.)

Oh wow.  Yeah, OK, that's not really solvable.

For those who don't want to try to reverse engineer David's conclusion,
or who don't just happen to stumble upon it with their current process
list, here's what I'm seeing:

COMMAND          |     VSZ  |  COMMAND
systemd          |  164140  |  /sbin/init
kthreadd         |       0  |  [kthreadd]
rcu_gp           |       0  |  [rcu_gp]
rcu_par_gp       |       0  |  [rcu_par_gp]
[...]
steamwebhelper   |  4631064  |  /home/greg/.steam/debian-installation/[...]
[...]
chrome_crashpad  |  33567792  |  /opt/google/chrome/chrome_crashpad_handler[...]
[...]
kworker/3:0-eve  |       0  |  [kworker/3:0-events]

ps appears to guess an initial maximum width for the VSZ field, but
when a value comes along that exceeds the guessed maximum, it simply
shoves the field barrier over.  It doesn't even become the new maximum,
with all of the fields aligning after that.  It's just a one-time shove,
breaking the current line only.

Therefore, parsing the header line cannot give us enough information to
insert field separators correctly in body lines after the fact.
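
A minimal sketch of why header-derived columns fail, using canned rows instead of live ps output. The helper names and the widths (15 for the short command, 6 for VSZ) are assumptions chosen to mirror the listing above; the overflowing row is produced the same way ps appears to do it, by letting the value push the rest of the line over.

```sh
#!/bin/sh
# Canned rows laid out like "ps -eo '%c %z %a'" (widths are assumptions).
# printf lets the 8-digit value overflow its 6-wide column, shifting the args.
sample_ps_output() {
    printf '%-15s %6s %s\n' \
        COMMAND VSZ COMMAND \
        systemd 164140 /sbin/init \
        chrome_crashpad 33567792 /opt/google/chrome/chrome_crashpad_handler
}

# Cut every line at the column positions implied by the header:
# columns 1-15, 17-22, and 24 onwards.
split_by_header_columns() {
    awk '{ print substr($0, 1, 15) "|" substr($0, 17, 6) "|" substr($0, 24) }'
}

sample_ps_output | split_by_header_columns
```

The systemd row splits cleanly, but on the chrome_crashpad row the VSZ value is truncated and its last digit leaks into the args field, because the fixed cut points know nothing about the one-time shove.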



Re: ps and AIX field descriptors

2023-02-19 Thread David Wright
On Sat 18 Feb 2023 at 09:53:01 (-0500), Greg Wooledge wrote:
> It should be noted that there appear to be two TYPES of data fields:
> numeric and string.  Look at this example:
> 
> unicorn:~$ ps -o '%C %g %n %p %U %a'
> %CPU RGROUPNI PID USER COMMAND
>  0.0 greg   01010 greg bash
>  0.0 greg   0 2094243 greg ps -o %C %g %n %p %U %a
> 
> The "%CPU", "NI" and "PID" fields are right-justified.  The "RGROUP",
> "USER" and "COMMAND" fields are left-justified.
> 
> This means the header parser will also need to contain knowledge about
> each header -- whether it's left-justified (string) or right- (numeric).

Oh, it's somewhat worse than that. You need to know the maximum length
that can be shown for left-justified strings, and also what the
maximum width of a numeric field is. But even that's not enough
because the field width is somewhat variable: try   ps -eo '%c  |  %z  |  %a'
(We can still use | to make the problem somewhat more obvious.)

> With all those pieces, I think the problem can be "solved", although I
> wouldn't care to write such a thing.  Time spent on writing that
> parser/filter would be better spent advocating to restore the previous
> functionality, IMHO.

We don't know the priorities of the OP, and whether the example was
somewhat simplified just for posing the question. If it wasn't, then
quick and dirty might suffice. That's up to the OP, and what they
consider the chances are of the "fix" being accepted. Much of   man ps
seems to be an exercise in vagueness. Somewhat.

Cheers,
David.



Re: ps and AIX field descriptors

2023-02-18 Thread Greg Wooledge
On Fri, Feb 17, 2023 at 10:28:43PM -0600, David Wright wrote:
> On Fri 17 Feb 2023 at 11:30:43 (-0500), Greg Wooledge wrote:
> > On Fri, Feb 17, 2023 at 09:20:34AM -0600, David Wright wrote:
> > >   $ ps -eo '%p %C' | sed -e 's/\([^ ]\+\) /\1|/;'

> > Eww, GNUisms.
> 
> I don't keep a list of differences to hand, but I guess you'd prefer:
> 
>   $ ps -eo '%p %C' | sed -E 's/([^ ]+) /\1|/;'
> PID|%CPU
>   1| 0.0
>   2| 0.0

That's *slightly* better, in that it works on both GNU and BSD (and
maybe some future edition of POSIX -- I've been told they're considering
adopting the -E flag).  A truly portable version would either use \{1,\}
or would simply repeat itself: [^ ][^ ]*   (The latter is by far the
more common, especially in scripts that target ancient Unixes where \{1,\}
might not work.)
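
As a sketch of the "repeat itself" form, here is the same substitution written with only basic regular expressions, run on two canned lines rather than live ps output. The function name first_space_to_pipe is made up for the example.

```sh
#!/bin/sh
# Replace the space after the first non-blank word with "|",
# using [^ ][^ ]* instead of the GNU-only \+ or the -E flag.
first_space_to_pipe() {
    sed 's/\([^ ][^ ]*\) /\1|/'
}

printf '%s\n' '    PID %CPU' '      1  0.0' | first_space_to_pipe
```

Like the original, this only handles the first column boundary on each line.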

However, a bigger issue is that your command only works for the two-column
case.  It doesn't support more columns:

unicorn:~$ ps -o '%p|%U|%a'
PID|USER|COMMAND
   1010|greg|bash
2093990|greg|ps -o %p|%U|%a
unicorn:~$ ps -o '%p %U %a' | sed -E 's/([^ ]+) /\1|/;'
PID|USER COMMAND
   1010|greg bash
2093858|greg ps -o %p %U %a
2093859|greg sed -E s/([^ ]+) /\1|/;

And even if you extended it in the "obvious" way, it would break down on
columns that can contain internal whitespace (e.g. %a).

> > That aside, a workaround like this is ugly and should
> > not be needed.
> 
> The OP wrote: "How can I restore the previous behaviour that
> allowed other than whitespace separators between fields?"
> 
> If that's the required format, what are the alternatives?

Because data fields can contain internal whitespace, the only way to
parse the output of ps and determine the right spot to put pipe characters
(or whatever) would be to parse the header row.  All of the headers
listed under "AIX format specifiers" are free of whitespace.  So, one
could in theory parse that line, determine the column numbers where
each data field will end, and then replace spaces with pipe characters in
those column numbers.

It should be noted that there appear to be two TYPES of data fields:
numeric and string.  Look at this example:

unicorn:~$ ps -o '%C %g %n %p %U %a'
%CPU RGROUP    NI     PID USER     COMMAND
 0.0 greg       0    1010 greg     bash
 0.0 greg       0 2094243 greg     ps -o %C %g %n %p %U %a

The "%CPU", "NI" and "PID" fields are right-justified.  The "RGROUP",
"USER" and "COMMAND" fields are left-justified.

This means the header parser will also need to contain knowledge about
each header -- whether it's left-justified (string) or right- (numeric).
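
The first step of such a parser could look roughly like the sketch below, which only reports where each header word starts and ends. The function name header_columns is hypothetical, and the sketch assumes headers contain no spaces and that no value overflows its column. Which of the two numbers is the usable boundary still depends on the justification: the end column for right-justified fields, the start column for left-justified ones.

```sh
#!/bin/sh
# Print "start-end" (1-based columns) for every word in the header line.
header_columns() {
    awk 'NR == 1 {
        rest = $0; consumed = 0; out = ""
        while (match(rest, /[^ ]+/)) {
            first = consumed + RSTART        # absolute start column of the word
            last = first + RLENGTH - 1       # absolute end column of the word
            out = out (out == "" ? "" : " ") first "-" last
            consumed = last
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out
    }'
}

# Header built with assumed widths: PID right-justified in 7, USER left in 8.
printf '%7s %-8s %s\n' PID USER COMMAND | header_columns
```

For the sample header this prints 5-7 9-12 18-24.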

With all those pieces, I think the problem can be "solved", although I
wouldn't care to write such a thing.  Time spent on writing that
parser/filter would be better spent advocating to restore the previous
functionality, IMHO.



Re: ps and AIX field descriptors

2023-02-17 Thread David Wright
On Fri 17 Feb 2023 at 11:30:43 (-0500), Greg Wooledge wrote:
> On Fri, Feb 17, 2023 at 09:20:34AM -0600, David Wright wrote:
> > On Fri 17 Feb 2023 at 10:05:20 (+0300), Reco wrote:
> > > So, to answer your question - currently the only way to restore the
> > > behaviour you want is to patch procps and rebuild it.
> 
> Fabulous analysis.
> 
> > Or, depending on the context, you could of course restore
> > the appearance of the output with sed:
> > 
> >   $ ps -eo '%p %C' | sed -e 's/\([^ ]\+\) /\1|/;'
> >   PID|%CPU
> > 1| 0.0
> > 2| 0.0
> > 3| 0.0
> > 4| 0.0
> > 6| 0.0
> > [ … ]
> 
> Eww, GNUisms.

I don't keep a list of differences to hand, but I guess you'd prefer:

  $ ps -eo '%p %C' | sed -E 's/([^ ]+) /\1|/;'
PID|%CPU
  1| 0.0
  2| 0.0
[ … ]

> That aside, a workaround like this is ugly and should
> not be needed.

The OP wrote: "How can I restore the previous behaviour that
allowed other than whitespace separators between fields?"

If that's the required format, what are the alternatives?

> This sounds like a bug in procps that should be reported,
> if it hasn't already.

And how long before it's fixed?

As for whether it /is/ a bug, I guess that depends on the
interpretation of somewhat in "This ps supports AIX format
descriptors, which work somewhat like the formatting codes
of printf(1) and printf(3)." That's beyond my pay-grade.

Cheers,
David.


Re: ps and AIX field descriptors

2023-02-17 Thread The Wanderer
On 2023-02-17 at 15:21, Greg Wooledge wrote:

> On Fri, Feb 17, 2023 at 01:49:59PM -0500, The Wanderer wrote:
>
>> I can't speak to the new version, as I'm still running 3.3.17-7.1 on my
>> machine - but I can at least note that the man page from that older
>> version also explicitly says "a blank-separated or comma-separated list"
>> in the description for the '-o' option, but the given command line (with
>> a pipe for a separator) still works. (This may reflect only the same
>> thing that you said, above.)
>> 
>> It's entirely possible that this was an intentional change, to bring
>> things in line with the documentation, and/or even one required in order
>> to be in line with some appropriate specification.
> 
> Hmm... fair point.  POSIX says:
> 
>    The application shall ensure that the format specification is a list of
>    names presented as a single argument, <blank> or <comma>-separated.
> 
> So the behavior of ps in bullseye is an extension of the POSIX requirement,
> and apparently only applies to the "AIX format specifiers", which are yet
> another extension.

FWIW, at least in the 3.3.17-7.1 version I have, the man page claims
that ps supports the POSIXLY_CORRECT environment variable. This type of
more-strict POSIX compliance, even when it means being less capable,
strikes me as the sort of thing that should probably be gated behind a
check for that flag.

> Nevertheless, this change definitely feels like a regression.  Scripts
> that are relying on the bullseye behavior, with full output formatting
> capability, will no longer work in bookworm.
> 
> I'm not using any such scripts, so I don't have anything to lose here,
> but the OP might seriously want to get a bug report in, at least to
> learn whether this is an intended regression, or an accidental one.

Agreed,

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: ps and AIX field descriptors

2023-02-17 Thread Greg Wooledge
On Fri, Feb 17, 2023 at 01:49:59PM -0500, The Wanderer wrote:
> I can't speak to the new version, as I'm still running 3.3.17-7.1 on my
> machine - but I can at least note that the man page from that older
> version also explicitly says "a blank-separated or comma-separated list"
> in the description for the '-o' option, but the given command line (with
> a pipe for a separator) still works. (This may reflect only the same
> thing that you said, above.)
> 
> It's entirely possible that this was an intentional change, to bring
> things in line with the documentation, and/or even one required in order
> to be in line with some appropriate specification.

Hmm... fair point.  POSIX says:

   The application shall ensure that the format specification is a list of
   names presented as a single argument, <blank> or <comma>-separated.

So the behavior of ps in bullseye is an extension of the POSIX requirement,
and apparently only applies to the "AIX format specifiers", which are yet
another extension.

On bullseye:

unicorn:~$ ps -o '%U|%p|%a'
USER|PID|COMMAND
greg|   1010|bash
greg|2023595|ps -o %U|%p|%a
unicorn:~$ ps -o 'user|pid|args'
error: unknown user-defined format specifier "user|pid|args"
[...]

Nevertheless, this change definitely feels like a regression.  Scripts
that are relying on the bullseye behavior, with full output formatting
capability, will no longer work in bookworm.

I'm not using any such scripts, so I don't have anything to lose here,
but the OP might seriously want to get a bug report in, at least to
learn whether this is an intended regression, or an accidental one.



Re: ps and AIX field descriptors

2023-02-17 Thread The Wanderer
On 2023-02-17 at 13:21, debian-u...@howorth.org.uk wrote:

> Greg Wooledge  wrote:
> 
>> This sounds like a bug in procps that should be reported, if it
>> hasn't already.
> 
> It might be a bug if it disagreed with its documentation. But do the
> docs say anything about this feature? What they do say is that you
> should be able to use comma-separated field descriptions instead of
> space-separated I think. Is that true for the new version?

I can't speak to the new version, as I'm still running 3.3.17-7.1 on my
machine - but I can at least note that the man page from that older
version also explicitly says "a blank-separated or comma-separated list"
in the description for the '-o' option, but the given command line (with
a pipe for a separator) still works. (This may reflect only the same
thing that you said, above.)

It's entirely possible that this was an intentional change, to bring
things in line with the documentation, and/or even one required in order
to be in line with some appropriate specification.

It might be interesting to dig up the actual commit message from the
upstream development commit that made this change, and possibly also any
here's-what's-changed-in-the-new-version documentation (whether in
Debian or upstream), to see whether there's anything that sheds light on
whether this was intentional and if so what the reason was.

The answer might, at least, inform the approach to be taken in arguing
for the restoration of this functionality in a potential future version.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw





Re: ps and AIX field descriptors

2023-02-17 Thread debian-user
Greg Wooledge  wrote:
> This sounds like a bug in procps that should be reported, if it
> hasn't already. 

It might be a bug if it disagreed with its documentation. But do the
docs say anything about this feature? What they do say is that you
should be able to use comma-separated field descriptions instead of
space-separated I think. Is that true for the new version?



Re: ps and AIX field descriptors

2023-02-17 Thread Greg Wooledge
On Fri, Feb 17, 2023 at 09:20:34AM -0600, David Wright wrote:
> On Fri 17 Feb 2023 at 10:05:20 (+0300), Reco wrote:
> > So, to answer your question - currently the only way to restore the
> > behaviour you want is to patch procps and rebuild it.

Fabulous analysis.

> Or, depending on the context, you could of course restore
> the appearance of the output with sed:
> 
>   $ ps -eo '%p %C' | sed -e 's/\([^ ]\+\) /\1|/;'
>   PID|%CPU
> 1| 0.0
> 2| 0.0
> 3| 0.0
> 4| 0.0
> 6| 0.0
> [ … ]

Eww, GNUisms.  That aside, a workaround like this is ugly and should
not be needed.  This sounds like a bug in procps that should be reported,
if it hasn't already.



Re: ps and AIX field descriptors

2023-02-17 Thread David Wright
On Fri 17 Feb 2023 at 10:05:20 (+0300), Reco wrote:
> On Fri, Feb 17, 2023 at 07:46:23AM +0100, Andreas Leha wrote:
> > Now my question: How can I restore the previous behaviour that allowed
> > other than whitespace separators between fields?
> 
> diff -purw procps-3.3.17/ps/sortformat.c procps-4.0.2/src/ps/sortformat.c
> shows me that:
> 
> @@ -128,22 +127,24 @@ static const char *aix_format_parse(sf_n
>items = 0;
>walk = sfn->sf;
>/* state machine */ {
> -  int c;
> +  int c = *walk++;
>initial:
> -c = *walk++;
>  if(c=='%')goto get_desc;
>  if(!c)goto looks_ok;
>/* get_text: */
>  items++;
> -  get_more_text:
> +  get_more:
>  c = *walk++;
>  if(c=='%')goto get_desc;
> -if(c) goto get_more_text;
> +if(c==' ')goto get_more;
> +if(c) goto aix_oops;
>  goto looks_ok;
>get_desc:
>  items++;
>  c = *walk++;
> -if(c) goto initial;
> +if(c&&c!=' ') goto initial;
> +return _("missing AIX field descriptor");
> +  aix_oops:
>  return _("improper AIX field descriptor");
>looks_ok:
>  ;
> 
> If you look at "get_more" label, you'll notice that "old" version of
> procps (bullseye's) checked for any character after "%" block.
> "New" one (bookworm's) explicitly checks for space, and goes to
> "aix_oops" in any other case.
> 
> And there is no #ifdefs, no environment variable checks, no options
> etc.
> 
> So, to answer your question - currently the only way to restore the
> behaviour you want is to patch procps and rebuild it.

Or, depending on the context, you could of course restore
the appearance of the output with sed:

  $ ps -eo '%p %C' | sed -e 's/\([^ ]\+\) /\1|/;'
  PID|%CPU
1| 0.0
2| 0.0
3| 0.0
4| 0.0
6| 0.0
[ … ]

Cheers,
David.


Re: ps and AIX field descriptors

2023-02-16 Thread Reco
Hi.

On Fri, Feb 17, 2023 at 07:46:23AM +0100, Andreas Leha wrote:
> Now my question: How can I restore the previous behaviour that allowed
> other than whitespace separators between fields?

diff -purw procps-3.3.17/ps/sortformat.c procps-4.0.2/src/ps/sortformat.c
shows me that:

@@ -128,22 +127,24 @@ static const char *aix_format_parse(sf_n
   items = 0;
   walk = sfn->sf;
   /* state machine */ {
-  int c;
+  int c = *walk++;
   initial:
-    c = *walk++;
     if(c=='%')    goto get_desc;
     if(!c)        goto looks_ok;
   /* get_text: */
     items++;
-  get_more_text:
+  get_more:
     c = *walk++;
     if(c=='%')    goto get_desc;
-    if(c)         goto get_more_text;
+    if(c==' ')    goto get_more;
+    if(c)         goto aix_oops;
     goto looks_ok;
   get_desc:
     items++;
     c = *walk++;
-    if(c)         goto initial;
+    if(c&&c!=' ') goto initial;
+    return _("missing AIX field descriptor");
+  aix_oops:
     return _("improper AIX field descriptor");
   looks_ok:
     ;

If you look at the "get_more" label, you'll notice that the "old" version of
procps (bullseye's) accepted any character between "%" descriptors.
The "new" one (bookworm's) accepts only spaces there, and goes to
"aix_oops" for anything else.

And there are no #ifdefs, no environment variable checks, no options
etc.


So, to answer your question - currently the only way to restore the
behaviour you want is to patch procps and rebuild it.
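
As a rough way to pre-check a format string before handing it to the new ps, the separator rule can be approximated with a regular expression. This is only a sketch: the function name aix_separators_ok is made up, and it models nothing but the "only spaces between single-letter %X descriptors" rule. It is not how ps implements the check, and it deliberately ignores longer specifiers such as %mem, which ps handles by a different path.

```sh
#!/bin/sh
# Succeed when the argument is a run of single-letter %X descriptors
# with nothing but spaces between them (approximation, see above).
aix_separators_ok() {
    printf '%s\n' "$1" | grep -Eq '^(%[^ %] *)+$'
}

for fmt in '%p %C' '%p|%C'; do
    if aix_separators_ok "$fmt"; then
        echo "accepted: $fmt"
    else
        echo "rejected: $fmt"
    fi
done
```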

Reco



ps and AIX field descriptors

2023-02-16 Thread Andreas Leha
Hi all,

I am facing a strange issue.  This command used to work

  ps -eo '%p|%C'

Now, on a debian testing machine only

  ps -eo '%p %C'

works.  Running

  ps -eo '%p|%C'

results in this error:

  error: improper AIX field descriptor

ps --version says 'ps from procps-ng 4.0.2'

Now my question: How can I restore the previous behaviour that allowed
other than whitespace separators between fields?

Thanks in advance!
Andreas