Andrew Black wrote:
[snipped descriptive summary]
...
Looking at the output (as it stands now), I believe the parsers could be
more robust if they ignored all lines that started with whitespace or
where the first word was in all caps.  Alternately, they could start
processing after the first line in all caps, and stop processing before
the second line in all caps.  I suppose a third option would be to alter
the exec utility so it could produce the output desired, rather than
doing a post-processing step.  Thoughts?
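
For illustration, the second option (start after the first all-caps line, stop before the second) might be sketched as follows. This is only a rough sketch in Python, not the existing scripts; the function name and the sample log lines are made up for the example, and "all caps" is assumed to mean that every cased character on the line is upper case.

```python
def between_caps_lines(lines):
    """Collect the lines after the first all-caps line, up to the second one."""
    collected = []
    seen_header = False
    for line in lines:
        # str.isupper() is true only when the line has at least one cased
        # character and none of its cased characters are lower case.
        if line.strip().isupper():
            if seen_header:
                break          # second all-caps line: stop processing
            seen_header = True  # first all-caps line: start after it
            continue
        if seen_header:
            collected.append(line)
    return collected


# Hypothetical log excerpt, loosely modeled on the exec output in this thread.
sample = [
    "make: some unrelated output",
    "NAME                  STATUS WARN ASSERTS FAILED PERCNT",
    "sanity_test.sh             0    0      46      0   100%",
    "af_ZA.ISO-8859-1.sh        0    0      16      0   100%",
    "PROGRAM SUMMARY:",
    "  Non-zero exit status:      0",
]
rows = between_caps_lines(sample)
```

A program whose name and columns contain no lower-case letters would be mistaken for a header here, which is one reason a more explicit marker may be preferable.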

IMO, since exec already has all the data the post-processing stage
works so hard to compute, it might as well write it out in a format
that's easy to parse by the script you're talking about. I.e., the
summary would be it. The other script that I was referring to is
the cross build view generation script that actually needs the
individual examples, locales, and tests, and doesn't care (yet)
about the summary.

So if we take this approach we need to "standardize" both the output
format for the individual programs as well as the summary. The
summary format currently looks like this:

PROGRAM SUMMARY:
  Non-zero exit status:      %6u
  Signalled:                 %6u
  Compiler warnings:         %6u
  Linker warnings:           %6u
  Runtime warnings:          %6u
  Assertions:                %6u
  Failed assertions:         %6u
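
As a rough idea of how a script might consume a summary in this shape, here is a minimal Python sketch. It assumes each row is an indented "label: count" pair following the "PROGRAM SUMMARY:" line and that the section ends at the first line that doesn't fit that shape; the function name and sample numbers are invented for the example.

```python
import re

# An indented row of the form "  Some label:      123".
SUMMARY_ROW = re.compile(r"^\s+(?P<label>[^:]+):\s+(?P<count>\d+)\s*$")


def parse_summary(lines):
    """Return {label: count} for the rows following 'PROGRAM SUMMARY:'."""
    totals = {}
    in_summary = False
    for line in lines:
        if line.strip() == "PROGRAM SUMMARY:":
            in_summary = True
            continue
        if in_summary:
            m = SUMMARY_ROW.match(line)
            if m is None:
                break  # first non-row line ends the summary
            totals[m.group("label")] = int(m.group("count"))
    return totals


# Hypothetical values, for illustration only.
sample = [
    "PROGRAM SUMMARY:",
    "  Non-zero exit status:           3",
    "  Signalled:                      1",
    "  Failed assertions:             12",
    "",
    "unrelated trailing output",
]
totals = parse_summary(sample)
```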

I'm also thinking about enhancing it to break down the signals,
i.e., to have something like (with rows only for non-zero totals):

  Signalled:                 %6u
    ABRT:                    %6u
    BUS:                     %6u
    SEGV:                    %6u

so we'd need some convention to separate the specific signals from
the other totals (such as the extra leading space here). Another
option would be for the script to specify the exact format of the
summary, via a command line option or some such mechanism. We'd
need to enhance exec to accept this option and to parse the format
string (each total would need to be represented by its own unique
directive, e.g., %E for the total number of programs that exited with
a non-zero status, %S for the total number of programs that exited
with a signal, etc.). The script could then specify the precise
format of the summary section.

I don't think this latter approach would be too difficult to
implement, but I'm not sure it isn't overkill.
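
To give a feel for the amount of work involved, directive expansion could look roughly like the sketch below. It is written in Python purely for illustration and says nothing about how exec itself would implement it. Only %E and %S come from the proposal above; the function name, the "%%" escape, and the error handling are assumptions.

```python
def expand_summary_format(fmt, totals):
    """Expand %X directives in fmt using totals, a dict keyed by directive letter."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            nxt = fmt[i + 1]
            if nxt == "%":
                out.append("%")              # assumed escape for a literal '%'
            elif nxt in totals:
                out.append(str(totals[nxt]))  # known directive: substitute total
            else:
                raise ValueError(f"unknown directive %{nxt}")
            i += 2
        else:
            out.append(ch)                    # ordinary text (or a trailing '%')
            i += 1
    return "".join(out)


# %E: programs that exited with a non-zero status; %S: programs that exited
# with a signal. The counts are made up.
totals = {"E": 3, "S": 1}
line = expand_summary_format("nonzero=%E signalled=%S", totals)
```

With this in place a script would pass whatever layout it finds easiest to parse, e.g. one "key=value" pair per total.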

Martin



--Andrew Black

Martin Sebor wrote:
I did some work on the exec utility recently. One of the enhancements
I added was a summary section at the end of the output. It seems that
this had an unexpected consequence for the scripts that interpret the
output of the utility: the new summary is taken as additional tests,
examples, or locales and included in the generated reports.

It's clear that the scripts aren't robust enough to deal with these
types of changes. I'd like to find a way to make them more robust so
that we can safely enhance the exec utility's output in the future
without causing this sort of adverse fallout. To make this possible
I think we need to enhance the output of exec to help the scripts
disambiguate ordinary output (such as the "PROGRAM SUMMARY" label
I added) from the output relevant to the scripts (i.e., the examples,
tests, and locales).

What should this output format look like? Would adding the number
of each program in front of its name be sufficient? I.e., replacing

  NAME                  STATUS WARN ASSERTS FAILED PERCNT ...
  sanity_test.sh             0    0      46      0   100%
  af_ZA.ISO-8859-1.sh        0    0      16      0   100%
  ar_AE.ISO-8859-6.sh        0    0      16      0   100%
  ar_BH.ISO-8859-6.sh        0    0      16      0   100%

with

  ##  NAME                  STATUS WARN ASSERTS FAILED PERCNT ...
   1. sanity_test.sh             0    0      46      0   100%
   2. af_ZA.ISO-8859-1.sh        0    0      16      0   100%
   3. ar_AE.ISO-8859-6.sh        0    0      16      0   100%
   4. ar_BH.ISO-8859-6.sh        0    0      16      0   100%

That way scripts written to process this type of output would look
for lines matching the RE pattern "^ *[1-9][0-9]*\. [^ ][^ ]*  *"
to reliably distinguish programs (examples, locales, tests)
in the log files from other kinds of output. The big assumption
here is that there would be no other lines matching that RE in
the logs, or at least not in the immediate vicinity of the output
from exec.
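
A script using that pattern might look something like this Python sketch. The RE is the one quoted above with two capture groups added (for the number and the program name); the function name and the sample log are made up for the example.

```python
import re

# Same pattern as above, plus capture groups for the number and the name.
PROGRAM_LINE = re.compile(r"^ *([1-9][0-9]*)\. ([^ ][^ ]*)  *")


def find_programs(lines):
    """Return (number, name) pairs for lines that look like numbered programs."""
    found = []
    for line in lines:
        m = PROGRAM_LINE.match(line)
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return found


sample = [
    "  ##  NAME                  STATUS WARN ASSERTS FAILED PERCNT",
    "   1. sanity_test.sh             0    0      46      0   100%",
    "   2. af_ZA.ISO-8859-1.sh        0    0      16      0   100%",
    "PROGRAM SUMMARY:",
    "  Non-zero exit status:           0",
]
programs = find_programs(sample)
```

Neither the header nor the summary rows match, since none of them starts with a number followed by a period.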

I would also like to make an additional enhancement to help scripts
extract the relevant exec output from the rest of the contents of
the log and differentiate between examples, locales, and tests.
I'm thinking that replacing the label "NAME" with "EXAMPLE NAME",
"LOCALE NAME", and "TEST NAME" and ending the output with something
like "END EXAMPLES" etc. should do it.
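
With labels like these, pulling the individual sections out of a log could be as simple as the following Python sketch. The "END LOCALES" and "END TESTS" spellings are my guess at what "etc." would expand to, and the function name, the sample log, and the "some_test" program are hypothetical.

```python
import re

# Header such as "LOCALE NAME   STATUS ..." opens a section.
SECTION_START = re.compile(r"\b(EXAMPLE|LOCALE|TEST) NAME\b")
# Assumed closing labels: END EXAMPLES, END LOCALES, END TESTS.
SECTION_END = re.compile(r"^\s*END (EXAMPLES|LOCALES|TESTS)\s*$")


def split_sections(lines):
    """Group log lines by the kind of section ('example', 'locale', 'test')."""
    sections = {}
    current = None
    for line in lines:
        start = SECTION_START.search(line)
        if start:
            current = start.group(1).lower()
            sections.setdefault(current, [])
        elif SECTION_END.match(line):
            current = None
        elif current is not None:
            sections[current].append(line)
        # Lines outside any section are ignored.
    return sections


log = [
    "unrelated build output",
    "TEST NAME             STATUS WARN ASSERTS FAILED PERCNT",
    "some_test                  0    0      10      0   100%",
    "END TESTS",
    "LOCALE NAME           STATUS WARN ASSERTS FAILED PERCNT",
    "af_ZA.ISO-8859-1.sh        0    0      16      0   100%",
    "ar_AE.ISO-8859-6.sh        0    0      16      0   100%",
    "END LOCALES",
    "PROGRAM SUMMARY:",
]
sections = split_sections(log)
```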

Any other/better ideas?

Martin

