On Tue, Feb 15, 2022, 6:57 PM Philippe Mathieu-Daudé <[email protected]> wrote:

> On 16/2/22 00:53, John Snow wrote:
> > On Tue, Feb 15, 2022 at 5:55 PM Eric Blake <[email protected]> wrote:
> >>
> >> On Tue, Feb 15, 2022 at 05:08:50PM -0500, John Snow wrote:
> >>>>>> print(enboxify(msg, width=72, name="commit message"))
> >>> ┏━ commit message ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
> >>> ┃ enboxify() takes a chunk of text and wraps it in a text art box that ┃
> >>> ┃  adheres to a specified width. An optional title label may be given, ┃
> >>> ┃  and any of the individual glyphs used to draw the box may be        ┃
> >>
> >> Why do these two lines have a leading space,
> >>
> >>> ┃ replaced or specified as well.                                       ┃
> >>
> >> but this one doesn't?  It must be an off-by-one corner case when your
> >> choice of space to wrap on is exactly at the wrap column.
> >>
> >
> > Right, you're probably witnessing the right-pad *and* the actual space.
> >
> >>>
> ┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
> >>>
> >>> Signed-off-by: John Snow <[email protected]>
> >>> ---
> >>>   python/qemu/utils/__init__.py | 58 +++++++++++++++++++++++++++++++++++
> >>>   1 file changed, 58 insertions(+)
>
> >>> +    def _wrap(line: str) -> str:
> >>> +        return os.linesep.join([
> >>> +            wrapped_line.ljust(lwidth) + suffix
> >>> +            for wrapped_line in textwrap.wrap(
> >>> +                    line, width=lwidth, initial_indent=prefix,
> >>> +                    subsequent_indent=prefix, replace_whitespace=False,
> >>> +                    drop_whitespace=False, break_on_hyphens=False)
> >>
> >> Always nice when someone else has written the cool library function to
> >> do all the hard work for you ;)  But this is probably where you have
> >> the off-by-one I called out above.
> >>
> >
> > Yeah, I just didn't want it to eat multiple spaces if they were
> > present -- I wanted it to reproduce them faithfully. The tradeoff is
> > some silliness near the margins.
> >
> > Realistically, if I want something any better than what I've done
> > here, I should find a library to do it for me instead -- but for the
> > sake of highlighting some important information, this may be
> > just-enough-juice.
>
> 's/^┃  /┃ /' on top ;D
>
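
The margin behaviour discussed in the quoted text can be reproduced in isolation. This is only a minimal sketch using the standard library's textwrap.wrap() on a made-up string; it is not the enboxify() code itself, and it assumes the stray leading space comes from drop_whitespace=False carrying the separator space over to the next line when a word ends exactly at the wrap column.

```python
import textwrap

# Hypothetical input: the first word exactly fills the 4-column width,
# so the space that follows it no longer fits on the first line.
text = "aaaa bb"

# With drop_whitespace=False the separator space is kept and becomes the
# first character of the next line.
kept = textwrap.wrap(text, width=4, drop_whitespace=False)

# With the default drop_whitespace=True the space at the line edge is dropped.
dropped = textwrap.wrap(text, width=4, drop_whitespace=True)

print(kept)     # ['aaaa', ' bb']
print(dropped)  # ['aaaa', 'bb']
```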

I have to admit that this function is actually very fragile. Last night, I
did some reading on Unicode and emoji encodings and discovered that it's
*basically impossible* to predict the "visual width" of a sequence of
Unicode codepoints.

So, this function as written will only really work if we stick to
single-codepoint glyphs that can be rendered 1:1 in a monospace font.

I could probably improve it to work with "some" (but certainly not all)
wide glyphs and emoji, but it's a very complex topic and far outside my
specialty. Support for multi-codepoint narrow/halfwidth glyphs is also an
issue. (This affects some Latin characters outside of ASCII if they are
encoded using combining codepoints.)

(See https://hsivonen.fi/string-length/ ... It's nasty.)
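
To make the problem concrete, here is a rough sketch using only Python's standard unicodedata module. The helper name naive_display_width is made up for illustration. It only skips combining marks and counts East Asian wide/fullwidth codepoints as two columns, so it is an estimate and not a real terminal width calculation (emoji ZWJ sequences, for instance, are not handled).

```python
import unicodedata


def naive_display_width(text: str) -> int:
    """Rough column estimate; hypothetical helper, not a full wcwidth."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous glyph: zero columns.
            continue
        # 'W' (wide) and 'F' (fullwidth) usually take two terminal columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


composed = unicodedata.normalize("NFC", "e\u0301")   # é as one codepoint
decomposed = unicodedata.normalize("NFD", "\u00e9")  # e + combining acute

print(len(composed), len(decomposed))    # 1 2
print(naive_display_width(decomposed))   # 1
print(len("日本語"), naive_display_width("日本語"))  # 3 6
```

str.ljust() pads by codepoint count, which is why a right-hand border drifts as soon as len() and the rendered width disagree.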

So I must admit that this function has some very serious limitations to it.
I want to explain why I wrote it, though.

First: Tracebacks make people's eyes cross over. It's a very long sequence
of mumbo jumbo that most people don't read, because it's program debug
information. I don't blame them. Setting apart the error summary visually
is a helpful tool for drawing one's eyes to the most critical pieces of
information.

Second: In my AQMP library, I use the ASCII vertical bar | as a left-hand
border decoration to provide a kind of visual quoting mechanism to
illustrate in the logfile which subsequent confusing lines of jargon belong
to the same log entry. I really like this formatting mechanism, but...

Third: If a line of text becomes so long that it wraps in your terminal,
the visual quote mechanism breaks, making the output messy and hard to
read. Forcibly re-wrapping the text in a virtual box is a necessary
mechanism to preserve readability in this circumstance - the lines from
qemu-img et al may be much wider than your terminal column width.

And so, I drew a box instead of just a left border, because I needed to
re-wrap the text anyway. I believed the box helped show that the output
was being re-formatted to fit a fixed width. Unfortunately, it's
inadequate.

So ... what to do.

(1) I can just remove the right margin decoration and call the function
visual_quote or something. If any of the lines get too "long" because of
emoji/日本語, it MAY break the indent line, but occasional uses of one or two
wide characters probably won't cause wrapping that breaks the "visual quote
line" on a terminal with at least 85 columns. Essentially it'd still be
broken, but without a solid right border it'd be harder to notice *small*
breakages.

(2) If there is a genuine interest in using visual highlighting techniques
to make iotest failures easier to diagnose (and making sure it is properly
multilingual), I could use the urwid helper library to estimate visual text
width, making the terminal boxes more resilient than what I could manage on
my own. The downside is a new third-party dependency. I already use urwid
for the AQMP TUI that we're working on, but it's remained an optional
dependency so far.

(3) I can take a swing at improving this text decoration utility and having
it account for the most basic cases. East Asian language support is
low-hanging fruit, though I have only rudimentary familiarity with Hangul. (And
virtually no exposure to Thai or other Southeast Asian scripts.)

(4) Just leave it alone for now, don't you have IDE/FDC patches to work on?
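
For reference, option (1) could look roughly like the sketch below. The name visual_quote comes from the option itself, but the signature, the default width, and the "| " border are assumptions made for illustration, not code from the patch. Unlike the patch, it leaves drop_whitespace at its default, so spaces at line edges are not preserved.

```python
import textwrap


def visual_quote(text: str, width: int = 72, border: str = "| ") -> str:
    """Re-wrap text and prefix each line with a left-hand border only.

    Hypothetical sketch: there is no right margin, so wide glyphs can
    make a line run long but cannot knock a closing border out of line.
    """
    out = []
    for line in text.splitlines():
        wrapped = textwrap.wrap(
            line,
            width=width,
            initial_indent=border,
            subsequent_indent=border,
            break_on_hyphens=False,
        )
        # textwrap.wrap() returns [] for an empty line; keep the border.
        out.extend(wrapped or [border.rstrip()])
    return "\n".join(out)


print(visual_quote("qemu-img: a very long error message " * 3, width=40))
```

Wrapping is still measured in codepoints here, so a line full of wide characters can exceed the requested width on screen; only the visible damage is smaller.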

Sigh. The punishment for trying to do something nice is swift.

--js
