Re: Confusing textwrap parameters, and request for RE help

Chris Angelico Tue, 24 Mar 2020 14:33:29 -0700

On Wed, Mar 25, 2020 at 8:04 AM DL Neil via Python-list
<python-list@python.org> wrote:
>
> On 23/03/20 8:00 AM, Chris Angelico wrote:
> > When using textwrap.fill() or friends, setting break_long_words=False
> > without also setting break_on_hyphens=False has the very strange
> > behaviour that a long hyphenated word will still be wrapped. I
> > discovered this as a very surprising result when trying to wrap a
> > paragraph that contained a URL, and wanting the URL to be kept
> > unchanged:
>
> I dropped textwrap years ago. Which policy likely shows my/my
> applications' bias.
>
> Today it feels like an anachronism because it comes from the era of
> fixed-width fonts and line-lengths denominated in characters*. The issue
> is that it was designed to re-define 'white space' and to enable the
> conversion of text 'wrapped' in one (fixed) format, to suit another.
> With the arrival/predominance of proportional-width fonts, the skills of
> hyphenation have started to go the way of cursive hand-writing
> [substitute any number of grumpy, old man regrets/favorite complaints,
> here].


Terminals still use somewhat-fixed-width fonts, so it's still
reasonably appropriate - until there's some sort of indentation level
escape code.

> Your idea of sub-classing (as I'm sure YOU know, textwrap is but a
> convenience-function) struck me as clever-thinking! Could textwrap's
> 'final format' be caught just before 'return', enabling a post-process
> to undo anything textwrap has done, and (re-)format the URLs to spec, or
> to treat textwrap's output as a template and 'inject' the URL
> appropriately? If not a sub-class, a decorator?

Hmmmmmm. Very VERY interesting idea, and one I hadn't thought of. Thank you.

> My idea (being more simple-minded than you!), would be to partition the
> text (yes, am alluding to the Python str.method):
> - textwrap the 'early text',
> - treat the URL as a string using the required convention,
> - textwrap the 'later text', and
> - str.join() the three components/partitions afterwards.
>
> Both likely 'force' the URL to occupy a line of its own, and thus create
> some odd-looking results!

The use-case here is a Twitter client I'm building. It works in the
terminal. I would very much like NOT to build any sort of GUI for it.
Since tweets often contain URLs, it's important to render them
correctly. The display style is to have "@Username: " at the start of
the first line, and the same number of spaces on subsequent lines,
which creates a very readable display. (Also, quoting retweets show
the original tweet indented underneath, giving a bit more
indentation.) Unfortunately, forcing every URL onto its own line would
make the display a bit too vertical for my liking; often there's a
very short URL (eg someone's Twitch link) that doesn't want to be
split across.
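
A rough sketch of that display style, using only textwrap's documented
initial_indent and subsequent_indent parameters. The format_tweet name
and the default width are made up for illustration, not taken from the
actual client:

```python
import textwrap

def format_tweet(username, text, width=80):
    """Wrap a tweet with "@Username: " on the first line only.

    Hypothetical helper: continuation lines are indented by the same
    number of columns as the prefix, so the text forms one block.
    """
    prefix = f"@{username}: "
    return textwrap.fill(
        text,
        width=width,
        initial_indent=prefix,
        subsequent_indent=" " * len(prefix),
    )

print(format_tweet("alice", "word " * 30, width=40))
```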

Actually the ultimate solution would be the not-yet-standardized
protocol for effectively showing hypertext on the console, where the
abbreviated text could actually be made clickable as the full text.
But that requires more help from the terminal emulator, unless I'm
just misreading the examples (it's supposed to work in gnome-terminal
but I couldn't get it to behave). Or alternatively, as mentioned, a
way to say to the terminal emulator, "please indent this text by at
least this amount" (where the amount would most likely be specified as
a number of characters, but in theory could be millimeters instead).

For now, though, all I can do is rewrap URLs. And at the moment, what
I've done is just block all long words from being wrapped AND block
words from being wrapped at hyphens, when really what I actually want
is to say "https://......." becomes unbreakable, and leave all the
rest unchanged. Hence the question about the regex.
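
To make the trade-off concrete, here is a small comparison of the two
settings. The sample text and URL are invented for the example; only
documented textwrap.wrap() parameters are used:

```python
import textwrap

url = "https://example.com/a-long-hyphenated-path-to-a-page"
text = f"Docs at {url} are well-known"

# break_long_words=False on its own: the URL is still treated as a
# hyphenated word, so it can be split at its hyphens.
partial = textwrap.wrap(text, width=30, break_long_words=False)

# Both options off: the URL survives intact, but ordinary hyphenated
# words such as "well-known" lose their break points too.
strict = textwrap.wrap(
    text, width=30, break_long_words=False, break_on_hyphens=False
)

for label, lines in (("partial", partial), ("strict", strict)):
    print(label)
    for line in lines:
        print("   ", line)
```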

Currently, the regex splits a long line of text into a series of
words, including hyphen points. What I want is to assert "this ain't a
word split point, because the word starts [a-z]+:// and is thus a
URL". But that might be beyond the flexibility of REs.
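
One possible way to sidestep the regex question entirely, as a sketch
rather than a tested solution: hide the hyphens inside anything that
looks like a URL before wrapping, then put them back afterwards. The
names URL_RE, PLACEHOLDER and wrap_keeping_urls are hypothetical, and
this assumes the private-use character U+E000 never occurs in the
input. The swap is one character for one character, so line widths
are unaffected:

```python
import re
import textwrap

# Assumption: a URL is a lowercase scheme, "://", then non-whitespace.
URL_RE = re.compile(r"[a-z]+://\S+")
# Assumption: this private-use code point never appears in real input.
PLACEHOLDER = "\ue000"

def wrap_keeping_urls(text, width=70, **kwargs):
    """Wrap text, leaving URLs unbroken but other hyphen breaks alone."""
    # Swap each hyphen inside a URL for the placeholder so textwrap
    # sees no hyphen break points there.
    protected = URL_RE.sub(
        lambda m: m.group(0).replace("-", PLACEHOLDER), text
    )
    lines = textwrap.wrap(
        protected, width=width, break_long_words=False, **kwargs
    )
    # Restore the real hyphens once the line breaks are decided.
    return [line.replace(PLACEHOLDER, "-") for line in lines]

sample = (
    "See https://example.com/some-very-long-hyphenated-path-name "
    "for well-known details"
)
print("\n".join(wrap_keeping_urls(sample, width=30)))
```

Ordinary words like "well-known" can still break at the hyphen, since
only text matched by URL_RE is protected.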

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
