On 22 March 2020 at 20:02, Chris Angelico <ros...@gmail.com> wrote:
>
> When using textwrap.fill() or friends, setting break_long_words=False
> without also setting break_on_hyphens=False has the very strange
> behaviour that a long hyphenated word will still be wrapped. I
> discovered this as a very surprising result when trying to wrap a
> paragraph that contained a URL, and wanting the URL to be kept
> unchanged:
> [...]
>
> Second point, and related to the above. The regex that defines break
> points, as found in the source code, is:
>
>     wordsep_re = re.compile(r'''
>         ( # any whitespace
>           %(ws)s+
>         | # em-dash between words
>           (?<=%(wp)s) -{2,} (?=\w)
>         | # word, possibly hyphenated
>           %(nws)s+? (?:
>             # hyphenated word
>               -(?: (?<=%(lt)s{2}-) | (?<=%(lt)s-%(lt)s-))
>               (?= %(lt)s -? %(lt)s)
>             | # end of word
>               (?=%(ws)s|\Z)
>             | # em-dash
>               (?<=%(wp)s) (?=-{2,}\w)
>             )
>         )''' % {'wp': word_punct, 'lt': letter,
>                 'ws': whitespace, 'nws': nowhitespace},
>
> It's built primarily out of small matches with long assertions, eg
> "match a hyphen, as long as it's preceded by two letters or a letter
> and a hyphen". What I want to do is create a *negative* assertion:
> specifically, to disallow any breaking between "\b[a-z]+://" and "\b",
> which will mean that a URL will never be broken ("https://.........."
> until the next whitespace boundary). Regex assertions of this form
> have to be fixed lengths, though, so as described, this isn't
> possible. Regexperts, any ideas? How can I modify this to never break
> inside a URL?
> [...]
>
> ChrisA
>

Hi,
I might be missing something obvious, but it seems to me that the
third-party regex library might help with your originally presented
approach:

https://pypi.org/project/regex/
https://bitbucket.org/mrabarnett/mrab-regex/
It supports variable-length lookbehind assertions (among many other
extra features); you could make textwrap or other code use it with a
tweaked regex pattern. However, I can't say whether that is sufficient
to achieve the needed functionality.

Regards,
vbr

-- 
https://mail.python.org/mailman/listinfo/python-list
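
P.S. A minimal sketch of the difference discussed above, in case it helps. It only checks whether a variable-width negative look-behind compiles; it does not patch textwrap. The names HYPHEN_OUTSIDE_URL and compiles_with are made up for this illustration, and the pattern is a simplified stand-in, not textwrap's actual wordsep_re. The regex part runs only if the third-party module is installed.

```python
import re

# Hypothetical pattern: a hyphen that is NOT preceded by "scheme://..."
# within the same whitespace-delimited word. The look-behind is
# variable-width because of the "+" and "*" quantifiers.
HYPHEN_OUTSIDE_URL = r"(?<!\b[a-z]+://\S*)-"


def compiles_with(module, pattern):
    """Return True if module.compile() accepts the pattern."""
    try:
        module.compile(pattern)
        return True
    except module.error:
        return False


# The standard library requires fixed-width look-behind, so this is False.
print("re accepts it:", compiles_with(re, HYPHEN_OUTSIDE_URL))

try:
    import regex  # third-party: pip install regex
except ImportError:
    regex = None

if regex is not None:
    text = "a well-known page: https://example.com/some-long-path"
    print("regex accepts it:", compiles_with(regex, HYPHEN_OUTSIDE_URL))
    # Offsets of hyphens that are still break candidates; the hyphens
    # inside the URL should be filtered out by the look-behind.
    print([m.start() for m in regex.finditer(HYPHEN_OUTSIDE_URL, text)])
```

Whether such an assertion can be slotted into wordsep_re without side effects on the other alternatives is something I have not verified.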