In my experience, the most common problematic substitutions made by tools 
such as Outlook, Word and some web tools are:

  1.  Double Quotes \u201c & \u201d “”
  2.  Single Quotes \u2018 & \u2019 ‘’
  3.  The en dash \u2013 –
  4.  Copyright © \xa9 and others, Registered ® \xae and trademark ™ \u2122
  5.  Some fractions, e.g. ½ \xbd and ¼ \xbc
  6.  Non-breaking spaces \xa0
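
As a rough illustration of the list above, here is a minimal sketch that maps 
those characters back to ASCII before the text reaches Python.  The names 
ASCII_REPLACEMENTS and asciify are hypothetical, and the ASCII spellings chosen 
for ©, ®, ™ and the fractions are only one possible assumption:

```python
# Hypothetical table: the ASCII spellings on the right are an assumption,
# not a standard; adjust to taste.
ASCII_REPLACEMENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u2013": "-",    # en dash
    "\xa9": "(c)",    # copyright sign
    "\xae": "(R)",    # registered sign
    "\u2122": "(TM)", # trade mark sign
    "\xbd": "1/2",    # vulgar fraction one half
    "\xbc": "1/4",    # vulgar fraction one quarter
    "\xa0": " ",      # no-break space
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(ASCII_REPLACEMENTS)


def asciify(text: str) -> str:
    """Return text with the common auto-substitutions mapped back to ASCII."""
    return text.translate(_TABLE)
```

Running pasted text through asciify() before saving it as source would undo 
the substitutions listed, at the cost of also changing them inside string 
literals where they may have been intended.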

From: David Mertz <me...@gnosis.cx>
Sent: 10 May 2020 18:33
To: Steven D'Aprano <st...@pearwood.info>
Cc: python-ideas <python-ideas@python.org>
Subject: [Python-ideas] Re: Improve handling of Unicode quotes and hyphens

On Sun, May 10, 2020 at 4:03 AM Steven D'Aprano <st...@pearwood.info> wrote:
I think that David(?) may have a Vim or Emacs mode that allows him to
use Unicode chars as syntax?

I use the vim-conceal plugin: https://github.com/khzaw/vim-conceal.  I know 
that something similar exists for Emacs, but don't remember the name.  It does 
not change anything about the underlying ASCII characters in the code; rather, 
it displays particular character sequences (perhaps in regex context) as other 
things, such as fancy Unicode characters.

So as typing goes, I still type e.g. the letter 'i' followed by the letter 'n' 
and a space, and the screen simply displays the U+2208 (∈) character.  But on 
disk, and for Python, it's still just 'in'.

On my own system, I've learned the Unicode code points for the common things 
like en dashes and em dashes that I use.  I actually don't know the vim shortcuts 
for other special things, although I probably should.  Still, the vim digraphs 
are always going to be fewer than all the Unicode code points, even if some 
useful ones are included (and somewhat mnemonic).  But indeed, entry of all 
those special characters is going to be more work than the characters directly 
on my keyboard, in any event.

>   6.  Change the error message "SyntaxError: invalid character in
>   identifier" to include which character and its Unicode value so
>   that it becomes "SyntaxError: invalid character 0x201c " in
>   identifier" -
More informative error messages are good :-)
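
For what the quoted proposal asks for (naming the offending character and its 
code point), a minimal sketch using the standard unicodedata module might look 
like this.  The helper name describe_char and the message wording are 
hypothetical, not what CPython actually emits:

```python
import unicodedata


def describe_char(ch: str) -> str:
    """Return a 'U+XXXX NAME' description of a single character."""
    # unicodedata.name() raises ValueError for unnamed characters unless a
    # default is supplied, so pass one.
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}"


def invalid_char_message(ch: str) -> str:
    """Build a hypothetical, more informative error message."""
    return f"invalid character {ch!r} ({describe_char(ch)}) in identifier"
```

With this, describe_char("\u201c") gives "U+201C LEFT DOUBLE QUOTATION MARK".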

I wouldn't mind messages that actually looked specifically for some of those 
common annoying auto-substitutions.  E.g.:

% python ~/tmp/wrongchar.py
  File "/home/dmertz/tmp/wrongchar.py", line 1
    x = 2014 – 2013
             ^
SyntaxError: invalid character in identifier

The hyphen really does look a lot like the en dash that is on screen.  And I 
think that's one of those substitutions that word processors and email clients 
often do.  Maybe the parser could check for a collection of the top 20 or so 
such substitutions and give a fitting message.  I dunno, something like 
"SyntaxError: invalid character U+2013 may be substitution of ASCII dash".
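
Outside the interpreter, the same idea can be sketched as a small checker that 
scans source text for known lookalikes and reports where they are.  This is 
only an illustration: LOOKALIKES and find_substitutions are hypothetical names, 
and the table holds just a handful of characters rather than a real top 20:

```python
# Hypothetical table of non-ASCII characters and the ASCII character they
# most likely replaced.  The selection here is an assumption.
LOOKALIKES = {
    "\u2013": "-",   # en dash
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\xa0": " ",     # no-break space
}


def find_substitutions(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, hint) for each likely auto-substitution.

    Lines and columns are 1-based.
    """
    hints = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in LOOKALIKES:
                hints.append((
                    lineno,
                    col,
                    f"invalid character U+{ord(ch):04X} may be "
                    f"substitution of ASCII {LOOKALIKES[ch]!r}",
                ))
    return hints
```

For the wrongchar.py line above, find_substitutions("x = 2014 \u2013 2013") 
reports line 1, column 10, which is where the caret points.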

--
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.