[issue12568] Add functions to get the width in columns of a character

2018-11-08 Thread STINNER Victor


STINNER Victor  added the comment:

I close the issue as WONTFIX.

--
resolution:  -> wont fix
stage: test needed -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue12568] Add functions to get the width in columns of a character

2018-08-18 Thread Terry J. Reedy

Terry J. Reedy  added the comment:

I suggest reclosing this issue, for the same reason I suggested closure of 
#24665 in msg321291: abstract unicode 'characters' (graphemes) do not, in 
general, have fixed physical widths of 0, 1, or 2 n-pixel columns (or spaces).  
I based that fairly long message on IDLE's multiscript font sample as displayed 
on Windows 10.  In that context, for instance, the width of (fixed-pitch) East 
Asian characters is about 1.6, not 2.0, times the width of fixed-pitch Ascii 
characters.  Variable-width Tamil characters average about the same.  The exact 
ratio depends on the Latin font used.

I did more experiments with Python started from Command Prompt with code page 
437 or 65001 and characters 20 pixels high.  The Windows console only allows 
'fixed pitch' fonts.  East Asian characters, if displayed, are expanded to 
double width.

However, European characters are not reliably displayed in one column. The 
width depends on both the font selected when a character is entered and the 
current font. The 20 latin1 characters in '¢£¥§©«®¶½ĞÀÁÂÃÄÅÇÐØß' usually 
display in 20 columns.  But if they are entered with the font set to MSGothic, 
the '§' and '¶' are each displayed in the middle of 2 columns, for 22 total.  
If the font is changed to MSGothic after entry, the '§' and '¶' are shifted 1/2 
column right to overlap the following '©' or '½' without changing the total 
width.  Greek and Cyrillic characters also sometimes take two columns.
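A quick way to relate these observations to Unicode data is to print each character's East_Asian_Width property. This is only a sketch for checking a hypothesis, not an explanation of what the console does. '§' and '¶' are classified 'A' (ambiguous), a class that CJK-oriented fonts such as MS Gothic may draw double-width. The same check may also flag characters that were not reported to widen.

```python
import unicodedata

# The 20-character test string from the message above.
SAMPLE = '¢£¥§©«®¶½ĞÀÁÂÃÄÅÇÐØß'

def ambiguous_chars(text):
    """Return the characters whose East_Asian_Width property is 'A'."""
    return [c for c in text if unicodedata.east_asian_width(c) == 'A']

for c in SAMPLE:
    print('U+%04X %s %s' % (ord(c), c, unicodedata.east_asian_width(c)))
print('ambiguous:', ''.join(ambiguous_chars(SAMPLE)))
```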

I did not test whether the font size (pixel height) affects horizontal column 
spacing.

--
stage:  -> test needed
versions: +Python 3.8 -Python 3.7




[issue12568] Add functions to get the width in columns of a character

2018-08-18 Thread Bian Jiaping


Change by Bian Jiaping :


--
nosy: +bianjp




[issue12568] Add functions to get the width in columns of a character

2018-01-29 Thread Robert Booth

Change by Robert Booth :


--
nosy: +ishigoya




[issue12568] Add functions to get the width in columns of a character

2017-07-23 Thread Socob

Changes by Socob <206a8...@opayq.com>:


--
nosy: +Socob




[issue12568] Add functions to get the width in columns of a character

2017-07-13 Thread Guillaume Sanchez

Guillaume Sanchez added the comment:

Hello,

I come from bugs.python.org/issue30717 . I have a pending PR that needs review 
( https://github.com/python/cpython/pull/2673 ) adding a function that breaks 
unicode strings into grapheme clusters (aka what one would intuitively call "a 
character"). It's based on the grapheme cluster breaking algorithm from TR29.

Let me know if this is of any relevance.

Quick demo:
>>> a=unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[]
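break_graphemes() is the function proposed in that PR. The sketch below does not rely on it. For experiments without it, a much cruder approximation attaches combining marks to the preceding character. naive_graphemes() is a hypothetical name, and this is not an implementation of the TR29 algorithm. It ignores Hangul syllable sequences, spacing marks with combining class 0, emoji ZWJ sequences, regional indicators and more.

```python
import unicodedata

def naive_graphemes(text):
    """Yield clusters made of a base character plus following combining marks.

    Only characters with a nonzero canonical combining class are attached;
    this is a rough approximation, not TR29 grapheme cluster breaking.
    """
    cluster = ''
    for ch in text:
        if cluster and unicodedata.combining(ch):
            cluster += ch            # attach the mark to the current cluster
        else:
            if cluster:
                yield cluster
            cluster = ch             # start a new cluster
    if cluster:
        yield cluster

print(list(naive_graphemes("lo\u0309\u0301l")))
```

Unlike the demo output above, the clusters are not normalized: 'o' followed by U+0309 stays as two code points rather than the precomposed 'ỏ'.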

--
nosy: +Guillaume Sanchez




[issue12568] Add functions to get the width in columns of a character

2017-07-03 Thread STINNER Victor

STINNER Victor added the comment:

You need users who use CJK and understand locale issues, especially the width of 
characters. Maybe ask Xiang Zhang and Naoki INADA?

--





[issue12568] Add functions to get the width in columns of a character

2017-07-03 Thread STINNER Victor

STINNER Victor added the comment:

> At least two other issues depend on this: issue17048 and issue24665.

I removed the dependency from bpo-24665 (CJK support for textwrap) to this 
issue, since its current PR uses unicodedata.east_asian_width(), not the C 
function wcswidth().

--




[issue12568] Add functions to get the width in columns of a character

2017-07-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

At least two other issues depend on this: issue17048 and issue24665.

If Victor has lost interest in this issue, I'll take it. I'm going to push at least an 
imperfect solution, which may be improved over time.

--
resolution: rejected -> 
stage: resolved -> 
status: closed -> open
versions: +Python 3.7 -Python 3.6




[issue12568] Add functions to get the width in columns of a character

2017-07-01 Thread R. David Murray

R. David Murray added the comment:

Interestingly, this just came up again in issue 30717.

--
nosy: +r.david.murray




[issue12568] Add functions to get the width in columns of a character

2017-06-27 Thread STINNER Victor

STINNER Victor added the comment:

Since we failed to agree on this feature, I close the issue.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed




[issue12568] Add functions to get the width in columns of a character

2015-11-26 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

I think this function would be very useful in many parts of the interpreter core 
and the standard library, from displaying tracebacks to formatting help text.

Otherwise we are doomed to implement imperfect variants in multiple places.
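As an illustration of the traceback case, the sketch below places a caret under a given character by counting display columns instead of code points. display_width() and caret_line() are hypothetical helpers. The width rule is a simplification for illustration: 2 for East Asian Wide/Fullwidth, 0 for combining marks, 1 otherwise.

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns: 2 for East Asian Wide/Fullwidth,
    0 for combining marks, 1 otherwise (ambiguous treated as narrow)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
    return width

def caret_line(source_line, offset):
    """Return a line with '^' under the character at index `offset`,
    padded by display columns rather than by code point count."""
    return ' ' * display_width(source_line[:offset]) + '^'

line = 'print("日本語", undefined_name)'
print(line)
print(caret_line(line, line.index('undefined_name')))
```

Padding by `offset` code points alone would place the caret three columns too far left here, because each of the three ideographs occupies two columns.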

--
resolution: out of date -> 
status: closed -> open
versions: +Python 3.6 -Python 3.3




[issue12568] Add functions to get the width in columns of a character

2015-03-18 Thread STINNER Victor

STINNER Victor added the comment:

Since no consensus was found on the definition of the function, and this issue 
has had no activity for 2 years, I close the issue as out of date.

--
resolution:  -> out of date
status: open -> closed




[issue12568] Add functions to get the width in columns of a character

2013-02-02 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
stage:  -> patch review
type:  -> enhancement




[issue12568] Add functions to get the width in columns of a character

2013-02-02 Thread Terry J. Reedy

Terry J. Reedy added the comment:

In this part of width.py,

    w = unicodedata.east_asian_width(c)
    if c == 'A':
        # ambiguous
        raise ValueError("ambiguous character %x" % (ord(c)))

I presume that 'c' should be 'w'.
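With that correction applied, the excerpt would read roughly as follows. Only the three quoted lines come from width.py. Wrapping them in a function named char_width() and the final return statement are assumptions added so that the sketch runs on its own.

```python
import unicodedata

def char_width(c):
    """Hypothetical wrapper around the corrected width.py excerpt."""
    w = unicodedata.east_asian_width(c)
    if w == 'A':          # test the width class, not the character itself
        # ambiguous
        raise ValueError("ambiguous character %x" % (ord(c)))
    # Assumption: the rest of width.py is not quoted; wide/fullwidth -> 2.
    return 2 if w in ('W', 'F') else 1
```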

--
nosy: +terry.reedy




[issue12568] Add functions to get the width in columns of a character

2012-03-19 Thread STINNER Victor

STINNER Victor victor.stin...@gmail.com added the comment:

> Martin: I agree that there are going to be cases where it is not
> correct because the terminal does something strange, but what we
> need is something that gets as close as possible to what the
> terminal is likely to be doing

Can't we expose wcswidth() as locale.strwidth() with a recipe explaining how to 
use unicodedata to get a correct result? At least until everyone implements 
Unicode correctly and Unicode stops evolving? :-)
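The C function can already be reached through ctypes, which gives a feel for what such a wrapper might return. This is a sketch under assumptions. strwidth() is a hypothetical name taken from the proposal. It assumes a POSIX-style C library that exports wcswidth() and returns None otherwise, for example on Windows. The result depends on the LC_CTYPE locale, which the sketch sets from the environment.

```python
import ctypes
import ctypes.util
import locale

# wcswidth() consults LC_CTYPE, so adopt the user's locale first.
try:
    locale.setlocale(locale.LC_CTYPE, '')
except locale.Error:
    pass  # keep the current (often "C") locale if the environment's is invalid

_libc_name = ctypes.util.find_library('c')
_libc = ctypes.CDLL(_libc_name) if _libc_name else None
_wcswidth = getattr(_libc, 'wcswidth', None)
if _wcswidth is not None:
    _wcswidth.argtypes = [ctypes.c_wchar_p, ctypes.c_size_t]
    _wcswidth.restype = ctypes.c_int

def strwidth(text):
    """Return the column count reported by the C library's wcswidth().

    Returns -1 if the C library considers some character non-printable in
    the current locale, and None if wcswidth() is not available at all.
    """
    if _wcswidth is None:
        return None
    return _wcswidth(text, len(text))

print(strwidth('abc'), strwidth('\u65e5\u672c'))
```

The -1 result is the C library's way of saying "not printable here". For example, in a plain "C" locale non-ASCII text may get -1 rather than a width.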

--

For unicodedata, a function to get the width of a string would be more 
convenient than unicodedata.east_asian_width():

>>> import unicodedata
>>> unicodedata.east_asian_width('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: need a single Unicode character as parameter
>>> 'abc'.ljust(unicodedata.east_asian_width(' '))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer

The function posted in msg155361 suggests that east_asian_width() alone is not 
enough to get the width in columns of a single character.
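A pure-Python string-level function can be approximated on top of east_asian_width(). The sketch below is one possible approximation under stated assumptions. Combining marks count 0, 'W' and 'F' count 2, and everything else, including ambiguous 'A', counts 1. str_width() and ljust_columns() are hypothetical names. As noted above, east_asian_width() by itself may not be enough for every character.

```python
import unicodedata

def str_width(text):
    """Approximate display columns of `text` (assumptions in the lead-in)."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                  # combining mark: drawn over the previous character
        if unicodedata.east_asian_width(ch) in ('W', 'F'):
            total += 2                # wide / fullwidth
        else:
            total += 1                # narrow, halfwidth, neutral, ambiguous
    return total

def ljust_columns(text, columns, fillchar=' '):
    """Like str.ljust(), but pads to a number of display columns."""
    return text + fillchar * max(0, columns - str_width(text))

print(repr(ljust_columns('abc', 6)))   # same result as 'abc'.ljust(6)
print(repr(ljust_columns('日本', 6)))  # two wide characters, padded with 2 spaces
```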

--




[issue12568] Add functions to get the width in columns of a character

2012-03-19 Thread Serhiy Storchaka

Serhiy Storchaka storch...@gmail.com added the comment:

Has anyone tested wcswidth on FreeBSD, old Solaris? With non-utf8 locales?

--
nosy: +storchaka




[issue12568] Add functions to get the width in columns of a character

2012-03-16 Thread Éric Araujo

Éric Araujo mer...@netwok.org added the comment:

eo

--
nosy: +benjamin.peterson, eric.araujo, lemburg




[issue12568] Add functions to get the width in columns of a character

2012-03-16 Thread Éric Araujo

Changes by Éric Araujo mer...@netwok.org:


--
Removed message: http://bugs.python.org/msg156059




[issue12568] Add functions to get the width in columns of a character

2012-03-11 Thread poq

poq p...@gmx.com added the comment:

Martin,

I agree that wcswidth is incorrect with respect to Unicode. However I don't 
think that's relevant at all. Python should only try to match the behaviour of 
the terminal.

Since terminals do slightly different things, trying to match them exactly - in 
all cases, on all systems - is virtually impossible. But AFAICT wcwidth should 
match the terminal behaviour on nearly all modern systems, so it makes sense to 
expose it.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-11 Thread Nicholas Cole

Nicholas Cole nicholas.c...@gmail.com added the comment:

Poq: I agree.  Guessing from the Unicode standard is going to lead to users 
having to write some complicated code that people are going to have to reinvent 
over and over, and is not going to be accurate with respect to curses.  I'd 
favour exposing wcwidth.

Martin: I agree that there are going to be cases where it is not correct 
because the terminal does something strange, but what we need is something that 
gets as close as possible to what the terminal is likely to be doing (the 
Unicode standard itself is not really the issue for curses stuff).  So whether 
it is called wcwidth or wcswidth I don't really mind, but I think it would be 
useful.

The other alternative is to include one of the other ideas that have been 
mentioned in this thread as part of the library, I suppose, so that people 
don't have to keep reinventing the wheel for themselves.  

The one thing I really don't favour is shipping something that supports wide 
characters, but gives the users no way of guessing whether or not that is what 
they are printing, because that is surely going to break a lot of applications.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Nicholas Cole

Nicholas Cole nicholas.c...@gmail.com added the comment:

Martin: sorry to be completely dense, but I can't get this to work properly 
with the python3.3a1 build.  Could you post some example code?

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Please see the attached width.py for an example

--
Added file: http://bugs.python.org/file24773/width.py




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread poq

poq p...@gmx.com added the comment:

Martin, I think you meant to write "if w == 'A':".
Some very common characters have ambiguous widths though (e.g. the Greek 
alphabet), so you can't just raise an error for them.

http://unicode.org/reports/tr11/ says:
"Ambiguous characters occur in East Asian legacy character sets as wide 
characters, but as narrow (i.e., normal-width) characters in non-East Asian 
usage."

So in practice applications can treat ambiguous characters as narrow by 
default, with a user setting to use legacy (wide) width.

As Tom pointed out there are also a bunch of zero width characters, and 
characters with special formatting like tab, soft hyphen, ...

--
nosy: +poq




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

I would encourage you to look at the Perl CPAN module Unicode::LineBreak,
which fully implements tr11.  It includes Unicode::GCString, a class
that has a columns() method to determine the print columns.  This is very
fancy in the case of Asian widths, but of course there are many other cases too.

If you'd like, I can show you a program that uses these, a rewrite of the
standard Unix fmt(1) filter that works properly on Unicode column widths.

--tom

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Nicholas Cole

Nicholas Cole nicholas.c...@gmail.com added the comment:

Martin and poq: I think the sample code shows up a real problem. Characters that 
are ambiguous according to Unicode may be rendered by curses in different ways.

Don't we need a function that actually reports how curses is going to print a 
given string, rather than just reporting what the unicode standard says?

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> Martin, I think you meant to write "if w == 'A':".
> Some very common characters have ambiguous widths though (e.g. the Greek 
> alphabet), so you can't just raise an error for them.

That's precisely why I don't think this should be in the library, but
in the application. Application developers who need that also need
to concern themselves with the border cases, and decide on how
they need to resolve them.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> I would encourage you to look at the Perl CPAN module Unicode::LineBreak,
> which fully implements tr11.

Thanks for the pointer!

> If you'd like, I can show you a program that uses these, a rewrite the
> standard Unix fmt(1) filter that works properly on Unicode column widths.

I believe there can't be any truly proper implementation, as you
can't be certain how the terminal will handle these itself. In any
case, anybody who is interested in contributing a patch should also
be capable of understanding the source of Unicode::LineBreak.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

> Martin v. Löwis mar...@v.loewis.de added the comment:

>> Martin, I think you meant to write "if w == 'A':".
>> Some very common characters have ambiguous widths though (e.g. the Greek
>> alphabet), so you can't just raise an error for them.

> That's precisely why I don't think this should be in the library, but
> in the application. Application developers who need that also need
> to concern themselves with the border cases, and decide on how
> they need to resolve them.

The column-width of a string is not an application issue.  It is
well-defined by Unicode.  Again, please see how we've done it in 
Perl, where tr11 is fully implemented.  The columns() method from 
Unicode::GCString always gives the right answer per the Standard for
any string, even what you are calling ambiguous ones.

This is not an applications issue -- at all.

--tom

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

> Don't we need a function that actually reports how curses is going to
> print a given string, rather than just reporting what the unicode
> standard says?

That may be useful, but

a) this patch doesn't provide that, and
b) it may not actually be possible to implement such a change in a portable
way as there may be no function exposed by the curses implementation
that provides this information.

To put my closing this issue differently: I rejected the patch that
Victor initially submitted. If anybody wants to contribute a different
patch that uses a different strategy, please submit a new issue.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Antoine Pitrou

Changes by Antoine Pitrou pit...@free.fr:


--
resolution: works for me -> 
status: closed -> open




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

> Martin v. Löwis mar...@v.loewis.de added the comment:

>> I would encourage you to look at the Perl CPAN module Unicode::LineBreak,
>> which fully implements tr11.

> Thanks for the pointer!

>> If you'd like, I can show you a program that uses these, a rewrite the
>> standard Unix fmt(1) filter that works properly on Unicode column widths.

> I believe there can't be any truly proper implementation, as you
> can't be certain how the terminal will handle these itself.

Hm.  I think we may not be talking about the same thing after all.

If we're talking about the Curses library, or something similar,
this is not the same.  I do not think Curses has support for 
combining characters, right to left text, wide characters, etc.

However, Unicode does, and defines the column width for those.

I have an illustration of what this looks like in the picture
in the very last recipe, #44, in 

http://training.perl.com/scripts/perlunicook.html

That is what I have been talking about by print widths.  It's running
in a Mac terminal emulator, and unlike the HTML which grabs from too
many fonts, the terminal program does the right thing with the widths.

Are we talking about different things?

--tom

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread poq

poq p...@gmx.com added the comment:

It seems this is a bit of a minefield...

GNOME Terminal/libvte has an environment variable (VTE_CJK_WIDTH) to override 
the handling of ambiguous width characters. It bases its default on the locale 
(with the comment 'This is basically what GNU libc does').

urxvt just uses system wcwidth.

Xterm uses some voodoo to decide between system wcwidth and mk_wcwidth(_cjk): 
http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

I think the simplest solution is to just expose libc's wc(s)width. It is widely 
used and is most likely to match the behaviour of the terminal.
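As described above, libvte bases its default for ambiguous-width characters on the locale. The sketch below shows one way an application could make a similar guess. CJK_LANGUAGES and default_ambiguous_wide() are assumptions for illustration. Neither libvte's nor glibc's actual rules are reproduced here. Only POSIX-style locale names such as 'ja_JP.UTF-8' are understood.

```python
import locale

# Assumption: treat Chinese, Japanese and Korean locales as "legacy CJK".
CJK_LANGUAGES = {'zh', 'ja', 'ko'}

def default_ambiguous_wide(locale_name=None):
    """Guess whether East_Asian_Width=A characters should count as 2 columns.

    `locale_name` is something like 'ja_JP.UTF-8'; by default the current
    LC_CTYPE setting is queried (setlocale() with no locale only reads it).
    """
    if locale_name is None:
        locale_name = locale.setlocale(locale.LC_CTYPE)
    language = locale_name.split('.')[0].split('_')[0].lower()
    return language in CJK_LANGUAGES

print(default_ambiguous_wide('ja_JP.UTF-8'), default_ambiguous_wide('en_US.UTF-8'))
```

A terminal-specific override such as VTE_CJK_WIDTH would take precedence over any guess of this kind, so the result is only a default.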

FWIW I wrote a little script to test the widths of all Unicode characters, and 
came up with the following logic to match libvte behaviour:

import unicodedata

def wcwidth(c, legacy_cjk=False):
    if c in u'\t\r\n\10\13\14':
        raise ValueError('character %r has no intrinsic width' % c)
    if c in u'\0\5\7\16\17':
        return 0
    if u'\u1160' <= c <= u'\u11ff':
        return 0  # hangul jamo
    if unicodedata.category(c) in ('Mn', 'Me', 'Cf') and c != u'\u00ad':
        return 0  # 00ad = soft hyphen
    eaw = unicodedata.east_asian_width(c)
    if eaw in ('F', 'W'):
        return 2
    if legacy_cjk and eaw == 'A':
        return 2
    return 1

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Tom: I don't think Unicode::GCString implements UAX#11 correctly (but this is 
really out of scope of this issue). In particular, it contains an ad-hoc 
decision to introduce the EA_Z east-asian width that UAX#11 doesn't talk about.

In most cases, it's probably reasonable to introduce this EA_Z feature. 
However, there are some significant deviations from UAX#11 here:
- combining characters are given EA_Z in sombok/data/custom.pl, even though 
UAX#11 assigns A or N. UAX#11 points out that the advance width depends on 
whether or not the terminal performs character combination or not. It's not 
clear whether Unicode::GCString aims for strict UAX#11, or advance width.
- control characters are also given EA_Z, even though UAX#11 gives them EA_N. 
In this case, it's neither UAX#11 width nor advance width since control 
characters will have various effects on the terminal (in particular for the tab 
character)

--




[issue12568] Add functions to get the width in columns of a character

2012-03-10 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

poq: I still remain opposed to exposing wcswidth, since it is just as incorrect 
as any of the other solutions that people have circulated. I could agree to it 
if it was called wcswidth, making it clear that it does whatever the C 
library does, with whatever semantics the C library wants to give to it (and an 
availability that depends on whether the C library supports it or not). 

That would probably cover the ncurses use cases, except that it is not only 
incorrect with respect to Unicode, but also incorrect with respect to what the 
terminal may be doing. I guess users would use it anyway.

For Python's internal use, I could accept using the sombok algorithm. I 
wouldn't expose it, since it again would trick people into believing that it 
was correct in some sense. Perhaps calling it sombok_width might allow for 
exposing it.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-09 Thread Nicholas Cole

Nicholas Cole nicholas.c...@gmail.com added the comment:

Could we have an update on the status of this? I ask because if 3.3 is going to 
(finally) fix Unicode for curses, it would be really nice if it were possible 
to calculate the width of what's being displayed!  It looks as if there was 
never quite agreement on the proper API.

--




[issue12568] Add functions to get the width in columns of a character

2012-03-09 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

Nicholas: I consider this issue fixed. There already *is* an API to compute 
the width of a character. Closing this as "works for me".

--
resolution:  -> works for me
status: open -> closed




[issue12568] Add functions to get the width in columns of a character

2011-10-18 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

>> I'm -1 on using wcswidth, though.
>
> When you write text into a console on Linux (e.g. displayed by
> gnome-terminal or konsole), I suppose that wcswidth() can be used to
> compute the width of a line. It would help to fix #2382.
>
> Or do you think that wcswidth() gives the wrong result for this use
> case?

No, I think that using it is not necessary. If you want to compute the
width of a line, use unicodedata.east_asian_width. And yes, wcswidth
may sometimes produce incorrect results (although it's probably
correct most of the time).

--




[issue12568] Add functions to get the width in columns of a character

2011-10-18 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com:


--
nosy: +Arfrever




[issue12568] Add functions to get the width in columns of a character

2011-10-17 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

> I'm -1 on using wcswidth, though.

When you write text into a console on Linux (e.g. displayed by gnome-terminal 
or konsole), I suppose that wcswidth() can be used to compute the width of a 
line. It would help to fix #2382.

Or do you think that wcswidth() gives the wrong result for this use case?

--




[issue12568] Add functions to get the width in columns of a character

2011-10-14 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

I think the WideCharToMultibyte approach is just incorrect.

I'm -1 on using wcswidth, though. We already have unicodedata.east_asian_width, 
which implements http://unicode.org/reports/tr11/ 
The outcomes of this function are these:
- F: full-width, width 2, compatibility character for a narrow char
- H: half-width, width 1, compatibility character for a narrow char
- W: wide, width 2
- Na: narrow, width 1
- A: ambiguous; width 2 in Asian context, width 1 in non-Asian context
- N: neutral; not used in Asian text, so has no width. Practically, width can 
be considered as 1
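The list above maps directly onto a small lookup table. In this sketch, EAW_COLUMNS and eaw_width() are hypothetical names. The 'A' case is taken as a parameter because, per the list, it depends on whether the context is Asian. 'N' is counted as 1, following the "practically" remark.

```python
import unicodedata

# Columns per East_Asian_Width value, as listed in the message above.
EAW_COLUMNS = {'F': 2, 'H': 1, 'W': 2, 'Na': 1, 'N': 1}

def eaw_width(ch, asian_context=False):
    """Width of one character according to the table; 'A' is context-dependent."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw == 'A':
        return 2 if asian_context else 1
    return EAW_COLUMNS[eaw]

for ch in ('A', '\uff21', '\uff71', '\u65e5', '\u03b1'):
    print('U+%04X %-2s %d' % (ord(ch), unicodedata.east_asian_width(ch), eaw_width(ch)))
```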

--




[issue12568] Add functions to get the width in columns of a character

2011-10-14 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

> Martin v. Löwis mar...@v.loewis.de added the comment:
>
> I think the WideCharToMultibyte approach is just incorrect.
>
> I'm -1 on using wcswidth, though.

Like you, I too seriously question using wcswidth() for this at all:

    The wcswidth() function either shall return 0 (if pwcs points to a
    null wide-character code), or return the number of column positions
    to be occupied by the wide-character string pointed to by pwcs, or
    return -1 (if any of the first n wide-character codes in the wide-
    character string pointed to by pwcs is not a printable wide-
    character code).

I would be willing to bet (a small amount of) money that it does not correctly
implement Unicode print widths, even though one would certainly *think* it
does according to this:

 The wcswidth() function determines the number of column positions
 required for the first n characters of pwcs, or until a null wide
 character (L'\0') is encountered.

There are a bunch of interesting cases I would want it tested against.

 We already have unicodedata.east_asian_width, which implements 
 http://unicode.org/reports/tr11/ 

 The outcomes of this function are these:
 - F: full-width, width 2, compatibility character for a narrow char
 - H: half-width, width 1, compatibility character for a wide char
 - W: wide, width 2
 - Na: narrow, width 1
 - A: ambiguous; width 2 in Asian context, width 1 in non-Asian context
 - N: neutral; not used in East Asian text, so it has no East Asian width. In 
 practice, its width can be considered to be 1

Um, East_Asian_Width=Ambiguous (EA=A) isn't actually good enough for this.
And EA=N cannot be considered to be 1, either.

For example, some of the Marks are EA=A and some are EA=N, yet how many
print columns they take varies.  It is usually 0, but can be 1 at the start
of the file/string or immediately after a linebreak sequence.  Then there
are things like the variation selectors, which never take up any columns.
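
To illustrate the point about marks, the sketch below compares a width computed 
from East_Asian_Width alone with one that gives zero columns to nonspacing marks 
(general category Mn, which includes the variation selectors), enclosing marks 
(Me), and format characters (Cf). Both function names are hypothetical, and the 
second one still ignores the context-dependent cases mentioned above, such as a 
mark at the start of a line.

```python
import unicodedata


def width_by_eaw_only(text: str) -> int:
    """Hypothetical helper: W/F count as 2 columns, everything else as 1."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def width_with_zero_width_marks(text: str) -> int:
    """Hypothetical helper: as above, but Mn, Me and Cf take no column."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks, variation selectors, format controls
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


decomposed = "e\u0301"  # 'e' + COMBINING ACUTE ACCENT (EA=A, category Mn)
print(width_by_eaw_only(decomposed), width_with_zero_width_marks(decomposed))
```

The first function reports 2 columns for the decomposed "é", the second 
reports 1.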

Now consider the many \pC code points, like 

U+0009  CHARACTER TABULATION
U+00AD  SOFT HYPHEN 
U+200C  ZERO WIDTH NON-JOINER
U+FEFF  ZERO WIDTH NO-BREAK SPACE
U+2062  INVISIBLE TIMES

A TAB is its own problem, but we know SHY has width 1 only immediately
before a linebreak or at EOF, and ZWNJ and ZWNBSP both certainly have
width 0.  So do the INVISIBLE * code points.
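
The rules in the previous paragraph can be sketched for a single line of text 
(no line breaks inside it) as follows. This is an illustration under stated 
assumptions, not an implementation of any standard. line_columns() and its 
"tabsize" parameter are hypothetical. TAB advances to the next tab stop, SOFT 
HYPHEN counts only as the last character of the line, other Cf code points count 
as zero, and everything else falls back to East_Asian_Width with A and N treated 
as 1. Combining marks are not handled here.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def line_columns(line: str, tabsize: int = 8) -> int:
    """Hypothetical helper: columns taken by one line without line breaks."""
    col = 0
    for i, ch in enumerate(line):
        if ch == "\t":
            col += tabsize - col % tabsize  # advance to the next tab stop
        elif ch == SOFT_HYPHEN:
            # Visible only where the line actually ends.
            col += 1 if i == len(line) - 1 else 0
        elif unicodedata.category(ch) == "Cf":
            continue  # ZWNJ, ZWNBSP, INVISIBLE TIMES, ... take no column
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            col += 2
        else:
            col += 1  # Na, H, N and A are all treated as one column here
    return col


print(line_columns("a\tb"))       # 'b' lands after the first tab stop
print(line_columns("co\u00adop"))  # the soft hyphen stays invisible
```

Tab stops are computed from the running column count rather than with 
str.expandtabs(), which counts code points and would misplace tabs that 
follow wide characters.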

Context:

Imagine you're trying to format a string so that it takes up exactly 20
columns: you need to know how many spaces to pad it with based on its
print width.  That is what #12568 needs to do, and for that you need
much more than the East Asian Width properties.
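
The padding problem is easy to reproduce: str.ljust() pads by code-point count, 
not by columns. The sketch below pads to a target column count instead. 
pad_to_columns() and the simplified columns() rule (W/F as 2 columns, Mn/Me/Cf 
as 0, everything else 1) are hypothetical helpers for illustration; as noted 
above, a correct width rule needs more than East Asian Width properties.

```python
import unicodedata


def columns(text: str) -> int:
    """Hypothetical simplified width: W/F = 2, Mn/Me/Cf = 0, else 1."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def pad_to_columns(text: str, width: int) -> str:
    """Hypothetical helper: append spaces so text fills `width` columns."""
    return text + " " * max(0, width - columns(text))


kana = "\u3042\u3044\u3046"               # three wide characters
print(columns(kana.ljust(20)))            # 23: ljust counts code points
print(columns(pad_to_columns(kana, 20)))  # 20: padded by columns instead
```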

I really do think that what #12568 is asking for is the equivalent
of the Perl Unicode::GCString's columns() method, and that you won't be
able to handle text alignment of Unicode with anything much less than
that.  After all, #12568's title is "Add functions to get the width
in columns of a character".  I would very much like to compare what
columns() returns with what wcswidth() returns.  I bet wcswidth() is
very simple-minded at best.

I may of course be wrong.

--tom

--




[issue12568] Add functions to get the width in columns of a character

2011-10-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

 There might be something you can steal from  ...

I don't think that Python should reinvent the wheel. We should just reuse 
wcswidth().

Here is a simple patch exposing the wcswidth() function as locale.width().

Example:

>>> import locale
>>> text = '\u3042\u3044\u3046\u3048\u304a'
>>> len(text)
5
>>> locale.width(text)
10
>>> locale.width(' ')
1
>>> locale.width('\U0010abcd')
1
>>> locale.width('\uDC80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
locale.Error: the string is not printable
>>> locale.width('\U0010')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
locale.Error: the string is not printable
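
Without applying the patch, similar behaviour can be approximated from Python 
with ctypes, on platforms whose C library provides wcswidth() (so not Windows). 
The sketch assumes a POSIX-style libc and a 32-bit wchar_t, so that one code 
point is one wchar_t, as on Linux and macOS. It also relies on the LC_CTYPE 
locale that Python 3 sets from the environment at startup. wcswidth_or_raise() 
is a hypothetical name, and it raises ValueError rather than locale.Error so it 
is not mistaken for the patch's API.

```python
import ctypes
import ctypes.util

# Load the C library; CDLL(None) falls back to the symbols already loaded
# into the process, which include libc on Linux and macOS.
_libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
_libc.wcswidth.argtypes = [ctypes.c_wchar_p, ctypes.c_size_t]
_libc.wcswidth.restype = ctypes.c_int


def wcswidth_or_raise(text: str) -> int:
    """Hypothetical helper: libc wcswidth(), ValueError if not printable."""
    width = _libc.wcswidth(text, len(text))
    if width < 0:  # wcswidth() returns -1 for a non-printable character
        raise ValueError("the string is not printable")
    return width
```

The results depend on the C library and the current locale. Depending on the 
libc, non-ASCII characters may be reported as non-printable under the plain C 
locale.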

I don't think that we need locale.width() on Windows, because its console 
already has bigger issues with Unicode: see issue #1602. If you want to display 
non-ASCII characters correctly on Windows, just avoid the Windows console and 
use a graphical widget.

--
keywords: +patch
Added file: http://bugs.python.org/file23401/locale_width.patch




[issue12568] Add functions to get the width in columns of a character

2011-10-13 Thread STINNER Victor

STINNER Victor victor.stin...@haypocalc.com added the comment:

Oh, unicode_width.patch of issue #2382 implements the width on Windows using:

WideCharToMultiByte(CP_ACP, 0, buf, len, NULL, 0, NULL, NULL);

It computes the length of the string once encoded to the ANSI code page. I 
don't know if that can be seen as the width of the string in the console...
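
What that call measures can be approximated portably with Python's codecs. The 
sketch assumes cp932 (the Japanese ANSI code page) and cp1252 as stand-ins for 
CP_ACP, and uses errors="replace" to mimic a default replacement character. 
ansi_byte_length() is a hypothetical helper, and it ignores Windows best-fit 
mappings. It shows where byte length and display width diverge. Cyrillic 
letters are double-byte in cp932 but often take one terminal column, and a 
character missing from the code page collapses to a single '?'.

```python
def ansi_byte_length(text: str, codepage: str) -> int:
    """Hypothetical helper: encoded byte length, unmappable chars -> '?'."""
    return len(text.encode(codepage, errors="replace"))


hiragana = "\u3042\u3044\u3046\u3048\u304a"
print(ansi_byte_length(hiragana, "cp932"))   # 10: happens to match columns
print(ansi_byte_length("\u0416", "cp932"))   # 2: CYRILLIC CAPITAL ZHE
print(ansi_byte_length(hiragana, "cp1252"))  # 5: each kana becomes '?'
```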

--




[issue12568] Add functions to get the width in columns of a character

2011-08-11 Thread Tom Christiansen

Tom Christiansen tchr...@perl.com added the comment:

I can attest that being able to get the columns of a grapheme cluster is very 
important for printing, because you need this to do correct linebreaking.  
There might be something you can steal from 

   http://search.cpan.org/perldoc?Unicode::GCString
   http://search.cpan.org/perldoc?Unicode::LineBreak

which implement UAX #14 (line breaking) and UAX #11 (East Asian width).  

I use this in my own code to help format Unicode strings by columns or lines.  
The right way would be to build this sort of knowledge into str.format(), 
but that is much harder, so an intermediary library module seems good enough 
for now.
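
As a rough illustration of why widths matter for line breaking, here is a greedy 
wrapper that starts a new line once the next code point would exceed a column 
budget. It is deliberately naive and is not UAX #14: it ignores break 
opportunities and grapheme clusters entirely. wrap_columns() and the simplified 
char_columns() rule are hypothetical helpers for illustration.

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Hypothetical simplified width: Mn/Me/Cf = 0, W/F = 2, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wrap_columns(text: str, width: int) -> list[str]:
    """Hypothetical helper: greedily split text into lines of <= width columns."""
    lines, current, used = [], "", 0
    for ch in text:
        cols = char_columns(ch)
        # Start a new line only if this character would overflow a
        # non-empty line; zero-width marks therefore stay with their base.
        if used + cols > width and current:
            lines.append(current)
            current, used = "", 0
        current += ch
        used += cols
    if current:
        lines.append(current)
    return lines


print(wrap_columns("\u3042" * 5, 4))  # two wide characters per line
```

A real implementation would choose break points according to UAX #14, which 
is what Unicode::LineBreak does.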

--
nosy: +tchrist




[issue12568] Add functions to get the width in columns of a character

2011-07-21 Thread Ezio Melotti

Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti




[issue12568] Add functions to get the width in columns of a character

2011-07-16 Thread Martin v. Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

In the #2382 code, how is the Windows case supposed to work? Also, what about 
systems that don't have wcswidth? IOW, the patch appears to be incorrect.

I like the #6755 approach better, except that it shouldn't be using hard-coded 
tables, but should instead integrate with Python's version of the UCD. In 
addition, it should use an accepted, published strategy for determining the 
width, preferably one coming from the Unicode Consortium.

--
nosy: +loewis




[issue12568] Add functions to get the width in columns of a character

2011-07-15 Thread Nicholas Cole

Changes by Nicholas Cole nicholas.c...@gmail.com:


--
nosy: +Nicholas.Cole




[issue12568] Add functions to get the width in columns of a character

2011-07-15 Thread Christian Hofstaedtler

Changes by Christian Hofstaedtler ch+pythonb...@zeha.at:


--
nosy: +zeha




[issue12568] Add functions to get the width in columns of a character

2011-07-14 Thread STINNER Victor

New submission from STINNER Victor victor.stin...@haypocalc.com:

Some characters take more than one column in a terminal, especially CJK 
(Chinese, Japanese, Korean) characters. If you use such characters in a terminal 
without taking the width in columns of each character into account, the text 
alignment can be broken. Issue #2382 is an example of this problem.

#2382 and #6755 have patches implementing such functions:
- unicode_width.patch of #2382 adds unicode.width() method
- ucs2w.c of #6755 creates a new ucs2w module with two functions: unichr2w() 
(width of a character) and ucs2w() (width of a string)

Use test_ucs2w.py of #6755 to test these new functions/methods.

--
components: Unicode
messages: 140376
nosy: haypo, inigoserna
priority: normal
severity: normal
status: open
title: Add functions to get the width in columns of a character
versions: Python 3.3
