[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread Christian Clauss

New submission from Christian Clauss ccla...@bluewin.ch:

BUGS: certain diacritical marks can and should be capitalized...
str.upper() does not .replace('à', 'À').replace('ä', 'Ä').replace('è', 
'È').replace('é', 'É').replace('ö', 'Ö').replace('ü', 'Ü'), etc.
str.lower() does not .replace('À', 'à').replace('Ä', 'ä').replace('È', 
'è').replace('É', 'é').replace('Ö', 'ö').replace('Ü', 'ü'), etc.
str.title() has the same problems plus it capitalizes the letter _after_ a 
diacritic. e.g. 'lüsai'.title() -- 'LÜSai' with a capital 'S'
myUpper(), myLower(), myTitle() exhibit the correct behavior with a handful 
of diacritic marks.

def myUpper(inString):
    return (inString.upper()
            .replace('à', 'À').replace('ä', 'Ä').replace('è', 'È')
            .replace('é', 'É').replace('ö', 'Ö').replace('ü', 'Ü'))

def myLower(inString):
    return (inString.lower()
            .replace('À', 'à').replace('Ä', 'ä').replace('È', 'è')
            .replace('É', 'é').replace('Ö', 'ö').replace('Ü', 'ü'))

def myTitle(inString): # WARNING: converts all whitespace to a single space
    returnValue = []
    for theWord in inString.split():
        returnValue.append(myUpper(theWord[:1]) + myLower(theWord[1:]))
    return ' '.join(returnValue)
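
The symptoms described above are what one would expect when the strings are byte strings holding UTF-8 data, which is what a Python 2 str literal is in a UTF-8 source file. A minimal sketch, written for Python 3 with bytes standing in for the Python 2 str type (that substitution is an assumption made so the example runs on a current interpreter):

```python
# Python 3 sketch: bytes plays the role of the Python 2 str type.
text = 'lüsai'
raw = text.encode('utf-8')      # b'l\xc3\xbcsai': 'ü' becomes two bytes

# Byte-string case methods only change ASCII letters, so the two bytes
# that encode 'ü' are left untouched by upper().
raw_upper = raw.upper()

# title() sees those two bytes as uncased, so the following 's' is
# treated as the start of a new word and gets capitalized.
raw_title = raw.title()

print(raw_upper.decode('utf-8'))   # LüSAI
print(raw_title.decode('utf-8'))   # LüSai
```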

--
components: Unicode
messages: 158332
nosy: Christian.Clauss, ezio.melotti
priority: normal
severity: normal
status: open
title: Certain diacritical marks can and should be capitalized... e.g. ü -- Ü
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue14587
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

It works fine if you use unicode.

--
nosy: +r.david.murray
resolution: -> invalid
stage: -> committed/rejected
status: open -> closed




[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread Christian Clauss

Christian Clauss ccla...@bluewin.ch added the comment:

On Apr 15, 2012, at 4:43 PM, R. David Murray wrote:

> It works fine if you use unicode.

What does it mean in this context to use unicode??
===
In Idle... 
===
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>> lusai = u'lüsai'
Unsupported characters in input
>>> lusai = 'lüsai'
Unsupported characters in input
>>> print "ŠČŽ"
Unsupported characters in input
===
In a script...
Every time that I try to use unicode an exception is thrown.
  All try blocks in the following code trigger an exception
===
#!/usr/bin/env python
# -*- coding: utf-8 -*-

print '=='

# sys.version_info = sys.version_info(major=2, minor=7, micro=1, releaselevel='final', serial=0)
import sys
print 'sys.version_info = {}.{}.{} {} {}'.format(sys.version_info[0], 
sys.version_info[1], sys.version_info[2], sys.version_info[3], 
sys.version_info[4])

import commands, os
print 'os.name = {}'.format(os.name)
print 'os.uname = {}'.format(os.uname())

print '=='

def myUpper(inString):
    # the last pair maps the lowercase sharp s to its capital form
    return (inString.upper()
            .replace('à', 'À').replace('ä', 'Ä').replace('è', 'È')
            .replace('é', 'É').replace('ö', 'Ö').replace('ü', 'Ü')
            .replace('ß', 'ẞ'))

def myLower(inString):
    # the last pair maps the capital sharp s back to lowercase
    return (inString.lower()
            .replace('À', 'à').replace('Ä', 'ä').replace('È', 'è')
            .replace('É', 'é').replace('Ö', 'ö').replace('Ü', 'ü')
            .replace('ẞ', 'ß'))

def myTitle(inString):
    returnValue = []
    for theWord in inString.split():
        returnValue.append(myUpper(theWord[:1]) + myLower(theWord[1:]))
    return ' '.join(returnValue)

def formatted(inValue, inSep = ' '):
    s = str(inValue)
    print ' s={}{}su={}{}sl={}{}st={}...'.format(s, inSep, s.upper(), inSep,
        s.lower(), inSep, s.title())
    print ' s={}{}mu={}{}ml={}{}mt={}...'.format(s, inSep, myUpper(s), inSep,
        myLower(s), inSep, myTitle(s))
    u = unicode(inValue, 'utf8')
    try:
        print ' u={}{}uu={}{}ul={}{}ut={}...'.format(u, inSep, u.upper(),
            inSep, u.lower(), inSep, u.title())
    except:
        print "=== Exception thrown trying to print unicode({}, 'utf8')".format(repr(s))

kolnUpperUnspecified   = str('KÖLN')
kolnUpperAsString  = str('KÖLN')
kolnUpperAsUnicode = unicode('KÖLN', 'utf8')

kolnLowerUnspecified   = str('köln')
kolnLowerAsString  = str('köln')
kolnLowerAsUnicode = unicode('köln', 'utf8')

formatted(kolnUpperUnspecified)
formatted(kolnUpperAsString)
try:
    formatted(kolnUpperAsUnicode)
except:
    pass

formatted(kolnLowerUnspecified)
formatted(kolnLowerAsString)
try:
    formatted(kolnLowerAsUnicode)
except:
    pass

formatted('Ötto Clauß lives in the hamlet of Lüsai in the village of Lü in the '
    'valley of Val Müstair in the Canton of Graubünden', '\n')
formatted('ZÜRICH is the largest city in Switzerland and the geographic center '
    'of the country is in Älggi-Alp which can be reached via the Lötschberg Tunnel',
    '\n')
formatted('20% of Swiss people speak Französisch but only 0.5% speak '
    'Rätoromanisch', '\n')
formatted('LÜSAI, lüsai, München, Neuchâtel, Ny-Ålesund, Tromsø, ZÜRICH', '\n')

print """BUGS: certain diacritical marks can and should be capitalized...
str.upper() does not .replace('à', 'À').replace('ä', 'Ä').replace('è', 
'È').replace('é', 'É').replace('ö', 'Ö').replace('ü', 'Ü'), etc.
str.lower() does not .replace('À', 'à').replace('Ä', 'ä').replace('È', 
'è').replace('É', 'é').replace('Ö', 'ö').replace('Ü', 'ü'), etc.
str.title() has the same problems plus it capitalizes the letter _after_ a 
diacritic. e.g. 'lüsai'.title() -- 'LÜSai' with a capital 'S'
myUpper(), myLower(), myTitle() exhibit the correct behavior with a handful 
of diacritic marks."""
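
One plausible reading of why every try block in the script raises (an interpretation, not something confirmed in the thread): the script decodes values that are already unicode, and it pushes unicode values through str() or a byte-string format, which in Python 2 implies an ASCII encode. A rough Python 3 sketch of those two failure modes, with illustrative variable names that are not from the script:

```python
# Python 3 sketch of the two failure modes; names are illustrative only.
already_text = 'KÖLN'

# 1. Decoding something that is already text raises TypeError.
#    (The Python 2 counterpart would be unicode(u'...', 'utf8').)
try:
    str(already_text, 'utf8')
    decode_error = None
except TypeError as exc:
    decode_error = exc

# 2. Forcing non-ASCII text through the ASCII codec raises
#    UnicodeEncodeError. Python 2 does this implicitly when a unicode
#    value is passed to str() or formatted into a byte string.
try:
    already_text.encode('ascii')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(type(decode_error).__name__)
print(type(encode_error).__name__)
```

Under that reading, the fix is to decode each byte string exactly once at the input boundary and keep everything as unicode from then on.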

--




[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

In addition to R. David's remark, it also works fine in a German locale. In 
general, you cannot know whether the byte '\xe4' denotes 'ä' or some other 
letter. For example, in KOI8-R, it denotes Д, instead, which already is an 
upper-case letter. So either do setlocale at the start of your program, or 
(better) switch to Unicode strings.

Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'ä'.upper()
Ä
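
The ambiguity of the byte '\xe4' can be illustrated by decoding it under the two encodings mentioned above. This is a small sketch in Python 3 syntax (an assumption for runnability; the thread itself concerns Python 2), using the standard codec names latin-1 and koi8_r:

```python
# The same byte means different letters under different encodings.
raw = b'\xe4'

as_latin1 = raw.decode('latin-1')   # 'ä', a lower-case letter
as_koi8r = raw.decode('koi8_r')     # 'Д', already an upper-case letter

print(as_latin1, as_latin1.upper())
print(as_koi8r, as_koi8r.isupper())
```

Without knowing the encoding, a byte-string upper() has no way to tell which of these two letters it is looking at.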

--
nosy: +loewis




[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread STINNER Victor

STINNER Victor victor.stin...@gmail.com added the comment:

Or you can port your program to Python 3 to avoid such issues :-)
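
For illustration, a short sketch of how the reporter's examples behave in Python 3, where str is a Unicode string (the sample words are taken from the report):

```python
# In Python 3, str is Unicode, so case mapping covers accented letters.
name = 'lüsai'

upper_name = name.upper()
title_name = name.title()
lower_city = 'KÖLN'.lower()

print(upper_name)   # LÜSAI
print(title_name)   # Lüsai
print(lower_city)   # köln
```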

--
nosy: +haypo




[issue14587] Certain diacritical marks can and should be capitalized... e.g. ü -- Ü

2012-04-15 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Indeed, this type of confusion is a large part of the motivation behind Python 3.

If for some reason you are required to use Python 2 for your program, you 
might try asking for help on the python-list mailing list.

--
