Christian Clauss <ccla...@bluewin.ch> added the comment:

On Apr 15, 2012, at 4:43 PM, R. David Murray wrote:

> 
> R. David Murray <rdmur...@bitdance.com> added the comment:
> 
> It works fine if you use unicode.
> 
> ----------
> nosy: +r.david.murray
> resolution:  -> invalid
> stage:  -> committed/rejected
> status: open -> closed
> 
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <http://bugs.python.org/issue14587>
> _______________________________________

What does it mean in this context to "use unicode"?
===============================================
In IDLE...
===============================================
Python 2.7.3 (v2.7.3:70274d53c1dd, Apr  9 2012, 20:52:43) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "copyright", "credits" or "license()" for more information.
>>> lusai = u'lüsai'
Unsupported characters in input
>>> lusai = 'lüsai'
Unsupported characters in input
>>> print "ŠČŽ"
Unsupported characters in input
===============================================
In a script...
Every time I try to "use unicode", an exception is thrown.
  All try blocks in the following code trigger an exception.
===============================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-

print '=========='

import sys
# sys.version_info = sys.version_info(major=2, minor=7, micro=1, releaselevel='final', serial=0)
print 'sys.version_info = {}.{}.{} {} {}'.format(*sys.version_info)

import commands, os
print 'os.name = {}'.format(os.name)
print 'os.uname = {}'.format(os.uname())

print '=========='

def myUpper(inString):
    # Upper-case the ASCII letters, then map a handful of accented letters by hand.
    return (inString.upper().replace('à', 'À').replace('ä', 'Ä')
            .replace('è', 'È').replace('é', 'É').replace('ö', 'Ö')
            .replace('ü', 'Ü').replace('ß', 'ẞ'))

def myLower(inString):
    # Lower-case the ASCII letters, then map a handful of accented letters by hand.
    return (inString.lower().replace('À', 'à').replace('Ä', 'ä')
            .replace('È', 'è').replace('É', 'é').replace('Ö', 'ö')
            .replace('Ü', 'ü').replace('ẞ', 'ß'))

def myTitle(inString):
    returnValue = []
    for theWord in inString.split():
        returnValue.append(myUpper(theWord[:1]) + myLower(theWord[1:]))
    return ' '.join(returnValue)

def formatted(inValue, inSep = ' '):
    s = str(inValue)
    print ' s={}{}su={}{}sl={}{}st={}...'.format(
        s, inSep, s.upper(), inSep, s.lower(), inSep, s.title())
    print ' s={}{}mu={}{}ml={}{}mt={}...'.format(
        s, inSep, myUpper(s), inSep, myLower(s), inSep, myTitle(s))
    u = unicode(inValue, 'utf8')
    try:
        print ' u={}{}uu={}{}ul={}{}ut={}...'.format(
            u, inSep, u.upper(), inSep, u.lower(), inSep, u.title())
    except:
        print "=== Exception thrown trying to print unicode({}, 'utf8')".format(repr(s))

kolnUpperUnspecified   = str('KÖLN')
kolnUpperAsString      = str('KÖLN')
kolnUpperAsUnicode = unicode('KÖLN', 'utf8')

kolnLowerUnspecified   = str('köln')
kolnLowerAsString      = str('köln')
kolnLowerAsUnicode = unicode('köln', 'utf8')

formatted(kolnUpperUnspecified)
formatted(kolnUpperAsString)
try:
    formatted(kolnUpperAsUnicode)
except:
    pass

formatted(kolnLowerUnspecified)
formatted(kolnLowerAsString)
try:
    formatted(kolnLowerAsUnicode)
except:
    pass

formatted('Ötto Clauß lives in the hamlet of Lüsai in the village of Lü in the '
          'valley of Val Müstair in the Canton of Graubünden', '\n')
formatted('ZÜRICH is the largest city in Switzerland and the geographic center '
          'of the country is in Älggi-Alp which can be reached via the Lötschberg Tunnel',
          '\n')
formatted('20% of Swiss people speak Französisch but only 0.5% speak '
          'Rätoromanisch', '\n')
formatted('LÜSAI, lüsai, München, Neuchâtel, Ny-Ålesund, Tromsø, ZÜRICH', '\n')

print """BUGS: certain diacritical marks can and should be capitalized...
    str.upper() does not .replace('à', 'À').replace('ä', 'Ä').replace('è', 'È').replace('é', 'É').replace('ö', 'Ö').replace('ü', 'Ü'), etc.
    str.lower() does not .replace('À', 'à').replace('Ä', 'ä').replace('È', 'è').replace('É', 'é').replace('Ö', 'ö').replace('Ü', 'ü'), etc.
    str.title() has the same problems, plus it capitalizes the letter _after_ a diacritic, e.g. 'lüsai'.title() --> 'LÜSai' with a capital 'S'
    myUpper(), myLower(), myTitle() exhibit the correct behavior with a handful of diacritic marks."""
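
For comparison, here is a minimal sketch of the difference between a text (unicode) string and the same word held as UTF-8 bytes, which is presumably what "use unicode" refers to. It is an illustration only and not part of the script above: the names text and raw are made up for it, the non-ASCII letters are written as escape sequences so nothing depends on the source or terminal encoding, and the byte-string result assumes Python 3, where bytes methods only touch ASCII letters (on 2.7, str case methods can depend on the locale).

```python
# Text string: type unicode on Python 2, str on Python 3.
text = u'l\xfcsai'            # the word "lüsai"; \xfc is u-umlaut

# The same word as UTF-8 encoded bytes. The u-umlaut becomes the two
# bytes 0xc3 0xbc, so the byte string is one element longer.
raw = text.encode('utf-8')

text_upper = text.upper()     # case mapping uses Unicode character data
text_title = text.title()
raw_title = raw.title()       # the two umlaut bytes are not ASCII letters

# Comparisons are printed instead of the strings themselves so the
# output is plain ASCII whatever the terminal encoding is.
print(len(text), len(raw))
print(text_upper == u'L\xdcSAI')
print(text_title == u'L\xfcsai')
print(raw_title == b'L\xc3\xbcSai')
```

In the byte string, the two bytes of the umlaut are treated as non-letters, so the 's' that follows them looks like the start of a new word. That would account for the stray capital 'S' described in the BUGS text, whereas the text string is title-cased as a single word.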
