Jerry Rocteur wrote:

> When I try and format output when there are accented characters the
> output does not look right.
>
> e.g.
>
> 27 Angie Dickons             67,638
> 28 Anne MÉRESSE             64,825
>
> So the strings containing accented characters print one less than
> those that don't.
>
> I've tried both:
>
> print '{0:2} {1:25} {2} '.format( cnt, nam[num].encode('utf-8'),
> steps[ind1])
> print "%3d %-25s %-7s" % ( cnt, nam[num].encode('utf-8'), steps[ind1])
>
> I've searched but I can't see a solution..
>
> I guess it is the way I'm printing nam[num].encode('utf-8') perhaps I
> have to convert it first ?

If you have a byte string (the standard in Python 2), you have to
decode() it, i.e. convert it to unicode, before you format it. Compare:

>>> names = "Angie Dickons", "Anne Méresse"
>>> for name in names:
...     print "|{:20}|".format(name)
...
|Angie Dickons       |
|Anne Méresse       |
>>> for name in names:
...     name = name.decode("utf-8")
...     print u"|{:20}|".format(name)
...
|Angie Dickons       |
|Anne Méresse        |

In the first loop the padding is counted in bytes, and "é" takes two
bytes in UTF-8, so that row comes out one column short.

The best approach is to convert your data to unicode as soon as you read
it and perform all string operations with unicode. This also avoids
breaking characters:

>>> print "Méresse"[:2]
M�
>>> print u"Méresse"[:2]
Mé

There are still problems (e.g. with narrow builds), and the best way to
avoid a few string-related inconveniences is to switch to Python 3.
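
For comparison, here is a rough Python 3 sketch of the same effect. It is
only an illustration, not a drop-in fix for the original script: the
helper names pad_bytes and pad_text are made up for this example, and the
width of 20 is taken from the session above. Padding the encoded bytes
reproduces the misalignment, while padding the text first does not.

```python
# Assumes Python 3, where str is unicode text and bytes is a separate type.
# "\u00e9" is a precomposed "é" (one code point, two bytes in UTF-8).
names = ["Angie Dickons", "Anne M\u00e9resse"]


def pad_bytes(name, width=20):
    """Hypothetical helper: pad the UTF-8 bytes, as Python 2 did for str."""
    return name.encode("utf-8").ljust(width)  # width counted in bytes


def pad_text(name, width=20):
    """Hypothetical helper: pad the text, so width counts code points."""
    return "{:{w}}".format(name, w=width)


for name in names:
    # Decode only for display; the padding was already computed on bytes.
    print("|{}| padded as bytes".format(pad_bytes(name).decode("utf-8")))
    print("|{}| padded as text".format(pad_text(name)))
```

Note that even text padding counts code points, not display columns, so
combining characters or double-width characters can still throw the
alignment off.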