--------------------------------------------
On Wed, 11/27/13, Steven D'Aprano <[email protected]> wrote:

 Subject: Re: [Tutor] string replacement in Python 2 and 3
 To: [email protected]
 Date: Wednesday, November 27, 2013, 12:36 AM
 
 On Tue, Nov 26, 2013 at 11:42:29AM -0800, Albert-Jan Roskam wrote:
 > Hi,
 > 
 > String replacement works quite differently with bytes objects in
 > Python 3 than with string objects in Python 2. What is the best way to
 > make example #1 below run in Python 2 and 3?
 
 If you are working with text strings, always use the text string type.
 In Python 3, that is called "str". In Python 2, that is called
 "unicode". To make it easier, I do this at the top of the module:
 
 try:
     unicode
 except NameError:
     # Python 3.
     pass
 else:
     # Python 2.
     str = unicode
 
 then always use str. Or, if you prefer:
 
 try:
     unicode
 except NameError:
     # Python 3.
     unicode = str
 
 
 and always use unicode.
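A minimal sketch of a variation on the same trick that avoids rebinding a builtin name: it binds two new names, `text_type` and `binary_type`, and tests against those instead. The names follow the convention of the third-party `six` library (`six.text_type`, `six.binary_type`); the helpers `is_text` and `is_binary` are hypothetical, for illustration only.

```python
# Bind neutral names rather than shadowing "str" or defining "unicode".
try:
    text_type = unicode  # Python 2: the text type is "unicode"
except NameError:
    text_type = str      # Python 3: the text type is "str"

# In Python 2.6+ "bytes" is an alias for the 8-bit "str" type,
# so this name means "raw bytes" on both versions.
binary_type = bytes


def is_text(obj):
    """Return True if obj is a text (Unicode) string on either version."""
    return isinstance(obj, text_type)


def is_binary(obj):
    """Return True if obj is a byte string on either version."""
    return isinstance(obj, binary_type)
```

The rest of the module then checks `isinstance(obj, text_type)` and never names either builtin directly.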
 
 As an alternative, if you need to support Python 2.7 and 3.3 only, you
 can use u'' string literals:
 
 s = u"Hello World!"
 
 Sadly, Python 3.1 and 3.2 (don't use 3.0, it's broken) don't support the
 u string prefix. If you have to support them:
 
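 import sys

 if sys.version_info[0] < 3:
     # Python 2: turn a (byte) str literal into unicode.
     def u(astr):
         return unicode(astr)
 else:
     # Python 3: literals are already text.
     def u(astr):
         return astr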
 
 
 and then call:
 
 s = u("Hello World!")
 
 *but* be aware that this only works with ASCII string literals. We can
 make u() be smarter and handle more cases:
 
 import sys

 if sys.version_info[0] < 3:
     def u(obj, encoding='utf-8', errors='strict'):
         if isinstance(obj, str):
             # Python 2 byte string: decode it to unicode.
             return obj.decode(encoding, errors)
         elif isinstance(obj, unicode):
             # Already text.
             return obj
         else:
             return unicode(obj)
 else:
     def u(obj, encoding='utf-8', errors='strict'):
         if isinstance(obj, str):
             # Already text.
             return obj
         elif isinstance(obj, bytes):
             # Python 3 bytes: decode to str.
             return obj.decode(encoding, errors)
         else:
             return str(obj)
 
 then use the u() function on any string, text or bytes, or any other
 object, as needed.
 
 But the important thing here is:
 
 * convert bytes to text as early as possible;
 
 * then do all your work using text;
 
 * and only convert back to bytes if you really need to, 
   and as late as possible.
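Taken together, the three bullets describe a decode-once, encode-once pipeline. A minimal sketch, assuming UTF-8 input that arrives as a bytes object (for example, data read in binary mode); `process_records` and its upper-casing step are hypothetical placeholders for the real work:

```python
def process_records(raw, encoding='utf-8'):
    """Decode once, do all the work on text, encode once.

    Hypothetical helper: the strip/upper step stands in for real processing.
    """
    # 1. bytes -> text, as early as possible
    text = raw.decode(encoding)
    # 2. all intermediate work happens on text
    lines = [line.strip().upper() for line in text.splitlines()]
    result = u"\n".join(lines)
    # 3. text -> bytes, only at the very end
    return result.encode(encoding)
```

For example, `process_records(b"caf\xc3\xa9\n spam ")` decodes to `u"caf\u00e9\n spam "`, upper-cases each line on text, and only re-encodes the final result.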
 
 
 If you find yourself converting backwards and forwards between bytes and
 text multiple times for each piece of data, you're doing it wrong. Look
 at file input in Python 3: when you open a file for reading in text
 mode, it returns a text string, even though the underlying file on disk
 is bytes. It decodes those bytes once, as early as it can (when
 reading), and then for the rest of your program you treat it as text.
 Then when you write it back out to a file, it encodes it to bytes only
 when doing the write(). That's the strategy you should aim to copy.
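The same decode-on-read, encode-on-write behaviour is available under Python 2.6+ through `io.open`, which in Python 3 is the built-in `open()`. A small round-trip sketch, assuming UTF-8 and a throwaway temporary file; `roundtrip` is a hypothetical helper:

```python
import io
import os
import tempfile


def roundtrip(text, encoding='utf-8'):
    """Write text via a text-mode file, then return (raw bytes, text read back)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the file object encodes to bytes inside write().
        with io.open(path, 'w', encoding=encoding) as f:
            f.write(text)
        # Binary mode: inspect the bytes actually stored on disk.
        with io.open(path, 'rb') as f:
            on_disk = f.read()
        # Text mode again: the bytes are decoded once, while reading.
        with io.open(path, 'r', encoding=encoding) as f:
            read_back = f.read()
    finally:
        os.remove(path)
    return on_disk, read_back
```

Calling `roundtrip(u"caf\u00e9")` should give the UTF-8 bytes `b"caf\xc3\xa9"` on disk and the original text back, with the program itself never calling `encode()` or `decode()`.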
 
 Ideally, no string should be encoded or decoded more than once each in
 its entire lifespan.
 
 
 
===> Hi Steven,

Thanks for your advice (esp. the bullets are placemat-worthy ;-). I will take a
critical look at my code to see where I can improve it. I am reading binary data,
so I start with str (Python 2) or bytes (Python 3). Before I adapted my
code for use in both Python versions, I simply returned string (Python 2 sense
of the word) data. So in Python 3 it seemed consistent to return bytes.

In one case, I add padding to values to get rid of null bytes. That's a small
operation that's done very, very often. The first example below is MUCH faster
than ljust, but it does not work in Python 3. Maybe Donald Knuth should slap me
because I am optimizing prematurely:
>>> value = b"blah"
>>> b"%-50s" % value   # fast, but not python 3 proof
'blah                                              '
>>> value.ljust(50)    # okay, this will have to do for python 3, then
'blah                                              '
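For the padding case, one option is a sketch that keeps the `%` path wherever it exists. Bytes `%`-formatting works in Python 2 (where bytes is str) and was restored in Python 3.5 by PEP 461, but it is missing from Python 3.0 through 3.4. The version check and the name `pad50` are assumptions for illustration. Whether `%` actually beats `ljust` on a given interpreter is worth measuring with `timeit` rather than assuming.

```python
import sys

# bytes %-formatting: available in Python 2 and in Python 3.5+ (PEP 461).
_HAS_BYTES_PERCENT = sys.version_info[0] == 2 or sys.version_info >= (3, 5)

# Choose the implementation once at import time, so the hot path has no branch.
if _HAS_BYTES_PERCENT:
    def pad50(value):
        """Left-justify a byte string to 50 bytes with spaces (never truncates)."""
        return b"%-50s" % value
else:
    def pad50(value):
        """Left-justify a byte string to 50 bytes with spaces (never truncates)."""
        return value.ljust(50)  # bytes.ljust pads with b' ' by default
```

Both branches return the same bytes, so callers do not need to know which one was chosen.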
 regards,
Albert-Jan
_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
