On Fri, Aug 22, 2014 at 02:10:21PM -0700, Albert-Jan Roskam wrote:
> Hi,
>
> I have data that is either floats or byte strings in utf-8. I need to
> cast both to unicode strings. I am probably missing something simple,
> but.. in the code below, under "float", why does [B] throw an error
> but [A] does not?
Unicode in Python 2 is a little more confusing than in Python 3. But let's see what is going on:

> >>> value = 1.0
> >>> unicode(value) # [A]
> u'1.0'

This works for the same reason that str(value) works. By definition, the value 1.0 can only be converted into a single [text or byte] string, namely 1.0. (Well, to be absolutely pedantic, Python could support other scripts, like ۱.۰ written in Arabic-script digits, but it doesn't.)

> >>> unicode(value, sys.getdefaultencoding()) # [B]
>
> Traceback (most recent call last):
>   File "<pyshell#22>", line 1, in <module>
>     unicode(value, sys.getdefaultencoding())
> TypeError: coercing to Unicode: need string or buffer, float found

Here, on the other hand, unicode sees that you are providing a second argument (an encoding), so it expects a string or buffer object to decode. It gets a float instead, so it raises TypeError.

You're probably thinking something along these lines:

"Unicode strings need an encoding, so I want to convert 1.0 into Unicode using the ASCII encoding (or the UTF-8 encoding)"

but that's not how it works. Unicode strings DON'T need an encoding. The Unicode string "1.0", or for that matter "π÷⇒Ж", is a string of exactly those characters and nothing else. In the same way that ASCII defines 128 characters, including A, B, C, ..., Unicode defines (up to) 1114112 characters. There's no need to specify an encoding, because Unicode *is* the encoding. You only need to use an encoding when converting from Unicode to bytes, or vice versa.

--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
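For the original goal (turning either a float or a UTF-8 byte string into a Unicode string), here is a minimal sketch of the rule above: only bytes get an encoding, everything else goes through the plain one-argument call, as in [A]. It assumes the input is only ever a float, UTF-8 bytes, or text that is already Unicode, and it assumes Python 2.6 or later, where bytes is an alias of str, so the same code also runs on Python 3. The helper name to_text is made up for illustration; it is not a standard function.

```python
import sys

# Pick the text (Unicode) type for whichever Python is running:
# `unicode` on Python 2, `str` on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # only exists on Python 2


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: turn a float or a UTF-8 byte string into text."""
    if isinstance(value, bytes):
        # Bytes are the only case that needs an encoding: decode them.
        return value.decode(encoding)
    # A float (or existing text) has just one text form, so no encoding
    # argument is passed; this is the same call as [A] in the question.
    return text_type(value)


if __name__ == "__main__":
    print(repr(to_text(1.0)))             # a float, as in [A]
    print(repr(to_text(b"caf\xc3\xa9")))  # UTF-8 bytes for "café"

    # Passing an encoding together with a float fails, as in [B].
    try:
        text_type(1.0, "utf-8")
    except TypeError as exc:
        print("TypeError: %s" % exc)
```

The encoding only shows up at the boundary between bytes and text: decode() when bytes come in, and encode() (for example u"\u03c0".encode("utf-8")) when text has to go back out as bytes. A float never crosses that boundary, so it never needs one.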