----- Original Message -----
> From: Danny Yoo <d...@hashcollision.org>
> To: Albert-Jan Roskam <fo...@yahoo.com>
> Cc: Python Tutor Mailing List <tutor@python.org>
> Sent: Saturday, August 23, 2014 2:53 AM
> Subject: Re: [Tutor] simple unicode question
> 
> On Fri, Aug 22, 2014 at 2:10 PM, Albert-Jan Roskam
> <fo...@yahoo.com.dmarc.invalid> wrote:
>>  Hi,
>> 
>>  I have data that is either floats or byte strings in utf-8. I need to cast both to unicode strings.
> 
> 
> Just to be sure, I'm parsing the problem statement above as:
> 
>     data :== float
>                 | utf-8-encoded-byte-string


Yep, that's how I meant it :-)

> because the alternative way to parse the statement in English:
> 
>     data :== float-in-utf-8
>                 | byte-string-in-utf-8
> 
> doesn't make any technical sense.  :P
> 
> 
> 
> 
>>  I am probably missing something simple, but.. in the code below, under "float", why does [B] throw an error but [A] does not?
>> 
> 
>>  # float: cannot explicitly give encoding, even if it's the default
>>>>>  value = 1.0
>>>>>  unicode(value)      # [A]
>>  u'1.0'
>>>>>  unicode(value, sys.getdefaultencoding())  # [B]
>> 
>>  Traceback (most recent call last):
>>    File "<pyshell#22>", line 1, in <module>
>>      unicode(value, sys.getdefaultencoding())
>>  TypeError: coercing to Unicode: need string or buffer, float found
> 
> 
> Yeah.  Unfortunately, you're right: this doesn't make too much sense.
>
> What's happening is that the standard library overloads two
> _different_ behaviors to the same function unicode(). It's conditioned
> on whether we're passing in a single value, or if we're passing in
> two.  I would not try to reconcile a single, same behavior for both
> uses: treat them as two distinct behaviors.
> 
> Reference: https://docs.python.org/2/library/functions.html#unicode

Hi,
 
First, THANKS for all your replies!

Aahh, the first two lines in the link clarify things:
unicode(object='')
unicode(object[, encoding[, errors]])

I would find it better/clearer if the docstring also started with these two 
lines, or, alternatively, with unicode(*args)

I have tried to find these two functions in the source code, but I can't find 
them (but I don't speak C). Somewhere near line 1200 perhaps?
http://hg.python.org/cpython/file/74236c8bf064/Objects/unicodeobject.c
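
The two call forms quoted from the docs above can be tried side by side. This
is only a minimal sketch: text_type is a hypothetical alias (not part of the
thread) that picks unicode on Python 2 and str on Python 3, on the assumption
that Python 3's str overloads the same two behaviors.

```python
# Assumption: text_type is a made-up alias so the sketch runs on both
# Python 2 (unicode) and Python 3 (str).
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3

value = 1.0

# One-argument form: any object is converted to its text representation.
print(repr(text_type(value)))

# Two-argument form: the first argument must be bytes (or a buffer) to decode.
print(repr(text_type(b"caf\xc3\xa9", "utf-8")))

# Passing a float to the decoding form fails, as in case [B].
try:
    text_type(value, "utf-8")
except TypeError as exc:
    print("TypeError: %s" % exc)
```

The one-argument call asks the object for its text representation, while the
call with an encoding is a decode operation, so it only accepts bytes-like input.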


> Specifically, the two arg case is meant where you've got an
> uninterpreted source of bytes that should be decoded to Unicode using
> the provided encoding.
> 
> 
> So for your problem statement, the function should look something like:
> 
> ###############################
> def convert(data):
>     if isinstance(data, float):
>         return unicode(data)
>     if isinstance(data, bytes):
>         return unicode(data, "utf-8")
>     raise ValueError("Unexpected data", data)
> ###############################
> 
> where you must use unicode with either the 1-arg or 2-arg variant
> based on your input data.
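
For illustration, here is a sketch of the same function applied to a mixed
list. The text_type alias and the sample data are assumptions added so the
example also runs on Python 3; on Python 2, bytes is simply another name for
str, so the isinstance check matches ordinary byte strings.

```python
# Assumption: text_type is a hypothetical alias so the sketch runs on
# both Python 2 (unicode) and Python 3 (str).
try:
    text_type = unicode   # Python 2
except NameError:
    text_type = str       # Python 3


def convert(data):
    """Return data as text; accepts floats and UTF-8 encoded byte strings."""
    if isinstance(data, float):
        return text_type(data)            # 1-arg form: text representation
    if isinstance(data, bytes):
        return text_type(data, "utf-8")   # 2-arg form: decode the bytes
    raise ValueError("Unexpected data", data)


# Made-up sample data: two floats and two UTF-8 byte strings.
mixed = [1.0, 2.5, b"abc", b"caf\xc3\xa9"]
converted = [convert(item) for item in mixed]
print(converted)
```

Anything that is neither a float nor a byte string (an int, or text that is
already decoded) falls through to the ValueError instead of being converted
silently.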


Interesting, you follow a "look before you leap" approach here, whereas `they` 
always say it is "easier to ask forgiveness than permission" in Python. But 
LBYL is about twice as fast in the timings below, which is relevant because the 
function could be called millions and millions of times. I have noticed before 
that try-except gets quite expensive when the exception is actually raised (for 
instance, membership testing with 'in' is cheaper than try-except-KeyError when 
getting items from a dictionary and many of the keys are missing).

In [1]: def convert1(data):
...:         if isinstance(data, float):
...:                 return unicode(data)
...:         if isinstance(data, bytes):
...:                 return unicode(data, "utf-8")
...:         raise ValueError("Unexpected data", data)
...:

In [2]: %timeit map(convert1, map(float, range(10)) + list("abcdefghij"))
10000 loops, best of 3: 19 us per loop

In [3]: def convert2(data):
....:         try:
....:                 return unicode(data, encoding="utf-8")
....:         except TypeError:
....:                 return unicode(data)
....:         raise ValueError("Unexpected data", data)
....:

In [4]: %timeit map(convert2, map(float, range(10)) + list("abcdefghij"))
10000 loops, best of 3: 40.4 us per loop   
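
The dictionary comparison mentioned above can be measured in a similar way
with the standard timeit module. This is a rough sketch, not a benchmark: the
dictionary d and the function names lbyl and eafp are made up for the
example, and the numbers will vary by machine and interpreter. It separates
the case where the key exists from the case where it is missing, since the
try block itself is cheap and the cost comes mainly from raising and catching
the exception.

```python
import timeit

# Hypothetical sample dictionary for the measurement.
d = {i: i for i in range(100)}


def lbyl(key):
    # Look before you leap: test membership first.
    if key in d:
        return d[key]
    return None


def eafp(key):
    # Easier to ask forgiveness: try the lookup and handle the miss.
    try:
        return d[key]
    except KeyError:
        return None


# Time both styles for a key that exists and for one that does not.
for label, key in (("hit", 5), ("miss", -1)):
    for func in (lbyl, eafp):
        seconds = timeit.timeit(lambda: func(key), number=100000)
        print("%-4s %-4s %.4f s" % (label, func.__name__, seconds))
```

If most lookups succeed, the two styles should end up close to each other; the
gap is expected to open up mainly on the "miss" rows.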
