Others have answered your question far better than I could have, but I
wanted to add a caution against clobbering built-ins. "str" is a Python
built-in: the string type. In one-off code like this it's unlikely to cause
a problem, but overriding str or other built-ins this way in more complex
code can lead to very confusing bugs.
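
A minimal sketch of the kind of bug meant here (the describe() helper is made up for illustration; the snippet runs on Python 2.7 and 3). A function that calls str() breaks as soon as a module-level assignment rebinds the name, even though the function itself never changed:

```python
from __future__ import print_function


def describe(value):
    # Hypothetical helper: relies on the built-in str type to convert value.
    return "value: " + str(value)


print(describe(42))            # the built-in str is still visible here

str = 'some text'              # rebinds the name str for this whole module

try:
    describe(42)               # str is now a string object, not the type
except TypeError as exc:
    error_message = exc.args[0]  # can't call str(exc) while str is shadowed
    print("TypeError:", error_message)

del str                        # remove the shadowing global name...
print(describe(42))            # ...and lookups fall through to the built-in again
```

Simply picking another variable name (text, s, word) avoids the problem; pylint reports this pattern as redefined-builtin.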
On Oct 12, 2014 3:09 PM, "Scott Garman" <[email protected]> wrote:

> Hi all,
>
> I'm getting pretty confused by a problem I'm trying to solve in python,
> which is to detect lower-case characters in a string. This would normally
> be a simple regex, but I have to also accept input strings with umlauts in
> them, such as 'ä'. I'm using python 2.7.6.
>
> At first I thought this was a unicode problem, but now I'm not so sure.
> About anything.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> str = 'ä'
>
> if isinstance(str, unicode):
>         print "This is unicode"
>
> Running this tells me that string is *not* unicode. I know that there's a
> thing called extended ASCII, and if I look up a table for that, I see
> characters with accents and umlauts:
>
> http://www.asciitable.com/
>
> This table suggests that 'ä' should correspond to an ordinal value of 132.
> But if I run:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> string = 'ä'
>
> for c in string:
>     print ord(c)
>
> I get:
>
> 195
> 164
>
> which tells me that I'm dealing with a two-byte character, which brings me
> back to this being unicode.
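
Those two numbers are the UTF-8 encoding of 'ä' (code point U+00E4). Because the file is saved as UTF-8, a plain '...' literal in Python 2 is a byte string holding 0xC3 0xA4, so iterating over it yields bytes rather than characters. The 132 in the asciitable.com chart is the position of 'ä' in IBM code page 437, a separate legacy encoding. A small sketch that runs on both 2.7 and 3 (variable names are only for illustration):

```python
from __future__ import print_function

# Written as an escape so the source stays ASCII-only: this is one character, 'ä'.
a_umlaut = u'\u00e4'

utf8_bytes = a_umlaut.encode('utf-8')    # what a UTF-8 source file stores
cp437_bytes = a_umlaut.encode('cp437')   # the "extended ASCII" table's encoding

# bytearray() yields integers on both Python 2.7 and 3.
utf8_values = list(bytearray(utf8_bytes))
cp437_values = list(bytearray(cp437_bytes))

print("code point:", ord(a_umlaut))      # 228 == 0xE4
print("UTF-8 bytes:", utf8_values)       # [195, 164]
print("cp437 bytes:", cp437_values)      # [132]

# Decoding the UTF-8 bytes gives back a single unicode character.
decoded = utf8_bytes.decode('utf-8')
print(len(decoded), decoded == a_umlaut)
```

In Python 2, writing the literal as u'ä' (with the coding line in place) gives a unicode object directly, and the isinstance(..., unicode) check succeeds.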
>
> Now looking at which characters in the extended ASCII table correspond to
> those values, I don't see any relation to 'ä'.
>
> Finally, my understanding of python 2.x is that it does not support
> unicode in regexes. Otherwise I'd just use \p{Ll} and have a good deal more
> hair left on my head.
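
Python's built-in re module indeed has no \p{Ll} (the third-party regex module does), but a regex isn't strictly needed for this. One workaround, sketched under the assumption that input arrives as UTF-8 bytes or as unicode: decode to unicode first, then test each character with islower(), which consults Python's Unicode character database. Its notion of "lowercase" is close to, but not guaranteed identical to, the Ll category. The helper names are hypothetical:

```python
from __future__ import print_function


def lowercase_chars(data, encoding='utf-8'):
    """Return the lowercase characters in data, in order.

    data may be bytes (e.g. a plain '...' literal in a Python 2 source file
    saved as UTF-8) or an already-decoded unicode string.
    """
    if isinstance(data, bytes):
        # On 2.7, bytes is an alias for str; decode it into code points first.
        data = data.decode(encoding)
    # islower() on a unicode character is True for lowercase letters like u'\u00e4'.
    return [c for c in data if c.islower()]


def has_lowercase(data, encoding='utf-8'):
    # True if at least one character in data is lowercase.
    return bool(lowercase_chars(data, encoding))


sample = u'\u00c4RGER \u00fcber'.encode('utf-8')   # 'ÄRGER über' as UTF-8 bytes
print(has_lowercase(sample))
print(lowercase_chars(sample) == [u'\u00fc', u'b', u'e', u'r'])
```

Decoding once at the boundary and working with unicode afterwards also sidesteps the byte-counting confusion described above.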
>
> I've also tried forcing the string to ASCII using:
>
> str.decode("ascii", "ignore")
>
> and this is one of those characters that just gets dropped in the
> conversion.
>
> Any insights on what I'm missing would be greatly appreciated.
>
> Thanks,
>
> Scott
>
> _______________________________________________
> Portland mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/portland
>