Others have answered your question far better than I could have, but I wanted to add a caution against clobbering built-ins. "str" is a Python built-in, the string type. In one-off code like this it's unlikely to cause a problem, but rebinding str or other built-ins this way in more complex code can lead to very confusing bugs, because any later call to str(...) in the same scope reaches your variable instead of the type.

On Oct 12, 2014 3:09 PM, "Scott Garman" <[email protected]> wrote:
> Hi all,
>
> I'm getting pretty confused by a problem I'm trying to solve in python,
> which is to detect lower-case characters in a string. This would normally
> be a simple regex, but I have to also accept input strings with umlauts in
> them, such as 'ä'. I'm using python 2.7.6.
>
> At first I thought this was a unicode problem, but now I'm not so sure.
> About anything.
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> str = 'ä'
>
> if isinstance(str, unicode):
>     print "This is unicode"
>
> Running this tells me that string is *not* unicode. I know that there's a
> thing called extended ASCII, and if I look up a table for that, I see
> characters with accents and umlauts:
>
> http://www.asciitable.com/
>
> This table suggests that 'ä' should correspond to an ordinal value of 132.
> But if I run:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> string = 'ä'
>
> for c in string:
>     print ord(c)
>
> I get:
>
> 195
> 164
>
> which tells me that I'm dealing with a two-byte character, which brings me
> back to this being unicode.
>
> Now looking at which characters in the extended ASCII table correspond to
> those values, I don't see any relation to 'ä'.
>
> Finally, my understanding of python 2.x is that it does not support
> unicode in regexes. Otherwise I'd just use \p{Ll} and have a good deal more
> hair left on my head.
>
> I've also tried forcing the string to ASCII using:
>
> str.decode("ascii", "ignore")
>
> and this is one of those characters that just gets dropped in the
> conversion.
>
> Any insights on what I'm missing would be greatly appreciated.
>
> Thanks,
>
> Scott
>
> _______________________________________________
> Portland mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/portland
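
P.S. To make the caution about built-ins concrete, here is a minimal sketch of how the shadowing bites. The function names label_shadowed and label_renamed are made up for illustration and are not from Scott's script; the only thing carried over is the assignment of 'ä' to the name str. It is written so it should run on Python 3, and I believe on 2.7 as well.

```python
# -*- coding: utf-8 -*-
# Hypothetical illustration: the function names below are invented for this sketch.

def label_shadowed(number):
    str = "ä"  # the local name now hides the built-in str type
    # str is a plain string here, so calling it raises TypeError
    return str + ": " + str(number)

def label_renamed(number):
    text = "ä"  # a different name leaves the built-in alone
    return text + ": " + str(number)

try:
    label_shadowed(42)
    outcome = "no error"
except TypeError as exc:
    outcome = "TypeError: {}".format(exc)

print(outcome)            # reports that a string object was called
print(label_renamed(42))  # the built-in str() still works here
```

The confusing part in real code is that the assignment and the failing call are usually far apart, so the traceback points at a perfectly ordinary-looking str(...) call. Picking a name like text or s avoids the whole issue.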
