On Fri, 12 Mar 2010 06:03:35 am spir wrote:
> Hello,
>
> I need a custom unicode subtype (with additional methods). This will
> not be directly used by the user, instead it is just for internal
> purpose. I would like the type to be able to cope with either a byte
> str or a unicode str as argument. In the first case, it needs to be
> first decoded. I cannot do it in __init__ because unicode will first
> try to decode it as ascii, which fails in the general case.

Are you aware that you can pass an explicit encoding to unicode?

>>> print unicode('cdef', 'utf-16')
摣晥
>>> help(unicode)

Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object


> So, I
> must have my own __new__. The issue is the object (self) is then a
> unicode one instead of my own type.
>
> class Unicode(unicode):
>     Unicode.FORMAT = "utf8"
>     def __new__(self, text, format=None):
>         # text can be str or unicode
>         format = Unicode.FORMAT if format is None else format
>         if isinstance(text,str):
>             text = text.decode(format)
>         return text
>     .......
>
> x = Unicode("abc")     # --> unicode, not Unicode

That's because you return a unicode object :) Python doesn't magically
convert the result of __new__ into your class; in fact, Python
specifically allows __new__ to return something else. That's fairly
unusual, but it does come in handy.

"format" is not a good name to use. The accepted term is "encoding". You
should also try to match the function signature of the built-in unicode
object, which includes unicode() -> u''.

Writing Unicode.FORMAT in the definition of Unicode can't work:

>>> class Unicode(unicode):
...     Unicode.FORMAT = 'abc'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in Unicode
NameError: name 'Unicode' is not defined

So it looks like you've posted something slightly different from what
you are actually running.

I have tried to match the behaviour of the built-in unicode as closely
as I am able. See here:
http://docs.python.org/library/functions.html#unicode

class Unicode(unicode):
    """Unicode(string [, encoding[, errors]]) -> object

    Special Unicode class that has all sorts of wonderful
    methods missing from the built-in unicode class.
    """
    _ENCODING = "utf8"
    _ERRORS = "strict"
    def __new__(cls, string='', encoding=None, errors=None):
        # If either encoding or errors is specified, then always
        # attempt decoding of the first argument.
        if (encoding, errors) != (None, None):
            if encoding is None: encoding = cls._ENCODING
            if errors is None: errors = cls._ERRORS
            obj = super(Unicode, cls).__new__(
                Unicode, string, encoding, errors)
        else:
            # Never attempt decoding.
            obj = super(Unicode, cls).__new__(Unicode, string)
        assert isinstance(obj, Unicode)
        return obj


>>> Unicode()
u''
>>> Unicode('abc')
u'abc'
>>> Unicode('cdef', 'utf-16')
u'\u6463\u6665'
>>> Unicode(u'abcd')
u'abcd'


-- 
Steven D'Aprano
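
A note for readers on Python 3, where the built-in unicode type is
called str and byte strings are bytes: the same __new__ technique can
be sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the message above. The class name Text
is hypothetical, and unlike the Python 2 version it always decodes byte
input (using the class defaults when no encoding is given), because
Python 3 has no implicit ASCII decoding to fall back on. It also passes
cls rather than a hard-coded class name to __new__, so that subclasses
keep their own type.

```python
class Text(str):
    """Text(value [, encoding[, errors]]) -> object

    Hypothetical Python 3 analogue of the Unicode class above.
    """
    _ENCODING = "utf-8"
    _ERRORS = "strict"

    def __new__(cls, value="", encoding=None, errors=None):
        if isinstance(value, (bytes, bytearray)):
            # Byte input is always decoded; missing arguments fall
            # back to the class-level defaults.
            if encoding is None:
                encoding = cls._ENCODING
            if errors is None:
                errors = cls._ERRORS
            return super().__new__(cls, value, encoding, errors)
        # Text input is never decoded; encoding and errors are ignored.
        return super().__new__(cls, value)


print(type(Text(b"abc")).__name__)   # name of the resulting class
print(Text(b"cdef", "utf-16-le"))    # explicit encoding overrides the default
```

Here 'utf-16-le' is spelled out so the result does not depend on the
machine's byte order.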