On 9/16/07, Mathieu Fenniak <[EMAIL PROTECTED]> wrote:
> On 16-Sep-07, at 12:38 PM, Guido van Rossum wrote:
> > I'm not doubting that *your* subclass works well enough. The problem
> > is that it must be robust in the light of *any* subclass, no matter how
> > crazy.
>
> I understand that, but I'm not sure what kind of problems can be
> created by crazy subclasses. But my imagination of "crazy subclass"
> is pretty limited.
>
> > I'd have to understand more about your app to see whether subclassing
> > truly makes sense.
>
> I didn't want to flood too many pointless details into the
> discussion, so here's the minimum that I think is relevant. The
> project is pyPdf, a library for reading and writing PDF files. I've
> been working on making the library support unicode text strings
> within PDF documents.
>
> In a PDF file, a "string" can either be a text string, or a byte
> string. A string is a text string if it starts with a UTF-16BE BOM
> marker, or if it can be decoded using an encoding called
> PDFDocEncoding (which is specified by the PDF reference, similar to
> Latin-1 but different just to make life difficult). pyPdf needs to
> be capable of reading and writing these string objects. Whether a
> string is a byte or a text string, writing out the raw bytes is the
> same process after the text has been encoded. This lends itself to a
> common StringObject base class:
>
> class StringObject(PdfObject):
>     # contains common behavior for both types of strings, such as
>     # the ability to serialize out a byte array, encrypt/decrypt strings
>     # for "secure" PDF files
>     # also contains reading code that attempts to autodetect whether
>     # the string is a byte or text string
>
> class ByteStringObject(bytes, StringObject):
>     # adds the byte array storage, and passes self back to
>     # StringObject for serialization output
>
> class TextStringObject(str, StringObject):
>     # overrides the default output serialization to encode the
>     # unicode string to match PDF's requirements,
>     # passes the resulting byte array up for serialization.
>
> (complete source code, if you're interested:
> http://hg.pybrary.net/pyPdf-py3/file/fe0dc2014a1b/pyPdf/generic.py)
>
> Deriving from the bytes type provides storage, and also direct & easy
> access to the byte array content. I think in this case using bytes
> as a base type makes sense, at least as much as using str as a base
> type. pyPdf derives from list and dict for different PDF object
> types in a similar manner as well.
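
For readers following the thread, the hierarchy described in the quoted message can be sketched roughly as follows. This is an illustration under assumptions, not pyPdf's actual code: `PdfObject` is an empty stand-in, the `to_bytes` method and the `create_string_object` function are hypothetical names, and `pdfdoc_decode_placeholder` accepts only ASCII rather than implementing the real PDFDocEncoding table. It also assumes a Python 3 in which `bytes` can be subclassed, as is the case in released versions.

```python
import codecs


def pdfdoc_decode_placeholder(data):
    """Placeholder only, NOT the real PDFDocEncoding table: accepts ASCII."""
    return data.decode("ascii")  # raises UnicodeDecodeError on other bytes


class PdfObject:
    """Hypothetical stand-in for pyPdf's base class."""


class StringObject(PdfObject):
    """Common behaviour shared by both kinds of PDF string."""

    def to_bytes(self):
        raise NotImplementedError


class ByteStringObject(bytes, StringObject):
    """Byte string: the inherited bytes value is already the raw content."""

    def to_bytes(self):
        return bytes(self)


class TextStringObject(str, StringObject):
    """Text string: encoded here as UTF-16BE with a leading BOM."""

    def to_bytes(self):
        return codecs.BOM_UTF16_BE + self.encode("utf-16-be")


def create_string_object(raw, decode_pdfdoc=pdfdoc_decode_placeholder):
    """Guess whether raw bytes are a text string or a byte string."""
    bom = codecs.BOM_UTF16_BE
    if raw.startswith(bom):
        return TextStringObject(raw[len(bom):].decode("utf-16-be"))
    try:
        return TextStringObject(decode_pdfdoc(raw))
    except UnicodeDecodeError:
        # Not decodable as text, so keep it as an opaque byte string.
        return ByteStringObject(raw)
```

Passing the decoder in as a parameter keeps the placeholder easy to swap for a real PDFDocEncoding implementation without touching the detection logic.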
So suppose my answer was "no, bytes won't be subclassable". How much
would you really lose by having to wrap a separate object around a
bytes object, rather than being able to subclass? How much extra code
do you think you would have to write?

Another way to look at it-- how much of the bytes type's API do your
objects really have to support?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
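
To make the wrapping alternative concrete, here is a minimal sketch of a byte-string object that holds a `bytes` value instead of inheriting from it. All names are hypothetical (`PdfObject` is an empty stand-in, `to_pdf_literal` is an illustrative method), and the serialization is simplified to the backslash and parenthesis escapes of a PDF literal string. The point is only to show how small the forwarded slice of the `bytes` API might be.

```python
class PdfObject:
    """Hypothetical stand-in for pyPdf's base class."""


class ByteStringObject(PdfObject):
    """Wraps a bytes value rather than subclassing bytes."""

    def __init__(self, data=b""):
        self._data = bytes(data)

    # Only the parts of the bytes API the wrapper chooses to expose.
    def __bytes__(self):
        return self._data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __eq__(self, other):
        if isinstance(other, ByteStringObject):
            return self._data == other._data
        if isinstance(other, bytes):
            return self._data == other
        return NotImplemented

    def __hash__(self):
        return hash(self._data)

    def to_pdf_literal(self):
        # Simplified: escape backslashes first, then parentheses.
        escaped = (
            self._data.replace(b"\\", b"\\\\")
            .replace(b"(", b"\\(")
            .replace(b")", b"\\)")
        )
        return b"(" + escaped + b")"
```

Each additional `bytes` method a caller relies on (`startswith`, `find`, slicing into a new wrapper, and so on) would need its own forwarding method, which is the extra code the question above asks about.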