Re: [Python-3000] bytes & Py_TPFLAGS_BASETYPE

Mathieu Fenniak Sun, 16 Sep 2007 13:20:30 -0700

On 16-Sep-07, at 12:38 PM, Guido van Rossum wrote:
> I'm not doubting that *your* subclass works well enough. The problem
> is that it must be robust in the light of *any* subclass, no matter how
> crazy.


I understand that, but I'm not sure what kind of problems can be
created by crazy subclasses.  Then again, my imagination for what a
"crazy subclass" might do is pretty limited.

> I'd have to understand more about your app to see whether subclassing
> truly makes sense.

I didn't want to flood the discussion with too many pointless
details, so here's the minimum that I think is relevant.  The
project is pyPdf, a library for reading and writing PDF files.  I've
been working on making the library support unicode text strings
within PDF documents.

In a PDF file, a "string" can either be a text string, or a byte  
string.  A string is a text string if it starts with a UTF-16BE BOM,
or if it can be decoded using an encoding called
PDFDocEncoding (which is specified by the PDF reference, similar to  
Latin-1 but different just to make life difficult).  pyPdf needs to  
be capable of reading and writing these string objects.  Whether a  
string is a byte or a text string, writing out the raw bytes is the  
same process after the text has been encoded.  This lends itself to a  
common StringObject base class:

class StringObject(PdfObject):
    # Contains behavior common to both types of strings, such as the
    # ability to serialize out a byte array and to encrypt/decrypt
    # strings for "secure" PDF files.
    # Also contains reading code that attempts to autodetect whether
    # the string is a byte or text string.
    pass

class ByteStringObject(bytes, StringObject):
    # Adds the byte array storage, and passes self back to
    # StringObject for serialization output.
    pass

class TextStringObject(str, StringObject):
    # Overrides the default output serialization to encode the
    # unicode string to match PDF's requirements, then passes the
    # resulting byte array up for serialization.
    pass

(complete source code, if you're interested:
http://hg.pybrary.net/pyPdf-py3/file/fe0dc2014a1b/pyPdf/generic.py)
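
To make the autodetection rule above concrete, here is a rough sketch of it.  This is not the actual pyPdf code: the function names are made up for illustration, and because Python has no built-in PDFDocEncoding codec the decoder is passed in as a parameter, with plain ASCII used as a placeholder.

```python
import codecs

def read_pdf_string(raw, decode_pdfdoc):
    """Return str for a text string, or the original bytes otherwise.

    `decode_pdfdoc` is a hypothetical callable that decodes
    PDFDocEncoding and raises UnicodeDecodeError on failure.
    """
    bom = codecs.BOM_UTF16_BE  # b"\xfe\xff"
    if raw.startswith(bom):
        # Text string stored as UTF-16BE; strip the BOM before decoding.
        return raw[len(bom):].decode("utf-16-be")
    try:
        # Otherwise it is a text string only if PDFDocEncoding accepts it.
        return decode_pdfdoc(raw)
    except UnicodeDecodeError:
        # Not decodable: keep it as a byte string.
        return raw

def ascii_stand_in(raw):
    # ASSUMPTION: placeholder only. ASCII is NOT PDFDocEncoding; a real
    # decoder would use the table from the PDF reference.
    return raw.decode("ascii")

print(repr(read_pdf_string(b"\xfe\xff\x00H\x00i", ascii_stand_in)))
print(repr(read_pdf_string(b"\xff\x00", ascii_stand_in)))
```

The first call takes the BOM branch and yields a str; the second fails the placeholder decoder and comes back unchanged as bytes.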

Deriving from the bytes type provides storage, and also direct & easy  
access to the byte array content.  I think in this case using bytes  
as a base type makes sense, at least as much as using str as a base  
type.  pyPdf derives from list and dict for different PDF object  
types in a similar manner as well.
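
As a runnable illustration of the layout described above, here is a minimal sketch.  It is only a sketch under assumptions: the method names (to_bytes, write_to) are invented for this example, the output always uses PDF's hexadecimal string syntax for simplicity, and text strings are always written as UTF-16BE with a BOM, which may not match what pyPdf really does.

```python
import codecs
import io

class PdfObject:
    # Stand-in for pyPdf's base class; the real one has more to it.
    pass

class StringObject(PdfObject):
    def write_to(self, stream):
        # Shared serialization: each subclass supplies the raw bytes,
        # which are written here as a PDF hexadecimal string.
        data = self.to_bytes()
        stream.write(b"<" + data.hex().encode("ascii") + b">")

class ByteStringObject(bytes, StringObject):
    def to_bytes(self):
        # The bytes base type is the storage; hand it back unchanged.
        return bytes(self)

class TextStringObject(str, StringObject):
    def to_bytes(self):
        # ASSUMPTION: always encode as UTF-16BE with a BOM.
        return codecs.BOM_UTF16_BE + self.encode("utf-16-be")

buf = io.BytesIO()
ByteStringObject(b"\x01\xff").write_to(buf)
TextStringObject("Hi").write_to(buf)
print(buf.getvalue())
```

The point is that write_to lives in one place, while each subclass gets its storage and normal bytes or str behavior from the built-in base type.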

Mathieu Fenniak