Re: [Tutor] Design Question: File Object used everywhere

Dave Angel Fri, 14 May 2010 04:03:26 -0700

Jan Jansen wrote:

Hi there,
I'm working on code to read and write large amounts of binary data according to a given specification. In the specification there are a lot of "segments" defined. The segments in turn have definitions of datatypes and what they represent, how many of some of the data values are present in the file and sometimes the offset from the beginning of the file.
Now I wonder, what would be a good way to model the code.
Currently I have one class, that is the "FileReader". This class holds the file object, information about the endianness and also a method to read data (using the struct module). Then, I have more classes representing the segments. In those classes I define data formats, call the read method of the FileReader object and hold the data. Currently I'm passing the FileReader object as an argument.
Here are some examples, first the "FileReader" class:

import struct

class JTFile():

    def __init__(self, file_obj):
        self.file_stream = file_obj
        self.version_string = ""
        self.endian_format_prefix = ""

    def read_data(self, fmt, pos=None):
        # Size the read from the full format, including the byte-order
        # prefix, because the prefix changes alignment and therefore size.
        full_fmt = self.endian_format_prefix + fmt
        format_size = struct.calcsize(full_fmt)
        if pos is not None:
            self.file_stream.seek(pos)
        return struct.unpack_from(full_fmt, self.file_stream.read(format_size))
and here an example for a segment class that uses a FileReader instance (file_stream):
class LSGSegement():

    def __init__(self, file_stream):
        self.file_stream = file_stream
        self.lsg_root_element = None
        self._read_lsg_root()

    def _read_lsg_root(self):
        fmt = "80Bi"
        raw_data = self.file_stream.read_data(fmt)
        self.lsg_root_element = LSGRootElement(raw_data[:79], raw_data[79])
So, now I wonder, what would be a good pythonic way to model the FileReader class. Maybe use global functions to avoid passing the FileReader object around? Or something like "Singleton", which I've heard about but never used? Or keep it like that?
Cheers,

Jan

I agree with Luke's advice, but would add some comments.

As soon as you have a global (or a singleton) representing a file, you're making the explicit assumption that you'll never have two such files open. So what happens if you need to merge two such files? Start over? You need to continue to pass something representing the file (JTFile object) into each constructor.
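
A minimal sketch of that point, assuming a simplified stand-in for the JTFile class and a hypothetical Header segment holding one 32-bit count at offset 0 (neither is taken from the real specification). Because each segment receives its file explicitly, two files can be open side by side:

```python
import io
import struct


class JTFile:
    """Simplified, hypothetical stand-in for the JTFile class quoted above."""

    def __init__(self, file_obj, endian_format_prefix="<"):
        self.file_stream = file_obj
        self.endian_format_prefix = endian_format_prefix

    def read_data(self, fmt, pos=None):
        full_fmt = self.endian_format_prefix + fmt
        if pos is not None:
            self.file_stream.seek(pos)
        size = struct.calcsize(full_fmt)
        return struct.unpack(full_fmt, self.file_stream.read(size))


class Header:
    """Hypothetical segment: a single 32-bit count stored at offset 0."""

    def __init__(self, jt_file):
        (self.count,) = jt_file.read_data("i", pos=0)


# Two independent in-memory "files", open at the same time; nothing is global.
first = JTFile(io.BytesIO(struct.pack("<i", 3)))
second = JTFile(io.BytesIO(struct.pack("<i", 4)))
total = Header(first).count + Header(second).count
```

With a global or a singleton reader, the second file would have had to replace the first before its header could be read.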

The real question is one of state, which isn't clear from your example. The file_stream attribute of an object of class JTFile has a file position, which you are implicitly using. But you said some segments are at fixed positions in the file, and presumably some are serially related to other segments. Or perhaps some segments are really a section of the file containing smaller segments of different type(s).

Similarly, each object, after being created, probably has a relationship to other objects. Without knowing that, you can't design those object classes.

Finally, you need to decide early on what to do about data validation. If the file happens to be busted, how are you going to notify the user? If you read it in an ad-hoc, random order, you'll have a very hard time telling the user anything useful about what's wrong with it, never mind recovering from it.
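
One possible way to make such errors reportable, sketched under the assumption that a short read is the failure to detect. CorruptFileError and read_checked are hypothetical names, not part of the struct module or of the original code:

```python
import io
import struct


class CorruptFileError(Exception):
    """Hypothetical exception that records the offset where reading failed."""

    def __init__(self, offset, message):
        super().__init__(f"offset {offset}: {message}")
        self.offset = offset


def read_checked(stream, fmt, what):
    """Read one struct from the stream and say where a short read happened."""
    offset = stream.tell()
    size = struct.calcsize(fmt)
    data = stream.read(size)
    if len(data) != size:
        raise CorruptFileError(
            offset, f"expected {size} bytes for {what}, got {len(data)}"
        )
    return struct.unpack(fmt, data)


# A sample stream whose second 32-bit value is cut off after two bytes.
stream = io.BytesIO(struct.pack("<i", 1) + b"\x00\x00")
(first_value,) = read_checked(stream, "<i", "first value")
try:
    read_checked(stream, "<i", "second value")
    error = None
except CorruptFileError as exc:
    error = exc
```

Reading in file order keeps the reported offset meaningful: everything before it is known to have parsed.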

It's really a problem in serialization, where you read a file by deserializing. Consider whether the file is going to be always small enough to support simply interpreting the entire stream into a tree of objects, and then dealing with them. Conceivably you can do that lazily, only deserializing objects as they are referenced. But the possibility of doing that depends highly on whether there is what amounts to a "directory" in the file, or whether each object's position is determined by the length of the previous one.
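
A rough sketch of the lazy variant, assuming an invented layout in which the file starts with a count followed by (offset, length) directory entries. LazyFile and this layout are illustrative only and say nothing about the real specification:

```python
import io
import struct


class LazyFile:
    """Reads the directory up front; segments are read only when requested."""

    def __init__(self, stream):
        self.stream = stream
        self._cache = {}
        stream.seek(0)
        (count,) = struct.unpack("<i", stream.read(4))
        # Each hypothetical directory entry is (absolute offset, length).
        self.directory = [
            struct.unpack("<ii", stream.read(8)) for _ in range(count)
        ]

    def segment(self, index):
        if index not in self._cache:
            offset, length = self.directory[index]
            self.stream.seek(offset)
            self._cache[index] = self.stream.read(length)
        return self._cache[index]


# Build a small sample file in memory: count, directory, then the payloads.
payloads = [b"abc", b"defgh"]
header_size = 4 + 8 * len(payloads)
directory = b""
body = b""
for payload in payloads:
    directory += struct.pack("<ii", header_size + len(body), len(payload))
    body += payload

lazy = LazyFile(io.BytesIO(struct.pack("<i", len(payloads)) + directory + body))
second_segment = lazy.segment(1)  # only this segment has been read so far
```

Without the directory, reaching the second segment would require parsing the first one to learn where it ends.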

In addition to deserializing in one pass, or lazily deserializing, consider deserializing with callbacks. In this approach you do not necessarily keep the intermediate objects, you just call a specified user routine, and the user should keep the objects if she cares about them, or process them or ignore them as needed.
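
The callback style might look roughly like this, assuming an invented record layout of a 1-byte type and a 4-byte payload length followed by the payload. parse_records and keep_type_one are hypothetical names:

```python
import io
import struct


def parse_records(stream, on_record):
    """Walk length-prefixed records, handing each one to the callback."""
    header = struct.Struct("<Bi")  # hypothetical: type byte, payload length
    while True:
        raw = stream.read(header.size)
        if not raw:
            break  # clean end of file
        if len(raw) != header.size:
            raise ValueError("truncated record header")
        kind, length = header.unpack(raw)
        on_record(kind, stream.read(length))


kept = []


def keep_type_one(kind, payload):
    # The caller decides what to keep; the parser stores nothing itself.
    if kind == 1:
        kept.append(payload)


records = [(1, b"ab"), (2, b"xyz"), (1, b"c")]
data = b"".join(struct.pack("<Bi", kind, len(p)) + p for kind, p in records)
parse_records(io.BytesIO(data), keep_type_one)
```

The parser never holds more than one record at a time, which is what makes this approach attractive for files too large to load as a tree.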

I've had to choose each of these approaches for different projects, and the choice depended in large part on the definition of the data file, and whether it could be randomly accessed.


DaveA

