Jan Jansen wrote:
Hi there,

I'm working on code to read and write large amounts of binary data according to a given specification. The specification defines a lot of "segments". Each segment in turn defines its datatypes and what they represent, how many of certain data values are present in the file, and sometimes the offset from the beginning of the file.

Now I wonder, what would be a good way to model the code.

Currently I have one class, the "FileReader". This class holds the file object, information about the endianness, and a method to read data (using the struct module). Then I have more classes representing the segments. In those classes I define data formats, call the read method of the FileReader object, and hold the data. Currently I'm passing the FileReader object as an argument.

Here are some examples, first the "FileReader" class:

import struct

class JTFile():

    def __init__(self, file_obj):
        self.file_stream = file_obj
        self.version_string = ""
        self.endian_format_prefix = ""

    def read_data(self, fmt, pos = None):
        # Size the read from the full format, including the endian prefix:
        # the prefix changes alignment and therefore the byte count.
        full_fmt = self.endian_format_prefix + fmt
        format_size = struct.calcsize(full_fmt)
        if pos is not None:
            self.file_stream.seek(pos)
        return struct.unpack_from(full_fmt, self.file_stream.read(format_size))

and here an example for a segment class that uses a FileReader instance (file_stream):

class LSGSegement():

    def __init__(self, file_stream):
        self.file_stream = file_stream
        self.lsg_root_element = None
        self._read_lsg_root()

    def _read_lsg_root(self):
        fmt = "80Bi"
        raw_data = self.file_stream.read_data(fmt)
        # "80Bi" unpacks to 81 values: 80 bytes followed by one int.
        self.lsg_root_element = LSGRootElement(raw_data[:80], raw_data[80])

So now I wonder what would be a good, pythonic way to model the FileReader class. Maybe use global functions to avoid passing the FileReader object around? Or something like a "Singleton", which I've heard about but never used? Or keep it like that?

Cheers,

Jan


I agree with Luke's advice, but would add some comments.

As soon as you have a global (or a singleton) representing a file, you're making the explicit assumption that you'll never have two such files open. So what happens if you need to merge two such files? Start over? You need to continue to pass something representing the file (JTFile object) into each constructor.
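To illustrate why passing the reader in keeps two files workable at once, here is a minimal sketch. BinaryReader and Header are hypothetical stand-ins (not the poster's actual classes), and the one-integer "file" layout is an assumption made up for the example.

```python
import io
import struct


class BinaryReader:
    """Hypothetical stand-in for JTFile: wraps exactly one binary stream."""

    def __init__(self, file_obj, endian_prefix="<"):
        self.file_stream = file_obj
        self.endian_prefix = endian_prefix

    def read_data(self, fmt, pos=None):
        full_fmt = self.endian_prefix + fmt
        if pos is not None:
            self.file_stream.seek(pos)
        raw = self.file_stream.read(struct.calcsize(full_fmt))
        return struct.unpack(full_fmt, raw)


class Header:
    """Hypothetical segment: a single 32-bit version number at offset 0."""

    def __init__(self, reader):
        (self.version,) = reader.read_data("i", pos=0)


# Two independent in-memory "files". A global or singleton reader could
# serve only one of them at a time.
first = BinaryReader(io.BytesIO(struct.pack("<i", 8)))
second = BinaryReader(io.BytesIO(struct.pack("<i", 9)))
versions = [Header(first).version, Header(second).version]
print(versions)
```

Because each segment receives its reader explicitly, merging or comparing two files needs no change to the segment classes.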

The real question is one of state, which isn't clear from your example. The file_stream attribute of an object of class JTFile has a file position, which you are implicitly using. But you said some segments are at fixed positions in the file, and presumably some are serially related to other segments. Or perhaps some segments are really a section of the file containing smaller segments of different type(s).
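One way to make that state explicit, sketched under an assumed layout (a 4-byte count at offset 0 followed by that many serial 2-byte records), is to have every read take an offset and return the offset just past what it read. The helper name read_at is hypothetical.

```python
import io
import struct

# Assumed layout for illustration only: count, then `count` 16-bit records.
data = struct.pack("<i", 3) + struct.pack("<3h", 10, 20, 30)
stream = io.BytesIO(data)


def read_at(stream, fmt, pos):
    """Read fmt at an explicit offset; return (values, offset just past them)."""
    stream.seek(pos)
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, stream.read(size)), pos + size


# Fixed-position segment: always read from offset 0.
(count,), next_pos = read_at(stream, "<i", 0)

# Serially related segments: each one starts where the previous one ended.
records = []
for _ in range(count):
    (value,), next_pos = read_at(stream, "<h", next_pos)
    records.append(value)

print(records, next_pos)
```

The caller now owns the position, so it no longer matters what other code did to the stream in between.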

Similarly, each object, after being created, probably has a relationship to other objects. Without knowing what those relationships are, you can't design the object classes.

Finally, you need to decide early on what to do about data validation. If the file happens to be busted, how are you going to notify the user? If you read it in an ad-hoc, random order, you'll have a very hard time telling the user anything useful about what's wrong with it, never mind recovering from it.
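As one possible approach to validation, each read can check that it got the bytes it expected and report the file offset when it did not. This is only a sketch: SegmentError and read_checked are hypothetical names, and a short read is just one of many things a real specification would need checked.

```python
import io
import struct


class SegmentError(Exception):
    """Hypothetical error type that records where in the file parsing failed."""

    def __init__(self, message, offset):
        super().__init__(f"{message} (at byte offset {offset})")
        self.offset = offset


def read_checked(stream, fmt):
    """Unpack fmt from the current position, or raise SegmentError on a short read."""
    offset = stream.tell()
    size = struct.calcsize(fmt)
    raw = stream.read(size)
    if len(raw) != size:
        raise SegmentError(f"expected {size} bytes, got {len(raw)}", offset)
    return struct.unpack(fmt, raw)


# One complete 32-bit int, then a second one truncated to two bytes.
stream = io.BytesIO(struct.pack("<i", 7) + b"\x01\x02")
first_value = read_checked(stream, "<i")
error = None
try:
    read_checked(stream, "<i")
except SegmentError as exc:
    error = exc
print(first_value, error)
```

Reading in a predictable order is what makes the reported offset meaningful to the user.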

It's really a problem in serialization, where you read a file by deserializing. Consider whether the file will always be small enough to support simply interpreting the entire stream into a tree of objects, and then dealing with them. Conceivably you can do that lazily, only deserializing objects as they are referenced. But whether that is possible depends heavily on whether there is what amounts to a "directory" in the file, or whether each object's position is determined by the length of the previous one.
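Lazy deserialization might look like the following when the file does have a directory. The layout here (an entry count, then offset/length pairs, then the payloads) is invented for the example, and LazyFile is a hypothetical class.

```python
import io
import struct

# Build a small file in the assumed layout: count, (offset, length) pairs, payloads.
payloads = [b"alpha", b"be", b"gamma!"]
header_size = 4 + 8 * len(payloads)
entries, offset = [], header_size
for payload in payloads:
    entries.append((offset, len(payload)))
    offset += len(payload)
blob = struct.pack("<i", len(payloads))
blob += b"".join(struct.pack("<ii", off, length) for off, length in entries)
blob += b"".join(payloads)


class LazyFile:
    """Reads only the directory up front; segments are loaded on first access."""

    def __init__(self, stream):
        self.stream = stream
        (count,) = struct.unpack("<i", stream.read(4))
        self.directory = [struct.unpack("<ii", stream.read(8)) for _ in range(count)]
        self._cache = {}

    def segment(self, index):
        if index not in self._cache:
            off, length = self.directory[index]
            self.stream.seek(off)
            self._cache[index] = self.stream.read(length)
        return self._cache[index]


lazy = LazyFile(io.BytesIO(blob))
third = lazy.segment(2)  # jumps straight to the third payload
loaded = sorted(lazy._cache)
print(third, loaded)
```

Without such a directory, reaching the third segment would require parsing the first two to learn where it starts.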

In addition to deserializing in one pass, or lazily deserializing, consider deserializing with callbacks. In this approach you do not necessarily keep the intermediate objects; you just call a user-specified routine, which keeps the objects if the user cares about them, or processes or ignores them as needed.
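A rough sketch of the callback style, assuming a file that is nothing but a run of 16-bit records: the parser hands each record to a user-supplied function and keeps nothing itself. The names parse_records and keep_large are hypothetical.

```python
import io
import struct


def parse_records(stream, on_record):
    """Call on_record(offset, value) for each 16-bit record; store nothing here."""
    size = struct.calcsize("<h")
    while True:
        offset = stream.tell()
        raw = stream.read(size)
        if len(raw) < size:
            break  # end of data (or a truncated trailing record)
        (value,) = struct.unpack("<h", raw)
        on_record(offset, value)


kept = []


def keep_large(offset, value):
    """User routine: keep only the records it cares about."""
    if value >= 100:
        kept.append((offset, value))


parse_records(io.BytesIO(struct.pack("<4h", 5, 100, 7, 250)), keep_large)
print(kept)
```

Memory use then depends on what the callback chooses to keep rather than on the size of the file.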

I've had to choose each of these approaches for different projects, and the choice depended in large part on the definition of the data file, and whether it could be randomly accessed.

DaveA


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
