Been digging ever since I posted this. I suspected that the response might be
to use a database, and I am worried I am trying to reinvent the wheel. The
problem is I don't want any dependencies, and I also don't need persistence
between program runs. I kind of wanted to keep the use of petit very similar
to cat, head, awk, etc. That said, I have realized that if I provide the
analysis features as an API, you very well might want persistence between
runs.
What about using a list inside a shelve? Just got done messing with this in a
Python shell:

    import shelve

    # writeback=True is needed so in-place mutations like append() actually
    # get written back to disk; without it the appends are silently lost.
    # Note the writeback cache keeps everything touched in memory until
    # sync()/close().
    d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)
    d["log"] = []          # a list (not a tuple) so append() works
    d["log"].append("test1")
    d["log"].append("test2")
    d["log"].append("test3")

Then, always interacting with d["log"], for example:

    for i in d["log"]:
        print i

Thoughts? I know this won't manage memory, but it will keep the footprint
down, right?

On Wed, Jan 12, 2011 at 5:04 PM, Peter Otten <__pete...@web.de> wrote:

> Scott McCarty wrote:
>
> > Sorry to ask this question. I have searched the list archives and
> > googled, but I don't even know what words would find what I am looking
> > for; I am just looking for a little kick in the right direction.
> >
> > I have a Python based log analysis program called petit
> > (http://crunchtools.com/petit). I am trying to modify it to manage the
> > main object types to and from disk.
> >
> > Essentially, I have one object which is a list of a bunch of "Entry"
> > objects. The Entry objects have date, time, etc. fields which I use for
> > analysis techniques. At the very beginning I build up the list of
> > objects, then would like to start pickling it while building, to save
> > memory. I want to be able to process more entries than I have memory.
> > With a straight list it looks like I could build from xreadlines(), but
> > once you turn it into a more complex object, I don't quite know where
> > to go.
> >
> > I understand how to pickle the entire data structure, but I need
> > something that will manage the memory/disk allocation. Any thoughts?
>
> You can write multiple pickled objects into a single file:
>
> import cPickle as pickle
>
> def dump(filename, items):
>     with open(filename, "wb") as out:
>         dump = pickle.Pickler(out).dump
>         for item in items:
>             dump(item)
>
> def load(filename):
>     with open(filename, "rb") as instream:
>         load = pickle.Unpickler(instream).load
>         while True:
>             try:
>                 item = load()
>             except EOFError:
>                 break
>             yield item
>
> if __name__ == "__main__":
>     filename = "tmp.pickle"
>     from collections import namedtuple
>     T = namedtuple("T", "alpha beta")
>     dump(filename, (T(a, b) for a, b in zip("abc", [1, 2, 3])))
>     for item in load(filename):
>         print item
>
> To get random access you'd have to maintain a list containing the
> offsets of the entries in the file.
> However, a simple database like SQLite is probably sufficient for the
> kind of entries you have in mind, and it allows operations like
> aggregation, sorting and grouping out of the box.
>
> Peter
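Following up on the random-access idea: here's a rough, untested sketch of
the offset index Peter describes (the helper names dump_indexed and load_at
are just mine, not anything from petit):

    import cPickle as pickle

    def dump_indexed(filename, items):
        # Record the byte offset of each entry as it is pickled.
        offsets = []
        with open(filename, "wb") as out:
            pickler = pickle.Pickler(out, -1)
            for item in items:
                offsets.append(out.tell())
                pickler.dump(item)
                pickler.clear_memo()  # keep each entry self-contained
        return offsets

    def load_at(filename, offset):
        # Seek straight to a recorded offset and unpickle one entry.
        with open(filename, "rb") as instream:
            instream.seek(offset)
            return pickle.Unpickler(instream).load()

    if __name__ == "__main__":
        offsets = dump_indexed("tmp.pickle",
                               ["entry-%d" % i for i in range(5)])
        print load_at("tmp.pickle", offsets[3])  # prints entry-3

The offset list itself stays small (one integer per entry), so it can live in
memory even when the entries can't.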
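And since sqlite3 ships in the standard library, Peter's SQLite suggestion
wouldn't actually add a dependency. Something like this is what I imagine
(the schema and column names are only my guess at petit's Entry fields):

    import sqlite3

    conn = sqlite3.connect("/tmp/petit.db")  # or ":memory:" for no persistence
    conn.execute("""CREATE TABLE IF NOT EXISTS entries
                    (date TEXT, time TEXT, host TEXT, daemon TEXT, log TEXT)""")

    def add_entry(entry):
        # entry is a 5-tuple matching the columns above
        conn.execute("INSERT INTO entries VALUES (?, ?, ?, ?, ?)", entry)

    add_entry(("2011-01-12", "17:04:00", "host1", "sshd", "test1"))
    conn.commit()

    # aggregation and grouping "out of the box", as Peter says
    for daemon, count in conn.execute(
            "SELECT daemon, COUNT(*) FROM entries GROUP BY daemon"):
        print daemon, count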