Hi

I created some code to get records from a potentially giant .csv file. It implements a __getitem__ method that gets records from a memory-mapped csv file. For this to work, I need to build a lookup table that maps line numbers to line starts/ends. This works, BUT building the lookup table can be time-consuming (and it freezes up the app). The (somewhat pruned) code is here: http://pastebin.com/0x6JKbfh.

Now I would like to build the lookup table in a separate process. I used multiprocessing. In the crude example below, it appears to be doing what I have in mind. Is this the way to do it? I have never used multiprocessing/threading before, apart from playing around.

One specific question: __getitem__ is supposed to throw an IndexError when needed. But how do I know when I should do this if I don't yet know the total number of records? Is there a really cheap way of getting this number?

import multiprocessing as mp
import time

class Test(object):

    """Get records from a potentially huge, therefore mmapped, file (.csv)"""

    def __init__(self):
        self.lookup = mp.Manager().dict()
        self.lookup_done = False
        process = mp.Process(target=self.create_lookup, args=(self.lookup,))
        process.start()

    def create_lookup(self, d):
        """time-consuming function that is only called once"""
        for i in xrange(10 ** 7):
            d[i] = i
        process.join()
        self.lookup_done = True

    def __getitem__(self, key):
        """In my app, this returns a record from a memory-mapped file
        The key is the line number, the value is a two-tuple of the start
        and the end byte of that record"""
        try:
            return self.lookup[key]
        except KeyError:
            # what's a cheap way to calculate the number of records in a .csv file?
            self.total_number_of_records = 10 ** 7

            if key > self.total_number_of_records:
                if not self.lookup_done:
                    process.join()
                raise IndexError("index out of range")
            print "One moment please, lookup not yet ready enough"

if __name__ == "__main__":

    test = Test()

    # check if it works
    while True:
        k = int(raw_input("enter key: "))
        try:
            print "value is ", test[k]
            time.sleep(1)
        except KeyError:
            print "OOPS, value not yet in lookup"
            print "Max key is now", max(test.lookup.keys())
        if test.lookup and max(test.lookup.keys()) == (10 ** 7 - 1):
            print "Exiting"
            break
    print "Done"

Thank you in advance!

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine,
public order, irrigation, roads, a fresh water system, and public health,
what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
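
On the record-count question, here is a minimal sketch of one possibly cheap approach: count newline bytes in large binary chunks, without parsing any rows. It assumes one record per line (no quoted fields containing embedded newlines), and a header line, if present, is included in the count. The names count_records and chunk_size are illustrative only and are not part of the code at the pastebin link.

```python
def count_records(path, chunk_size=1024 * 1024):
    """Count lines in a file by counting newline bytes.

    Assumption: every record occupies exactly one line, so quoted CSV
    fields with embedded newlines would be over-counted.
    """
    count = 0
    last_byte = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # A final line without a trailing newline still counts as a record.
    if last_byte and last_byte != b"\n":
        count += 1
    return count
```

With such a count available up front, __getitem__ could raise IndexError right away for any key that is at or beyond the count, and only ask the user to wait when the key is valid but not yet in the lookup.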