On 2012-12-21 00:19, larry.mart...@gmail.com wrote:
I have a list of tuples that contains a tool_id, a time, and a message. I want 
to select from this list all the elements where the message matches some 
string, and all the other elements where the time is within some diff of any 
matching message for that tool.

Here is how I am currently doing this:

# record time for each message matching the specified message for each tool
messageTimes = {}
for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0], row[1]] = 1

It looks like 'messageTimes' is really a set of tool/time pairs.

You could make it a dict of sets of times instead; in other words, a
set of matching times for each tool:

from collections import defaultdict

messageTimes = defaultdict(set)
for row in cdata:   # tool, time, message
    if self.message in row[2]:
        messageTimes[row[0]].add(row[1])
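
A self-contained sketch of that grouping, assuming the rows are plain
(tool, time, message) tuples with numeric times. The sample rows and the
'pattern' variable are made up for illustration; they stand in for cdata
and self.message:

```python
from collections import defaultdict

# Hypothetical sample rows: (tool_id, time, message).
cdata = [
    ("A", 10, "start"),
    ("A", 12, "error: overheat"),
    ("B", 11, "error: jam"),
    ("A", 30, "error: overheat"),
]
pattern = "error"  # stands in for self.message

# Map each tool to the set of times at which a matching message occurred.
message_times = defaultdict(set)
for tool, time, message in cdata:
    if pattern in message:
        message_times[tool].add(time)

print(dict(message_times))
```

Each tool now maps only to its own matching times, so a later lookup
never has to visit times recorded for other tools.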

# now pull out each message that is within the time diff for each matched message
# as well as the matched messages themselves

def determine(tup):
    if self.message in tup[2]: return True      # matched message

    for (tool, date_time) in messageTimes:
        if tool == tup[0]:
            if abs(date_time-tup[1]) <= tdiff:
                return True

    return False

With a dict of sets, 'determine' only has to look at the times recorded
for that one tool:

def determine(tup):
    if self.message in tup[2]: return True      # matched message

    # Scan through the times for the tool given by tup[0].
    for date_time in messageTimes[tup[0]]:
        if abs(date_time - tup[1]) <= tdiff:
            return True

    return False

cdata[:] = [tup for tup in cdata if determine(tup)]

This code works, but it takes way too long to run - e.g. when cdata has 600,000 
elements (which is typical for my app) it takes 2 hours for this to run.

Can anyone give me some suggestions on speeding this up?
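
One further option, not part of the code above: even with a set of times
per tool, each call still scans every matching time for that tool. If the
times for each tool are kept in a sorted list instead, a binary search
with the standard bisect module can answer "is there a matching time
within tdiff?" without a scan. This is only a sketch under assumptions:
the function name filter_near_matches and its parameters are hypothetical,
and the times must support ordering, addition and subtraction (numbers, or
datetime values with a timedelta for tdiff).

```python
from bisect import bisect_left
from collections import defaultdict


def filter_near_matches(cdata, pattern, tdiff):
    """Keep rows that match `pattern`, plus rows whose time is within
    `tdiff` of a matching row for the same tool."""
    # Collect the matching times for each tool.
    times_by_tool = defaultdict(list)
    for tool, time, message in cdata:
        if pattern in message:
            times_by_tool[tool].append(time)

    # Sort once per tool; a plain dict avoids creating empty entries
    # when a tool without matches is looked up later.
    sorted_times = {tool: sorted(times) for tool, times in times_by_tool.items()}

    def keep(row):
        tool, time, message = row
        if pattern in message:
            return True  # matched message
        times = sorted_times.get(tool)
        if not times:
            return False
        # Find the first matching time >= time - tdiff. The row is kept
        # if that time is also <= time + tdiff.
        i = bisect_left(times, time - tdiff)
        return i < len(times) and times[i] <= time + tdiff

    return [row for row in cdata if keep(row)]
```

It would be called in place of the list comprehension, for example
cdata[:] = filter_near_matches(cdata, self.message, tdiff). Each row then
costs one binary search over that tool's matching times rather than a
pass over all of them.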

