On Thursday, December 20, 2012 6:17:04 PM UTC-7, Dave Angel wrote:
> On 12/20/2012 07:19 PM, larry.mart...@gmail.com wrote:
> > I have a list of tuples that contains a tool_id, a time, and a message.
> > I want to select from this list all the elements where the message
> > matches some string, and all the other elements where the time is
> > within some diff of any matching message for that tool.
> >
> > Here is how I am currently doing this:
>
> No, it's not. This is a fragment of code, without enough clues as to
> what else is going on. We can guess, but that's likely to make a mess.

Of course it's a fragment - it's part of a large program and I was just
showing the relevant parts.

> First question is whether this code works exactly correctly?

Yes, the code works. I end up with just the rows I want.

> Are you only concerned about speed, not fixing features?

Don't know what you mean by 'fixing features'. The code does what I want,
it just takes too long.

> As far as I can tell, the logic that includes the time comparison is
> bogus.

Not at all.

> You don't do anything there to worry about the value of tup[2], just
> whether some item has a nearby time. Of course, I could misunderstand
> the spec.

The data comes from a database. tup[1] is a datetime column, and tdiff
comes from a datetime.timedelta().

> Are you making a global called 'self'? That name is by convention only
> used in methods to designate the instance object. What's the attribute
> self?

Yes, self is my instance object. self.message contains the string of
interest that I need to look for.

> Can cdata have duplicates, and are they significant?

No, it will not have duplicates.

> Are you including the time spent building that as part of your 2 hour
> measurement?

No, the 2 hours is just the time to run the

    cdata[:] = [tup for tup in cdata if determine(tup)]

> Is the list sorted in any way?

Yes, the list is sorted by tool and datetime.

> Chances are your performance bottleneck is the doubly-nested loop. You
> have a list comprehension at top-level code, and inside it calls a
> function that also loops over the 600,000 items. So the inner loop gets
> executed 360 billion times. You can cut this down drastically by some
> judicious sorting, as well as by having a map of lists, where the map is
> keyed by the tool.

Thanks. I will try that.
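Just to make sure I follow the map-of-lists idea, is this roughly what you
mean? Completely untested sketch, using bisect since cdata is already
sorted by tool and datetime; cdata, self.message and tdiff are the same
names as in my existing code below, and times_by_tool / near_match are
just names I made up:

    import bisect
    from collections import defaultdict

    # datetimes of the matching messages, grouped by tool; each list ends
    # up sorted because cdata is already sorted by tool and datetime
    times_by_tool = defaultdict(list)
    for tool, date_time, message in cdata:
        if self.message in message:
            times_by_tool[tool].append(date_time)

    def near_match(tup):
        tool, date_time, message = tup
        if self.message in message:
            return True                     # a matched message itself
        times = times_by_tool.get(tool)
        if not times:
            return False
        # times is sorted, so only the two neighbours of the insertion
        # point can be the closest matching time
        i = bisect.bisect_left(times, date_time)
        return any(abs(times[j] - date_time) <= tdiff
                   for j in (i - 1, i) if 0 <= j < len(times))

    cdata[:] = [tup for tup in cdata if near_match(tup)]

If I've understood you, each row then only gets compared against the
matching times for its own tool, and the bisect means I never scan the
whole list of times.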
> > # record time for each message matching the specified message for
> > # each tool
> >
> > messageTimes = {}
>
> You're building a dictionary; are you actually using the value (1), or
> is only the key relevant?

Only the keys.

> A set is a dict without a value.

Yes, I could use a set, but I don't think that would make it measurably
faster.

> But more importantly, you never look up anything in this dictionary. So
> why isn't it a list? For that matter, why don't you just use the
> messageTimes list?

Yes, it could be a list too.

> > for row in cdata:   # tool, time, message
> >     if self.message in row[2]:
> >         messageTimes[row[0], row[1]] = 1
> >
> > # now pull out each message that is within the time diff for each
> > # matched message, as well as the matched messages themselves
> >
> > def determine(tup):
> >     if self.message in tup[2]: return True   # matched message
> >
> >     for (tool, date_time) in messageTimes:
> >         if tool == tup[0]:
> >             if abs(date_time - tup[1]) <= tdiff:
> >                 return True
> >
> >     return False
> >
> > cdata[:] = [tup for tup in cdata if determine(tup)]
>
> As the code exists, there's no need to copy the list. Just do a simple
> bind.

This statement is to remove the items from cdata that I don't want. I
don't know what you mean by bind - I'm not familiar with that term in
Python.

> > This code works, but it takes way too long to run - e.g. when cdata
> > has 600,000 elements (which is typical for my app) it takes 2 hours
> > for this to run.
> >
> > Can anyone give me some suggestions on speeding this up?
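P.S. Re-reading your last two points - if by a 'simple bind' you just mean
reassigning the name instead of assigning into the slice, i.e.

    cdata = [tup for tup in cdata if determine(tup)]

then I guess that would work here too, as long as nothing else is holding
a reference to the original list; the slice assignment was just my way of
filtering it in place. And since only the keys matter, I suppose
messageTimes really could be a set, e.g.

    messageTimes = set((row[0], row[1]) for row in cdata
                       if self.message in row[2])

though I still doubt that part is where the time is going.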