In article <cc869959-c568-4490-b45f-7855c6841...@googlegroups.com>, "larry.mart...@gmail.com" <larry.mart...@gmail.com> wrote:
> On Thursday, December 20, 2012 5:38:03 PM UTC-7, Chris Angelico wrote:
> > On Fri, Dec 21, 2012 at 11:19 AM, larry.mart...@gmail.com
> > <larry.mart...@gmail.com> wrote:
> > > This code works, but it takes way too long to run - e.g. when cdata has
> > > 600,000 elements (which is typical for my app) it takes 2 hours for this
> > > to run.
> > >
> > > Can anyone give me some suggestions on speeding this up?
> >
> > It sounds like you may have enough data to want to not keep it all in
> > memory. Have you considered switching to a database? You could then
> > execute SQL queries against it.
>
> It came from a database. Originally I was getting just the data I wanted
> using SQL, but that was taking too long also. I was selecting just the
> messages I wanted, then for each one of those doing another query to get the
> data within the time diff of each. That was resulting in tens of thousands of
> queries. So I changed it to pull all the potential matches at once and then
> process it in python.

If you're doing free-text matching, an SQL database may not be the right
tool. I suspect you want to be looking at some kind of text search engine,
such as http://lucene.apache.org/ or http://xapian.org/.

-- 
http://mail.python.org/mailman/listinfo/python-list