On Thu, 24 Jul 2008 17:19:41 -0400, Wei Hao <[EMAIL PROTECTED]> wrote:
Hi: I'm pretty new to Python and I have some optimization issues. I'll show you the piece of code that is causing them, with pseudo-code and comments before it. I'm accessing a gigantic table (around 15 million rows) in SQL.

    # d is some dictionary, r is a precompiled regex
    # Big loop, so I search through the table in chunks given by delta
    # SQL query ("select * from table where rowID >= n and rowID < (n + delta)"),
    # result of the query stored in a
    # Each individual row is a[n1], columns of rows are a[n1][n2]

[snip]

I am 100% sure it's this code snippet that's the cause of my problems. Here's what I can tell you: each chunk of rows that I grab is essentially equal in size (rowID skips over stuff, but rather arbitrarily), and the time it takes to fetch the SQL query doesn't change. But as the program progresses, this snippet gets slower. Here's the output:

2500 0.441551299341
5000 1.26162739664
7500 2.35092688403
10000 3.48417469666
12500 4.59031305491
15000 5.78972588775
17500 6.28305527139
20000 6.73344570903
22500 8.31732146487
25000 9.65322872159
27500 8.98186042757
30000 11.8042818095
32500 12.1965593712
35000 13.2735763291
37500 14.0282617344

What is it in the code snippet that slows down as n increases? Is there something about how low-level Python functions work that I don't understand and that is slowing me down?
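[For reference, the snipped loop was along these lines. This is a reconstruction from the description above, not the original code: the post doesn't name the DB module, so sqlite3 stands in, and "data.db", "big_table", the regex pattern, and the 15000000 upper bound are all placeholders.]

    import re
    import sqlite3

    # Assumed setup; "big_table" replaces "table", which is a reserved
    # word in SQL. Names d, r, delta, a follow the description above.
    conn = sqlite3.connect("data.db")
    cur = conn.cursor()

    d = {}                    # some dictionary
    r = re.compile(r"\w+")    # a precompiled regex (pattern is a placeholder)
    delta = 2500

    n = 0
    while n < 15000000:       # big loop over ~15 million rows, in chunks
        cur.execute(
            "select * from big_table where rowID >= ? and rowID < ?",
            (n, n + delta),
        )
        a = cur.fetchall()    # each row is a[n1], columns are a[n1][n2]
        for row in a:
            pass              # [snip] -- the per-row work using d and r
        n += delta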
Perhaps you need an index on rowID. Without one, each of those range queries can force the database to scan the table to find the matching rows.
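If the database is SQLite, for example, that could be as simple as (index and table names are placeholders, matching the sketch above):

    cur.execute("create index if not exists idx_rowid on big_table (rowID)")
    conn.commit()

Jean-Paul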
