Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Seth P Mon, 12 May 2014 16:09:28 -0700

Yep, it's not a SQLAlchemy issue. The following code demonstrates the 
problem with direct pyodbc access.


import pyodbc
import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        rows = func(*arg)
        t2 = time.time()
        print("%30s() len=%d, last=%s, runtime=%0.3fs" % (str(func).split(' 
at')[0][10:], len(rows), rows[-1], t2 - t1))
        return t2 - t1
    return wrapper

if __name__ == '__main__':
    cnxn = pyodbc.connect('DRIVER={SQL 
Server};SERVER=Compustat;DATABASE=Compustat')
    cursor = cnxn.cursor()
    sql_select_statement_base = "SELECT datadate, prcod FROM sec_dprc WHERE 
gvkey = ? ORDER BY datadate"
    key = '001045'

    @print_timing
    def execute_explicit_query():
        sql_select_statement_explicit = 
sql_select_statement_base.replace("?", "'%s'" % key)
        rows = cursor.execute(sql_select_statement_explicit).fetchall()
        return rows

    @print_timing
    def execute_parameterized_query():
        rows = cursor.execute(sql_select_statement_base, key).fetchall()
        return rows

    num_iterations = 5
    explicit_runtime = 0.0
    parameterized_runtime = 0.0
    for i in range(num_iterations):
        explicit_runtime += execute_explicit_query()
        parameterized_runtime += execute_parameterized_query()
    print("Total runtime for %d explicit queries = %0.3fs." % 
(num_iterations, explicit_runtime))
    print("Total runtime for %d parameterized queries = %0.3fs." % 
(num_iterations, parameterized_runtime))


On Monday, May 12, 2014 6:40:48 PM UTC-4, Michael Bayer wrote:
>
>
> On May 12, 2014, at 6:33 PM, Seth P <[email protected] <javascript:>> 
> wrote:
>
> Is it possible that the (primary key index (which is a composite index 
> that begins with gvkey, and is the only index on the table) isn't being 
> used because the the gvkey parameter is somehow passed as an integer rather 
> than as a string?
>
>
> There’s nothing in SQLAlchemy that coerces strings to integers.  If the 
> actual type of the column on the DB is an integer, then there might be some 
> conversion within pyodbc or the ODBC driver.
>
> if you’ve got it narrowed down this much the next step is to figure out a 
> raw pyodbc script that illustrates what the problem is. 
>
>
> The first EXEC below is pretty much instantaneous, whereas the second 
> takes about 8 seconds (and produces the same results).
>
> EXEC sp_executesql
> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
> sec_dprc_prcod
> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
> N'@gvkey VARCHAR(6)', '001045'
>
> EXEC sp_executesql
> N'SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
> sec_dprc_prcod
> FROM sec_dprc WHERE sec_dprc.gvkey = @gvkey ORDER BY sec_dprc.datadate',
> N'@gvkey INT', 001045
>
>
>
> On Monday, May 12, 2014 5:00:27 PM UTC-4, Michael Bayer wrote:
>>
>>
>> well there’s only one parameter being processed here so there is clearly 
>> negligible difference in time spent within Python as far as getting the 
>> statement ready to execute and then executing it.
>>
>> So the time is either in what SQL Server spends fetching the rows, or the 
>> number of rows being fetched (which seems to be the same).   Which leaves 
>> pretty much that SQL Server is making a different choice about the query 
>> plan for this SELECT statement, this is typically due to an INDEX being 
>> used or not.    You’d need to analyze the plan being used.   With SQL 
>> Server, the option to get a plan within programmatic execution seems to be 
>> per this answer 
>> http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-planto
>>  execute “SET SHOWPLAN_TEXT ON” ahead of time.
>>
>> Besides that, you can confirm where the time is being spent exactly using 
>> Python profiling.   A description on how to achieve that is here: 
>> http://stackoverflow.com/questions/1171166/how-can-i-profile-a-sqlalchemy-powered-application/1175677#1175677
>>
>>
>>
>> On May 12, 2014, at 3:48 PM, Seth P <[email protected]> wrote:
>>
>> After tracking down some extreme slowness in loading a one-to-many 
>> relationship (e.g. myobject.foobars), I seem to have isolated the issue to 
>> engine.execute() being much slower with parameterized queries than with 
>> explicit queries. The following is actual code and output for loading 
>> 10,971 rows from a database table. (The actual table has more columns than 
>> I'm including here, and is not designed by me.) Note that each explicit 
>> query (where I explicitly set the WHERE clause parameter and pass the 
>> resulting SQL statement to engine.execute()) runs in under 0.1 seconds, 
>> whereas each parameterized query (where I let SQLAlchemy bind the WHERE 
>> clause parameter) takes over 8 seconds.
>>
>> The difference in runtimes is smaller when the number of rows returned is 
>> smaller, which seems odd since I would have thought that the binding of the 
>> WHERE clause parameters is just done once and would be virtually 
>> instantaneous.
>>
>> Any thoughts?
>>
>> Thanks,
>>
>> Seth
>>
>>
>> import sqlalchemy as sa
>> from sqlalchemy.orm import sessionmaker
>> from sqlalchemy.ext.declarative import declarative_base
>> import time
>>
>> engine = sa.create_engine('mssql+pyodbc://Compustat/Compustat')
>> session = sessionmaker(bind=engine, autoflush=False, 
>> expire_on_commit=False)()
>>
>> class FooBar(declarative_base()):
>>     __tablename__ = 'sec_dprc'
>>     gvkey = sa.Column(sa.String(6), primary_key=True)
>>     datadate = sa.Column(sa.DateTime, primary_key=True)
>>     value = sa.Column(sa.Float, name='prcod')
>>
>> def print_timing(func):
>>     def wrapper(*arg):
>>         t1 = time.time()
>>         rows = func(*arg)
>>         t2 = time.time()
>>         print("%30s() len=%d, last=%s, runtime=%0.3fs" % 
>> (str(func).split(' at')[0][10:], len(rows), rows[-1], t2 - t1))
>>         return t2 - t1
>>     return wrapper
>>
>> if __name__ == '__main__':
>>
>>     key = '001045'
>>     query = session.query(FooBar.datadate, 
>> FooBar.value).filter(sa.and_(FooBar.gvkey == key)).order_by(FooBar.datadate)
>>     sql_select_statement_base = str(query)
>>     print(sql_select_statement_base)
>>
>>     @print_timing
>>     def execute_explicit_query():
>>         sql_select_statement_explicit = 
>> sql_select_statement_base.replace(":gvkey_1", "'%s'" % key)
>>         rows = 
>> engine.execute(sa.text(sql_select_statement_explicit)).fetchall()
>>         return rows
>>
>>     @print_timing
>>     def execute_parameterized_query():
>>         rows = engine.execute(sa.text(sql_select_statement_base), 
>> {'gvkey_1':key}).fetchall()
>>         return rows
>>
>>     num_iterations = 5
>>     explicit_runtime = 0.0
>>     parameterized_runtime = 0.0
>>     for i in range(num_iterations):
>>         explicit_runtime += execute_explicit_query()
>>         parameterized_runtime += execute_parameterized_query()
>>     print("Total runtime for %d explicit queries = %0.3fs." % 
>> (num_iterations, explicit_runtime))
>>     print("Total runtime for %d parameterized queries = %0.3fs." % 
>> (num_iterations, parameterized_runtime))
>>
>>
>> SELECT sec_dprc.datadate AS sec_dprc_datadate, sec_dprc.prcod AS 
>> sec_dprc_prcod 
>> FROM sec_dprc 
>> WHERE sec_dprc.gvkey = :gvkey_1 ORDER BY sec_dprc.datadate
>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=0.082s
>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=8.852s
>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=0.032s
>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=8.754s
>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=0.039s
>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=9.182s
>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=0.028s
>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=9.416s
>>         execute_explicit_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=0.080s
>>    execute_parameterized_query() len=10971, last=(datetime.datetime(2014, 
>> 5, 9, 0, 0), 37.96), runtime=8.425s
>> Total runtime for 5 explicit queries = 0.260s.
>> Total runtime for 5 parameterized queries = 44.629s.
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "sqlalchemy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/sqlalchemy.
>> For more options, visit https://groups.google.com/d/optout.
>>
>>
>>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <javascript:>.
> To post to this group, send email to [email protected]<javascript:>
> .
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Re: [sqlalchemy] engine.execute() slow with parameterized queries?

Reply via email to