Hello Jonathan,

I already executed the query without pandas in my program:

query = "SELECT ..."  # the full SELECT statement, elided here
for row in conn.execute(query).fetchall():
    pass

The runtime was the same as with pandas.

So I don't think this can be the problem.
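For completeness, here is a minimal sketch of how the two paths can be timed separately, so the comparison doesn't rest on wall-clock feel. It uses an in-memory SQLite database as a stand-in for the real Adaptive SQL Anywhere connection; the table, the engine, and the query text are placeholders, not the real ones from the thread.

```python
import time

import pandas as pd
from sqlalchemy import create_engine, text

# Stand-in for the real pyodbc engine: an in-memory SQLite database
# with a small demo table.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (id INTEGER, val TEXT)"))
    conn.execute(
        text("INSERT INTO t (id, val) VALUES (:id, :val)"),
        [{"id": i, "val": f"row-{i}"} for i in range(10_000)],
    )

query = "SELECT id, val FROM t"  # placeholder for the real SELECT

# Path 1: raw SQLAlchemy fetch, no pandas involved.
start = time.perf_counter()
with engine.connect() as conn:
    rows = conn.execute(text(query)).fetchall()
raw_seconds = time.perf_counter() - start

# Path 2: the same query through pandas.
start = time.perf_counter()
df = pd.read_sql_query(query, engine)
pandas_seconds = time.perf_counter() - start

print(f"raw fetch: {len(rows)} rows in {raw_seconds:.3f}s")
print(f"pandas:    {len(df)} rows in {pandas_seconds:.3f}s")
```

If both numbers come out close to each other but far above what the DB UI shows, that points away from pandas and toward the connection or the driver-level fetch.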

Greetings Manuel

Jonathan Vanasco wrote on Wednesday, June 8, 2022 at 17:28:01 UTC+2:

> When you select in the database UI tool, you are just displaying raw data.
>
> When you select within your code snippets above, Python is creating 
> pandas DataFrame objects for the results. 
>
> These two concepts are not comparable at all.  Converting the SQL data to 
> Python data structures in pandas (and SQLAlchemy's ORM) involves a lot of 
> overhead, and that overhead grows with the result size.
>
> You can use memory and code profiling tools to explore this and see where 
> the issues are. The best approach, though, is what Philip suggested above: 
> don't use pandas, so you can see how Python/SQLAlchemy handles the raw 
> data.
>
> On Wednesday, June 8, 2022 at 9:28:38 AM UTC-4 Trainer Go wrote:
>
>> Hello Phil,
>>
>> I tested both, without printing the result:
>>
>> table_df = pd.read_sql_query("SELECT ...", engine)
>> #print(table_df)
>> #query = "SELECT ..."
>> #for row in conn.execute(query).fetchall():
>> #    pass
>>
>>
>> Both have nearly the same runtime, so this is not my problem. And yes, 
>> they are the same queries: I copy-pasted the SELECT from my DB UI, where 
>> I first tested the results and the runtime, and I expected the same 
>> runtime in my program, but no ;)
>>
>> Greetings Manuel
>>
>> Philip Semanchuk wrote on Wednesday, June 8, 2022 at 15:04:08 UTC+2:
>>
>>>
>>>
>>> > On Jun 8, 2022, at 8:29 AM, Trainer Go <[email protected]> wrote: 
>>> > 
>>> > When I'm using pandas pd.read_sql_query() 
>>> > with chunksize to minimize the memory usage, there is no difference 
>>> between the two runtimes. 
>>>
>>> Do you know that, or is that speculation? 
>>>
>>> > 
>>> > table_df = pd.read_sql_query("SELECT ...", engine, chunksize=30000) 
>>> > 
>>> > for df in table_df: 
>>> >     print(df) 
>>> > 
>>> > the runtime is nearly the same, about 5 minutes 
>>>
>>> Printing to the screen also takes time, and your terminal probably 
>>> buffers the results, which requires memory allocation. I’m not saying this 
>>> is your problem (it probably isn’t), but your test still involves pandas 
>>> and your terminal, both of which cloud the issue. You would benefit from 
>>> simplifying your tests. 
>>>
>>> Did you try this suggestion from my previous email? 
>>>
>>>
>>> > for row in conn.execute(my_query).fetchall(): 
>>> >     pass 
>>>
>>> Also, are you 100% sure you’re executing the same query from SQLAlchemy 
>>> that you’re pasting into your DB UI? 
>>>
>>> Cheers 
>>> Philip 
>>>
>>>
>>>
>>> > 
>>> > print(table_df) shows only <generator object 
>>> SQLDatabase._query_iterator at 0x0DC69C30>, so I don't know whether 
>>> the query is actually triggered by print(table_df), 
>>> > 
>>> > but the runtime is 6 seconds, like in the DB UI I'm using. 
>>> > 
>>> > I have no clue what to do. 
>>> > 
>>> > Greetings Manuel 
>>> > 
>>> > Trainer Go wrote on Wednesday, June 8, 2022 at 09:27:04 UTC+2: 
>>> > thank you Philip, 
>>> > 
>>> > I will test it today. 
>>> > 
>>> > 
>>> > Greetings Manuel 
>>> > 
>>> > Philip Semanchuk wrote on Tuesday, June 7, 2022 at 17:13:28 UTC+2: 
>>> > 
>>> > 
>>> > > On Jun 7, 2022, at 5:46 AM, Trainer Go <[email protected]> wrote: 
>>> > > 
>>> > > Hello guys, 
>>> > > 
>>> > > I'm executing 2 queries in my Python program with SQLAlchemy using 
>>> the pyodbc driver. 
>>> > > The database is an Adaptive SQL Anywhere, version 7, 32-bit. 
>>> > > 
>>> > > When I execute the queries in a DB UI it takes 5-6 seconds for 
>>> both together, but when I run the same queries in my Python program it 
>>> takes 5-6 minutes instead of 6 seconds. What am I doing wrong? I'm new 
>>> at this. 
>>> > 
>>> > To start, debug one query at a time, not two. 
>>> > 
>>> > Second, when you test a query in your DB UI, you’re probably already 
>>> connected to the database. Your Python program has to make the connection — 
>>> that’s an extra step, and it might be slow. If you step through the Python 
>>> program in the debugger, you can execute one statement at a time (the 
>>> connection and the query) to understand how long each step takes. That will 
>>> help to isolate the problem. 
>>> > 
>>> > Third, keep in mind that receiving results takes time too. If your DB 
>>> UI is written in C or some other language that allocates memory very 
>>> efficiently, it might be a lot faster than building a Pandas dataframe. 
>>> > 
>>> > You might want to eliminate Pandas entirely so you don’t have to 
>>> question whether or not that’s the source of your slowdown. You could do 
>>> this instead - 
>>> > 
>>> > for row in conn.execute(my_query).fetchall(): 
>>> >     pass 
>>> > 
>>> > That will force your Python program to iterate over the result set 
>>> without being forced to allocate memory for all the results. 
>>> > 
>>> > Hope this helps 
>>> > Philip 
>>>
>>>
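To close the loop on Jonathan's profiling suggestion above, here is a minimal sketch of profiling just the bare fetch loop with the standard-library cProfile, so the time spent inside the driver becomes visible instead of guessed at. The engine is again an in-memory SQLite stand-in; with the real pyodbc engine, the expensive entries would typically point at the driver's fetch and network calls.

```python
import cProfile
import io
import pstats

from sqlalchemy import create_engine, text

# In-memory SQLite stand-in for the real database.
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (id INTEGER)"))
    conn.execute(
        text("INSERT INTO t (id) VALUES (:id)"),
        [{"id": i} for i in range(50_000)],
    )

def fetch_all():
    # The bare fetch loop from the thread, with nothing else attached.
    with engine.connect() as conn:
        for row in conn.execute(text("SELECT id FROM t")).fetchall():
            pass

profiler = cProfile.Profile()
profiler.enable()
fetch_all()
profiler.disable()

# Report the 10 most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
report = buf.getvalue()
print(report)
```

Run against the real engine, a report dominated by driver-level fetch calls would suggest the slowdown is in the pyodbc/network layer rather than in Python-side result handling.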

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sqlalchemy/4fff08bd-a625-4094-8a02-ed3300887f75n%40googlegroups.com.
