Hello Jonathan,
I already executed the query without using pandas in my program:

query = "SELECT"
for row in conn.execute(query).fetchall():
    pass

The runtime was the same as with pandas, so I don't think pandas is the
problem.
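For reference, here is a minimal, self-contained sketch of this kind of timing test. It uses the stdlib sqlite3 module in place of SQLAlchemy/pyodbc (the pattern is the same), and the table, data, and query are made up purely for illustration:

```python
import sqlite3
import time

# Sketch: time only execute + fetchall + a bare iteration loop,
# with no printing and no pandas involved, so the measurement
# reflects the driver and the Python-side row handling alone.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100_000)])

query = "SELECT x FROM t"

start = time.perf_counter()
rows = conn.execute(query).fetchall()
for row in rows:
    pass
elapsed = time.perf_counter() - start

print(f"fetched {len(rows)} rows in {elapsed:.3f}s")
```

If a bare loop like this is fast but the full program is slow, the time is going into connection setup or result conversion rather than the query itself.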
Greetings Manuel
Jonathan Vanasco wrote on Wednesday, June 8, 2022 at 17:28:01 UTC+2:
> When you select in the database ui tool, you are just displaying raw data.
>
> When you select within your code snippets above, Python is creating
> pandas' DataFrame objects for the results.
>
> These two concepts are not comparable at all. Converting the SQL data to
> Python data structures in Pandas (and SQLAlchemy's ORM) is a lot of
> overhead - and that grows with the result size.
>
> You can use memory and code profiling tools to explore this and see where
> the issues are. The best approach, though, is what Philip suggested above:
> drop pandas, so you can see how Python/SQLAlchemy handles the raw data.
>
> On Wednesday, June 8, 2022 at 9:28:38 AM UTC-4 Trainer Go wrote:
>
>> Hello Phil,
>>
>> I tested both, and without printing the result:
>>
>> table_df = pd.read_sql_query("SELECT", engine)
>> #print(table_df)
>> #query = "SELECT"
>> #for row in conn.execute(query).fetchall():
>> #    pass
>>
>>
>> Both have nearly the same runtime, so this is not my problem. And yes,
>> they are the same queries, because I copy-pasted the SELECT from my DB UI,
>> where I first tested the results and the runtime. I expected the same
>> runtime in my program, but no ;)
>>
>> Greetings Manuel
>>
>> Philip Semanchuk wrote on Wednesday, June 8, 2022 at 15:04:08 UTC+2:
>>
>>>
>>>
>>> > On Jun 8, 2022, at 8:29 AM, Trainer Go <[email protected]> wrote:
>>> >
>>> > When I'm using pandas with pd.read_sql_query()
>>> > with chunksize to minimize the memory usage, there is no difference
>>> > between the two runtimes.
>>>
>>> Do you know that, or is that speculation?
>>>
>>> >
>>> > table_df = pd.read_sql_query("select", engine, chunksize=30000)
>>> >
>>> > for df in table_df:
>>> >     print(df)
>>> >
>>> > the runtime is nearly the same, about 5 minutes
>>>
>>> Printing to the screen also takes time, and your terminal probably
>>> buffers the results, which requires memory allocation. I’m not saying this
>>> is your problem (it probably isn’t), but your test still involves pandas
>>> and your terminal, both of which cloud the issue. You would benefit from
>>> simplifying your tests.
>>>
>>> Did you try this suggestion from my previous email?
>>>
>>>
>>> > for row in conn.execute(my_query).fetchall():
>>> >     pass
>>>
>>> Also, are you 100% sure you’re executing the same query from SQLAlchemy
>>> that you’re pasting into your DB UI?
>>>
>>> Cheers
>>> Philip
>>>
>>> >
>>> >
>>> >
>>> > #print(table_df) result: <generator object SQLDatabase._query_iterator
>>> > at 0x0DC69C30>
>>> > I don't know whether the query is even triggered; print(table_df) only
>>> > shows the generator object,
>>> >
>>> > but the runtime is 6 seconds, like in the DB UI I'm using.
>>> >
>>> > I have no clue what to do.
>>> >
>>> > Greetings Manuel
>>> >
>>> > Trainer Go wrote on Wednesday, June 8, 2022 at 09:27:04 UTC+2:
>>> > thank you Philip,
>>> >
>>> > I will test it today.
>>> >
>>> >
>>> > Greetings Manuel
>>> >
>>> > Philip Semanchuk wrote on Tuesday, June 7, 2022 at 17:13:28 UTC+2:
>>> >
>>> >
>>> > > On Jun 7, 2022, at 5:46 AM, Trainer Go <[email protected]> wrote:
>>> > >
>>> > > Hello guys,
>>> > >
>>> > > I'm executing 2 queries in my Python program with SQLAlchemy, using
>>> > > the pyodbc driver.
>>> > > The database is an Adaptive SQL Anywhere Version 7, 32-bit.
>>> > >
>>> > > When I'm executing the queries in a DB UI, it takes 5-6 seconds for
>>> > > both together, but when I'm using the same queries in my Python
>>> > > program it takes 5-6 minutes instead of 6 seconds. What am I doing
>>> > > wrong? I'm new at this.
>>> >
>>> > To start, debug one query at a time, not two.
>>> >
>>> > Second, when you test a query in your DB UI, you’re probably already
>>> connected to the database. Your Python program has to make the connection —
>>> that’s an extra step, and it might be slow. If you step through the Python
>>> program in the debugger, you can execute one statement at a time (the
>>> connection and the query) to understand how long each step takes. That will
>>> help to isolate the problem.
>>> >
>>> > Third, keep in mind that receiving results takes time too. If your DB
>>> UI is written in C or some other language that allocates memory very
>>> efficiently, it might be a lot faster than building a Pandas dataframe.
>>> >
>>> > You might want to eliminate Pandas entirely so you don’t have to
>>> question whether or not that’s the source of your slowdown. You could do
>>> this instead -
>>> >
>>> > for row in conn.execute(my_query).fetchall():
>>> >     pass
>>> >
>>> > That will force your Python program to fetch and iterate over the full
>>> > result set without the extra overhead of building a DataFrame.
>>> >
>>> > Hope this helps
>>> > Philip
>>>
>>>
--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
http://www.sqlalchemy.org/