Joe - I'm having the same issue. DataFrame.to_sql performance is fine with SQLite and MySQL, but against SQL Server through a SQLAlchemy engine it is too slow. After a few days of research and trial and error, I was able to improve performance by using pymssql, but it is still way below standard: 35,000 rows of an account relation table (accountid - int, relatedaccountid - int) take 2 minutes to finish.
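For reference, this is roughly how I'm calling it (a minimal sketch only; the connection string, table name, and toy data are placeholders, not my real ones):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- substitute your own server/database/credentials.
engine = create_engine("mssql+pymssql://user:password@server/mydb")

# Toy stand-in for the real 35,000-row account relation frame.
df = pd.DataFrame({"accountid": [1, 2, 3], "relatedaccountid": [10, 20, 30]})

# This is the call that takes ~2 minutes on the real data.
df.to_sql("account_relation", engine, if_exists="append", index=False)
```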
After a lot of research and reading, I found an article explaining that the SQLAlchemy ORM used by df.to_sql is VERY slow because it relies on the unit-of-work pattern; SQLAlchemy Core is supposed to be much faster. Check this link out: http://docs.sqlalchemy.org/en/rel_0_8/faq.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

My issue is still not solved, because my goal is to insert a dataframe into a SQL table. If I use SQLAlchemy Core, I have to iterate through the dataframe rows and insert them into the SQL table, which ends up being even slower than to_sql (see the Core sketch at the end of this message).

Did you find a solution? If so, please share. Thanks.

On Wednesday, April 22, 2015 at 5:11:50 PM UTC-5, John Doe wrote:
>
> On Wednesday, April 22, 2015 at 7:19:08 PM UTC+1, Jonathan Vanasco wrote:
>>
>> [...]
>>
>> I'm guessing that this issue is with the driver.
>>
>> Here's a semi-related thread:
>> http://stackoverflow.com/questions/5693885/pyodbc-very-slow-bulk-insert-speed
>>
>> It looks like the pyodbc driver handles "executemany" in a less than
>> ideal manner.
>>
>> There were also some threads that noted ODBC tracing being on, and others
>> that noted most Python drivers are just painfully slow.
>>
>> In any event, if you're talking about a single insert statement, that
>> sounds like an executemany context and a driver issue.
>
> Tracing is off.
> If I had to place a bet, my money would be on pyodbc having too slow a
> network connection, for some reason that's totally beyond me.
> On my home PC I generated a dataframe of random numbers in Python, then
> used the to_sql() method to transfer it to a SQL Server Express instance
> running on the same machine, and it was fast. This suggests that SQL
> Server has no issue with the data per se.
> When I ran the same code on my work PC, trying to export to a SQL Server
> 2014 machine which is part of the same company network and only a few
> miles away, it took ages.
>
> I'll try installing Python on the SQL Server machine and running the code
> from there, to see if this theory is correct.
>
> I also tried pymssql, but it took ages.
> Some Stack Overflow users had luck with adodb, but SQLAlchemy no longer
> supports it.
>
> I miss Matlab's database toolbox! Yes, it's expensive, and Matlab has tons
> of flaws, but at least Matlab's documentation is excellent and doesn't
> make you lose the will to live wasting hours trying to figure out how to
> carry out a banal task like exporting a table...
>
> The bottom line is that the pandas to_sql() method is basically unusable
> if you're trying to export more than 2 MB of data.
>
> Anyway, thanks a lot for your help, Jonathan and Michael.
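For completeness, this is roughly the Core-style insert I mean (a minimal sketch only; the engine URL is a placeholder, and the table definition mirrors my account relation example):

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

# Placeholder connection string -- same pymssql engine as above.
engine = create_engine("mssql+pymssql://user:password@server/mydb")

metadata = MetaData()
# Mirrors the two-column account relation table from my example.
account_relation = Table(
    "account_relation",
    metadata,
    Column("accountid", Integer),
    Column("relatedaccountid", Integer),
)
metadata.create_all(engine)

df = pd.DataFrame({"accountid": [1, 2, 3], "relatedaccountid": [10, 20, 30]})

# Core executemany-style insert: one insert() statement with a list of row
# dicts, rather than a separate INSERT issued per row.
with engine.begin() as conn:
    conn.execute(account_relation.insert(), df.to_dict(orient="records"))
```

Even with this executemany form, the bottleneck for me still appears to be on the driver side.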
