Joe - I'm having the same issue. DataFrame.to_sql performance is fine with SQLite and MySQL, but against SQL Server through a SQLAlchemy engine it is too slow. After a few days of research and trial and error, I was able to improve performance by using pymssql, but it is still way below standard: 35,000 rows of an account relation table (accountid - int, relatedaccountid - int) take 2 minutes to finish.
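For reference, this is roughly how I'm calling it (a minimal sketch only; the connection string, table name, and toy data are placeholders, not my real ones):

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string -- substitute your own server/database/credentials.
engine = create_engine("mssql+pymssql://user:password@server/mydb")

# Toy stand-in for the real 35,000-row account relation frame.
df = pd.DataFrame({"accountid": [1, 2, 3], "relatedaccountid": [10, 20, 30]})

# This is the call that takes ~2 minutes on the real data.
df.to_sql("account_relation", engine, if_exists="append", index=False)
```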
After a lot of research and reading, I found an article explaining that the SQLAlchemy ORM used by df.to_sql is VERY slow because it relies on the unit-of-work pattern; SQLAlchemy Core is supposed to be much faster. Check this link out: http://docs.sqlalchemy.org/en/rel_0_8/faq.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow

My issue is still not solved, because my goal is to insert a dataframe into a SQL table. If I use SQLAlchemy Core, I have to iterate through the dataframe rows and insert them into the SQL table, which ends up being even slower than to_sql (see the Core sketch at the end of this message).

Did you find a solution? If so, please share. Thanks.

On Wednesday, April 22, 2015 at 5:11:50 PM UTC-5, John Doe wrote:
>
> On Wednesday, April 22, 2015 at 7:19:08 PM UTC+1, Jonathan Vanasco wrote:
>>
>> [...]
>>
>> I'm guessing that this issue is with the driver.
>>
>> Here's a semi-related thread:
>> http://stackoverflow.com/questions/5693885/pyodbc-very-slow-bulk-insert-speed
>>
>> It looks like the pyodbc driver handles "executemany" in a less than
>> ideal manner.
>>
>> There were also some threads that noted ODBC tracing being on, and others
>> that noted most Python drivers are just painfully slow.
>>
>> In any event, if you're talking about a single insert statement, that
>> sounds like an executemany context and a driver issue.
>
> Tracing is off.
> If I had to place a bet, my money would be on pyodbc having too slow a
> network connection, for some reason that's totally beyond me.
> On my home PC I generated a dataframe of random numbers in Python, then
> used the to_sql() method to transfer it to a SQL Server Express instance
> running on the same machine, and it was fast. This suggests that SQL
> Server has no issue with the data per se.
> When I ran the same code on my work PC, trying to export to a SQL Server
> 2014 machine which is part of the same company network and only a few
> miles away, it took ages.
>
> I'll try installing Python on the SQL Server machine and running the code
> from there, to see if this theory is correct.
>
> I also tried pymssql, but it took ages.
> Some Stack Overflow users had luck with adodb, but SQLAlchemy no longer
> supports it.
>
> I miss Matlab's database toolbox! Yes, it's expensive, and Matlab has tons
> of flaws, but at least Matlab's documentation is excellent and doesn't
> make you lose the will to live wasting hours trying to figure out how to
> carry out a banal task like exporting a table...
>
> The bottom line is that the pandas to_sql() method is basically unusable
> if you're trying to export more than 2 MB of data.
>
> Anyway, thanks a lot for your help, Jonathan and Michael.
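For completeness, this is roughly the Core-style insert I mean (a minimal sketch only; the engine URL is a placeholder, and the table definition mirrors my account relation example):

```python
import pandas as pd
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

# Placeholder connection string -- same pymssql engine as above.
engine = create_engine("mssql+pymssql://user:password@server/mydb")

metadata = MetaData()
# Mirrors the two-column account relation table from my example.
account_relation = Table(
    "account_relation",
    metadata,
    Column("accountid", Integer),
    Column("relatedaccountid", Integer),
)
metadata.create_all(engine)

df = pd.DataFrame({"accountid": [1, 2, 3], "relatedaccountid": [10, 20, 30]})

# Core executemany-style insert: one insert() statement with a list of row
# dicts, rather than a separate INSERT issued per row.
with engine.begin() as conn:
    conn.execute(account_relation.insert(), df.to_dict(orient="records"))
```

Even with this executemany form, the bottleneck for me still appears to be on the driver side.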
