Hi all, just sharing some performance insights into the bulk operation function bulk_insert_mappings.
I was recently debugging a SQLAlchemy-powered web app that was crashing with out-of-memory errors on a small Kubernetes node. It turned out to be "caused" by an over-optimistic invocation of bulk_insert_mappings: I was reading a CSV file with ~500,000 entries into a list of dictionaries and then passing the whole list to bulk_insert_mappings in a single call. The SQLAlchemy work for that one call was using about 750 MB of RAM, which was enough to OOM the small node the web app was running on.

A simple workaround is to split the list of 500,000 entries into chunks of 1,000 entries each and call bulk_insert_mappings on each chunk. With that change the extra memory usage isn't even noticeable. Interestingly, the chunked approach also seems to be faster; I might benchmark it to quantify the difference. A rough sketch of the chunked version is at the end of this mail.

Thought it was interesting. I wonder whether it would be worth adding a note to the docs for bulk_insert_mappings? Given that the function is motivated by performance, it seems relevant.

James
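For reference, here is a minimal sketch of the chunked approach. The Widget mapped class, the engine URL, and the synthetic rows are just placeholders standing in for the real model, connection, and CSV data in the app; the chunk size of 1,000 matches what I used:

from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Illustrative mapped class standing in for the real model in the app.
class Widget(Base):
    __tablename__ = "widget"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# Placeholder connection URL.
engine = create_engine("postgresql://localhost/mydb")

def chunked(seq, size):
    """Yield successive slices of `size` items from a list."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def load_rows(session, rows, chunk_size=1000):
    """Insert a large list of dicts in modest batches instead of all at once."""
    for chunk in chunked(rows, chunk_size):
        session.bulk_insert_mappings(Widget, chunk)
    session.commit()

# In the real app `rows` is the ~500,000 dicts parsed from the CSV, e.g.
#     rows = list(csv.DictReader(open("data.csv")))
# Here we use synthetic data just to make the sketch self-contained.
rows = [{"name": f"row {i}"} for i in range(500_000)]

with Session(engine) as session:
    load_rows(session, rows)

Each iteration only keeps one 1,000-entry batch's worth of insert state alive at a time, which is what keeps the memory overhead small compared to handing the full list to bulk_insert_mappings in one go.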
