Hi all,

Just sharing some perf insights into the bulk operation function
bulk_insert_mappings.

I was recently debugging a SQLAlchemy-powered web app that was crashing
due to out-of-memory issues on a small Kubernetes node. It turned out to be
"caused" by an overly optimistic invocation of bulk_insert_mappings.
Basically, I was reading a CSV file with ~500,000 entries into a list of
dictionaries and then passing the whole list to bulk_insert_mappings in
one call. The SQLAlchemy work was using about 750 MB of RAM, which was
enough to OOM the small node the web app was running on.
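Roughly, the original call looked like the sketch below (the model name,
CSV path, and connection URL are placeholders, not the real app's):

    import csv

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    from myapp.models import Record  # hypothetical mapped class

    engine = create_engine("postgresql://localhost/mydb")
    session = sessionmaker(bind=engine)()

    with open("records.csv") as f:
        # ~500,000 rows end up as one big list of dicts in memory...
        rows = list(csv.DictReader(f))

    # ...and handing the entire list to bulk_insert_mappings in one call
    # is what pushed memory usage up to ~750 MB.
    session.bulk_insert_mappings(Record, rows)
    session.commit()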

A simple workaround is to split the list of 500,000 entries into chunks of
1,000 entries each and call bulk_insert_mappings on each chunk, as in the
sketch below. When I do this, the extra memory usage isn't even noticeable.
It also seems that the chunked approach is actually faster! I might
benchmark it to quantify the difference.
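Here's a rough, self-contained sketch of the chunked version (again with
placeholder names for the model, file, and connection URL):

    import csv

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    from myapp.models import Record  # hypothetical mapped class

    engine = create_engine("postgresql://localhost/mydb")
    session = sessionmaker(bind=engine)()

    with open("records.csv") as f:
        rows = list(csv.DictReader(f))

    # Insert 1,000 mappings per call instead of all ~500,000 at once,
    # so SQLAlchemy only has to process a small slice at a time.
    CHUNK_SIZE = 1000
    for start in range(0, len(rows), CHUNK_SIZE):
        session.bulk_insert_mappings(Record, rows[start:start + CHUNK_SIZE])
    session.commit()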

Thought it was interesting. I wonder whether it would be worth adding a
note about this to the docs for bulk_insert_mappings? Given that the
function is motivated by performance, it seems relevant.

James
