dbsession.add on each row is pretty much the worst-case scenario. Start with https://docs.sqlalchemy.org/en/13/_modules/examples/performance/bulk_inserts.html, which shows how to use dbsession.bulk_insert_mappings and several other approaches. The only Pyramid-related caveat is that since the dbsession is controlled by the tm, if you drop down to some of the Core APIs you'll need to call zope.sqlalchemy.mark_changed(dbsession) at the end to have your data committed.
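To make that concrete, here is a minimal sketch of the bulk_insert_mappings route. The in-memory sqlite URL and the model definition are stand-ins for the real Postgres setup and the models from the original post; in a Pyramid app you'd use the existing request.dbsession rather than building a session by hand.

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class AutonomousSystem(Base):
    __tablename__ = 'autonomous_system'
    id = Column(Integer, primary_key=True)
    autonomous_system = Column(Integer)
    id_topology = Column(Integer)

# sqlite:// is only to keep this sketch self-contained; use your
# Postgres URL (or, in Pyramid, the configured dbsession) instead.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
dbsession = sessionmaker(bind=engine)()

# One plain dict per row; no ORM objects are constructed, which is why
# this is much faster than dbsession.add() in a loop.
mappings = [{'autonomous_system': n, 'id_topology': 68} for n in range(2, 6)]
dbsession.bulk_insert_mappings(AutonomousSystem, mappings)

# In a tm-controlled Pyramid session, also flag the session as dirty so
# the transaction manager commits it:
#   from zope.sqlalchemy import mark_changed
#   mark_changed(dbsession)
dbsession.commit()
```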
Of course there's always the ability to use the raw psycopg2 connection as well, which can get you closer to COPY-like performance - it depends on your needs, but I've had success just using bulk_insert_mappings on the ORM.

- Michael

> On Mar 20, 2020, at 14:41, Emerson Barea <[email protected]> wrote:
>
> Hi there.
>
> Sometimes my app needs to create and save almost a million records in a
> Postgres database. I'm looking for the best way to do this, because the
> procedures I've used so far are very slow.
>
> I will present some approaches I tried that were very slow, as well as
> the workarounds I tried to improve them and the errors that occurred:
>
> 1 - a for loop that generates the records and adds them to the
> transaction manager
>
> import argparse
> import getopt
> import sys
>
> from pyramid.paster import bootstrap, setup_logging
> from sqlalchemy.exc import OperationalError
>
>
> class Topology(object):
>     def __init__(self, dbsession):
>         self.dbsession = dbsession
>
>     def autonomous_system(self):
>
>         # SOME PROCESS THAT RETURNS array1 AND array2
>
>         for i in array1:
>             record = Table(field1=i,
>                            # other fields here
>                            )
>             self.dbsession.add(record)
>
>
> def parse_args(config_file):
>     parser = argparse.ArgumentParser()
>     parser.add_argument(
>         'config_uri',
>         help='Configuration file, e.g., pyramid.ini',
>     )
>     return parser.parse_args(config_file.split())
>
>
> def main(argv=sys.argv[1:]):
>     try:
>         opts, args = getopt.getopt(argv, 'h:', ["config-file="])
>     except getopt.GetoptError:
>         print('* Usage: topology --config-file=pyramid.ini')
>         sys.exit(2)
>     for opt, arg in opts:
>         if opt == '-h':
>             print('* Usage: topology --config-file=pyramid.ini')
>             sys.exit()
>         elif opt == '--config-file':
>             config_file = arg
>
>     args = parse_args(config_file)
>     setup_logging(args.config_uri)
>     env = bootstrap(args.config_uri)
>     try:
>         with env['request'].tm:
>             dbsession = env['request'].dbsession
>             t = Topology(dbsession)
>             t.autonomous_system()
>     except OperationalError:
>         print('Database error')
>
> In this case, I noticed that the "dbsession.add(prefix)" inside the loop
> made the process very slow. I tried to build an array of records in the
> loop and hand everything to the transaction manager just once outside
> the loop using "bulk_save_objects"
> (https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations),
> but I don't understand what "s = Session()" is.
>
>     s = Session()
>     objects = [
>         models.AutonomousSystem(autonomous_system=2, id_topology=68),
>         models.AutonomousSystem(autonomous_system=3, id_topology=68),
>         models.AutonomousSystem(autonomous_system=4, id_topology=68),
>         models.AutonomousSystem(autonomous_system=5, id_topology=68),
>     ]
>     s.bulk_save_objects(objects)
>
> 2 - insert a pandas dataframe into the database
>
> Another way I tried, and which I think has the best performance for
> working with almost a million records, is to use pandas dataframes. It
> is very fast to generate all the records in a dataframe, but I have not
> succeeded in inserting all the data from the dataframe into the Postgres
> database.
>
> I tried the "to_sql" method
> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html),
> but I don't know what the "engine" in the "df.to_sql('users',
> con=engine)" command is.
>
> Please, can someone help me with these questions?
>
> Emerson
>
> --
> You received this message because you are subscribed to the Google
> Groups "pylons-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdXJSP_mvJm3Br-_oiGx3FZBR9fcKagQtfFnRXscKWQErg%40mail.gmail.com.
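On the raw-psycopg2/COPY route mentioned above, a sketch of one common shape: serialize the rows into COPY text format in memory, then stream that buffer through cursor.copy_expert on the raw connection. The table and column names below are taken from the quoted snippets; the connection part is left as comments so the buffer-building can stand on its own.

```python
import io

def rows_to_copy_buffer(rows):
    """Serialize (autonomous_system, id_topology) tuples into
    tab-separated COPY text format."""
    buf = io.StringIO()
    for autonomous_system, id_topology in rows:
        buf.write(f"{autonomous_system}\t{id_topology}\n")
    buf.seek(0)
    return buf

buf = rows_to_copy_buffer([(2, 68), (3, 68), (4, 68)])

# With a raw psycopg2 connection (e.g. via the SQLAlchemy connection's
# .connection attribute), the buffer is streamed server-side:
#   cur = raw_conn.cursor()
#   cur.copy_expert(
#       "COPY autonomous_system (autonomous_system, id_topology) FROM STDIN",
#       buf)
# Remember mark_changed(dbsession) if this runs inside a tm-managed session.
```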
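On the two inline questions in the quoted message: "Session()" in the bulk_save_objects example is a SQLAlchemy session factory (in a Pyramid app you would reuse request.dbsession rather than creating your own), and "engine" in df.to_sql is a SQLAlchemy Engine returned by create_engine. A minimal sketch, with an in-memory sqlite URL purely as a stand-in for the real Postgres URL:

```python
import pandas as pd
from sqlalchemy import create_engine

# "engine" is a SQLAlchemy Engine; for Postgres the URL would look like
# 'postgresql://user:password@localhost/mydb'.
engine = create_engine('sqlite://')

df = pd.DataFrame({
    'autonomous_system': [2, 3, 4, 5],
    'id_topology': [68, 68, 68, 68],
})

# method='multi' batches rows into multi-valued INSERT statements and
# chunksize caps rows per statement, which matters at ~1M rows.
df.to_sql('autonomous_system', con=engine, index=False,
          method='multi', chunksize=10000)
```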
