Hi there.
Sometimes my app needs to create and save almost a million records in a
Postgres database. I'm looking for the best way to do this, because the
procedures I've used so far are very slow.
I will present some approaches I tried that were very slow, as well as the
workarounds I tried to improve them and the errors that occurred:
1 - a for loop to generate the records and add them to the transaction
manager
import argparse
import getopt
import sys

from pyramid.paster import bootstrap, setup_logging
from sqlalchemy.exc import OperationalError


class Topology(object):
    def __init__(self, dbsession):
        self.dbsession = dbsession

    def autonomous_system(self):
        # SOME PROCESS THAT RETURNS array1 AND array2
        for i in range(len(array1)):
            record = Table(field1=array1[i],
                           # other fields here
                           )
            self.dbsession.add(record)


def parse_args(config_file):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        'config_uri',
        help='Configuration file, e.g., pyramid.ini',
    )
    return parser.parse_args(config_file.split())


def main(argv=sys.argv[1:]):
    try:
        opts, args = getopt.getopt(argv, 'h', ['config-file='])
    except getopt.GetoptError:
        print('* Usage: topology --config-file=pyramid.ini')
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print('* Usage: topology --config-file=pyramid.ini')
            sys.exit()
        elif opt == '--config-file':
            config_file = arg
    args = parse_args(config_file)
    setup_logging(args.config_uri)
    env = bootstrap(args.config_uri)
    try:
        with env['request'].tm:
            dbsession = env['request'].dbsession
            t = Topology(dbsession)
            t.autonomous_system()
    except OperationalError:
        print('Database error')
In this case, I noticed that the "dbsession.add(record)" call inside the
loop made the process very slow. I tried to build a list with the records
in the loop and add everything to the transaction manager just once outside
the loop using "bulk_save_objects" (
https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations),
but I don't understand what "s = Session()" is.
s = Session()
objects = [
    models.AutonomousSystem(autonomous_system=2, id_topology=68),
    models.AutonomousSystem(autonomous_system=3, id_topology=68),
    models.AutonomousSystem(autonomous_system=4, id_topology=68),
    models.AutonomousSystem(autonomous_system=5, id_topology=68),
]
s.bulk_save_objects(objects)
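For context, in plain SQLAlchemy the "Session" in that docs snippet usually comes from a sessionmaker bound to an engine. A minimal, self-contained sketch of that pattern (the model and table names here are hypothetical, and an in-memory SQLite engine stands in for a Postgres one):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class AutonomousSystem(Base):
    # Hypothetical model mirroring the docs example above.
    __tablename__ = 'autonomous_system'
    id = Column(Integer, primary_key=True)
    autonomous_system = Column(Integer)
    id_topology = Column(Integer)


# SQLite in-memory stands in for a Postgres URL like
# 'postgresql://user:password@localhost/dbname'.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

# This is where "Session" comes from: a factory bound to the engine.
Session = sessionmaker(bind=engine)
s = Session()

objects = [AutonomousSystem(autonomous_system=n, id_topology=68)
           for n in range(2, 6)]
s.bulk_save_objects(objects)
s.commit()
```

In a Pyramid app the request's dbsession is already such a Session, so `env['request'].dbsession.bulk_save_objects(objects)` should work without creating a second one.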
2 - insert a pandas dataframe into the database
Another way that I tried, and the one that I think has the best performance
for working with almost a million records, is to use pandas DataFrames.
Generating all the records in a DataFrame is very fast, but I haven't
succeeded in inserting the data from the DataFrame into the Postgres
database.
I tried the "to_sql" method (
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html),
but I don't know what the "engine" is in the "df.to_sql('users', con=engine)"
call.
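For what it's worth, the "engine" in that call is a SQLAlchemy Engine built from a database URL with create_engine. A minimal sketch (the table name and columns are made up, and an in-memory SQLite engine stands in for a Postgres one):

```python
import pandas as pd
from sqlalchemy import create_engine, text

# The "engine" is a SQLAlchemy Engine; SQLite in-memory stands in here
# for a Postgres URL such as 'postgresql://user:password@localhost/dbname'.
engine = create_engine('sqlite://')

df = pd.DataFrame({'autonomous_system': [2, 3, 4, 5],
                   'id_topology': [68, 68, 68, 68]})

# method='multi' sends many rows per INSERT statement, and chunksize
# bounds how many rows go in each batch; both help with large frames.
df.to_sql('autonomous_system', con=engine, if_exists='append',
          index=False, method='multi', chunksize=10000)

with engine.connect() as conn:
    count = conn.execute(
        text('SELECT COUNT(*) FROM autonomous_system')).scalar()
```

The `method` and `chunksize` parameters are documented on `DataFrame.to_sql`; whether they help for a given Postgres setup is something to measure rather than assume.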
Can someone please help me with these questions?
Emerson