Thank you, Michael Merickel and Theron Luhn, for the answers. I'll try the bulk_insert_mappings approach, but I don't know how to pass the database parameters (server name, username, password, database schema) in a connection script like the one Theron posted. I know it's a silly question, but can you help me again?
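(For reference, those parameters usually go into the SQLAlchemy database URL itself, in the form dialect+driver://username:password@host:port/database. A minimal sketch with placeholder credentials; all names below are made up:)

```python
from sqlalchemy.engine.url import make_url

# SQLAlchemy URL format: dialect+driver://username:password@host:port/database
# Every value below is a placeholder.
dburl = "postgresql+psycopg2://myuser:mypassword@dbserver:5432/mydatabase"

# make_url parses the URL so you can see where each parameter landed.
url = make_url(dburl)
print(url.username, url.host, url.database)  # myuser dbserver mydatabase

# Note: a Postgres *schema* is not part of the URL. One common approach is to
# set the search_path via connect_args when creating the engine, e.g.:
#   engine = create_engine(dburl,
#                          connect_args={"options": "-csearch_path=myschema"})
```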
Thank you.

On Fri, Mar 20, 2020 at 5:02 PM, Theron Luhn <[email protected]> wrote:

> For CLI scripts like this, I usually skip Pyramid entirely and instantiate
> the SQLAlchemy session manually. It won't help your performance woes, but it
> is less machinery to deal with.
>
> I second Michael's bulk_insert_mappings; it's what I usually reach for in
> cases like this, and the performance is good. However, with Postgres you'll
> need to enable batch mode, especially if the DB is over a network; otherwise
> performance will suffer greatly.
>
> So my scripts usually look like this:
>
>     dburl = sys.argv[1]
>     engine = create_engine(dburl, executemany_mode='batch')
>     db = Session(bind=engine)
>     db.bulk_insert_mappings(MyTable, get_data())
>     db.commit()
>
> I wouldn't use Pandas unless you're already handling your data with it.
>
> On Mar 20, 2020, at 12:47 PM, Michael Merickel <[email protected]> wrote:
>
> dbsession.add on each row is pretty much the worst-case scenario. Start with
> https://docs.sqlalchemy.org/en/13/_modules/examples/performance/bulk_inserts.html,
> which shows you how to use dbsession.bulk_insert_mappings and several other
> approaches. The only caveat that's Pyramid-related is that since the
> dbsession is controlled by the tm, if you drop down to some of the core
> APIs you'll need to call zope.sqlalchemy.mark_changed(dbsession) at the end
> to have your data committed.
>
> Of course there's always the ability to use the raw psycopg2 connection as
> well, which can get you closer to COPY-like performance. It kind of depends
> on your needs, but I've had success just using bulk_insert_mappings on the ORM.
>
> - Michael
>
> On Mar 20, 2020, at 14:41, Emerson Barea <[email protected]> wrote:
>
> Hi there.
>
> Sometimes my app needs to create and save almost a million records in a
> Postgres database. I'm looking for the best way to do this, because the
> procedures I've used so far are very slow.
>
> I will present some ways that I tried that were very slow, as well as the
> workarounds that I tried to improve them and the errors that occurred:
>
> 1 - a for loop to generate the records and add them to the transaction
> manager
>
>     import argparse
>     import getopt
>     import sys
>
>     from pyramid.paster import bootstrap, setup_logging
>     from sqlalchemy.exc import OperationalError
>
>
>     class Topology(object):
>         def __init__(self, dbsession):
>             self.dbsession = dbsession
>
>         def autonomous_system(self):
>             # SOME PROCESS THAT RETURNS array1 AND array2
>             for value in array1:
>                 record = Table(field1=value,
>                                # other fields here
>                                )
>                 self.dbsession.add(record)
>
>
>     def parse_args(config_file):
>         parser = argparse.ArgumentParser()
>         parser.add_argument(
>             'config_uri',
>             help='Configuration file, e.g., pyramid.ini',
>         )
>         return parser.parse_args(config_file.split())
>
>
>     def main(argv=sys.argv[1:]):
>         try:
>             opts, args = getopt.getopt(argv, 'h:', ["config-file="])
>         except getopt.GetoptError:
>             print('* Usage: topology --config-file=pyramid.ini')
>             sys.exit(2)
>         for opt, arg in opts:
>             if opt == '-h':
>                 print('* Usage: topology --config-file=pyramid.ini')
>                 sys.exit()
>             elif opt == '--config-file':
>                 config_file = arg
>
>         args = parse_args(config_file)
>         setup_logging(args.config_uri)
>         env = bootstrap(args.config_uri)
>         try:
>             with env['request'].tm:
>                 dbsession = env['request'].dbsession
>                 t = Topology(dbsession)
>                 t.autonomous_system()
>         except OperationalError:
>             print('Database error')
>
> In this case, I noticed that the dbsession.add(record) inside the loop made
> the process very slow. I tried to create an array with the records in the
> loop and add everything to the transaction manager just once outside the
> loop using bulk_save_objects
> (https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations),
> but I don't understand what "s = Session()" is.
>
>     s = Session()
>     objects = [
>         models.AutonomousSystem(autonomous_system=2, id_topology=68),
>         models.AutonomousSystem(autonomous_system=3, id_topology=68),
>         models.AutonomousSystem(autonomous_system=4, id_topology=68),
>         models.AutonomousSystem(autonomous_system=5, id_topology=68)
>     ]
>     s.bulk_save_objects(objects)
>
> 2 - insert a pandas dataframe into the database
>
> Another way that I tried, and that I think has the best performance for
> working with almost a million records, is to use pandas dataframes. It is
> very fast to generate all the records with pandas dataframes, but I'm not
> having any success inserting the data from the dataframe into the Postgres
> database.
>
> I tried the "to_sql" method
> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html),
> but I don't know what the "engine" is in the "df.to_sql('users', con=engine)"
> command.
>
> Please, can someone help me with these questions?
>
> Emerson
>
> --
> You received this message because you are subscribed to the Google Groups
> "pylons-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdXJSP_mvJm3Br-_oiGx3FZBR9fcKagQtfFnRXscKWQErg%40mail.gmail.com
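The pattern Michael and Theron describe, and what "s = Session()" refers to, can be sketched end to end. This is only an illustration, not the thread's exact code: the model is made up, and an in-memory SQLite engine stands in for Postgres so the sketch runs anywhere (with Postgres you would pass your database URL and, on SQLAlchemy 1.3 with psycopg2, executemany_mode='batch' as Theron notes):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Toy model, loosely modeled on the thread's AutonomousSystem table.
class AutonomousSystem(Base):
    __tablename__ = 'autonomous_system'
    id = Column(Integer, primary_key=True)
    autonomous_system = Column(Integer)
    id_topology = Column(Integer)

# In-memory SQLite for illustration; with Postgres this would be your
# postgresql:// URL (plus executemany_mode='batch' on SQLAlchemy 1.3).
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

# "s = Session()" in the SQLAlchemy docs is simply an ORM session; binding
# it to an engine tells it which database to talk to.
s = Session(bind=engine)

# bulk_insert_mappings takes plain dicts, skipping per-row ORM object
# construction, which is where the speedup comes from.
rows = [{'autonomous_system': n, 'id_topology': 68} for n in range(2, 6)]
s.bulk_insert_mappings(AutonomousSystem, rows)
s.commit()

print(s.query(AutonomousSystem).count())  # 4
```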
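As for the "engine" in df.to_sql('users', con=engine): it is the same object returned by SQLAlchemy's create_engine. A minimal sketch, again with an in-memory SQLite engine and made-up data purely for illustration:

```python
import pandas as pd
from sqlalchemy import create_engine

# The engine pandas expects is just the result of create_engine; with
# Postgres this would be your postgresql:// URL instead of sqlite://.
engine = create_engine('sqlite://')

df = pd.DataFrame({
    'autonomous_system': [2, 3, 4, 5],
    'id_topology': [68, 68, 68, 68],
})

# Write the dataframe to a table named 'users'. method='multi' batches rows
# into multi-row INSERT statements, which helps over a network.
df.to_sql('users', con=engine, index=False,
          if_exists='replace', method='multi')

# Read it back to confirm the rows arrived.
out = pd.read_sql('SELECT autonomous_system, id_topology FROM users',
                  con=engine)
print(len(out))  # 4
```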
