Parameters go in the URL. create_engine(‘postgresql://user:pass@server/dbname' <postgresql://user:pass@server/dbname'>)
See https://docs.sqlalchemy.org/en/13/core/engines.html <https://docs.sqlalchemy.org/en/13/core/engines.html> > On Mar 20, 2020, at 6:41 PM, Emerson Barea <[email protected]> wrote: > > Thank you Michael Merickel and Theron Luhn for the answers. > > I'll try to follow bulk_insert_mappings procedure, but I don't know how to > pass database parameters (servername, username, password, database schema) in > a connection script like Theron posted. I know that is a silly doubt, but can > you help me again? > > Thank you. > > Em sex., 20 de mar. de 2020 às 17:02, Theron Luhn <[email protected] > <mailto:[email protected]>> escreveu: > For CLI scripts like this, I usually skip Pyramid entirely and instantiate > the SQLAlchemy session manually. Won’t help your performance woes, but it is > less machinery to deal with. > > I second Michael’s bulk_insert_mappings, it's what I usually reach for in > cases like this and the performance is good. However, with Postgres you’ll > need to enable batch mode, especially if the DB is over a network, otherwise > performance will suffer greatly. > > So my scripts usually look like this: > > dburl = sys.argv[1] > create_engine(dburl, executemany_mode='batch’) > db = Session(bind=engine) > db.bulk_insert_mappings(MyTable, get_data()) > db.commit() > > I wouldn’t use Pandas unless you’re already handling your data with it. > >> On Mar 20, 2020, at 12:47 PM, Michael Merickel <[email protected] >> <mailto:[email protected]>> wrote: >> >> dbsession.add on each row is pretty much worse case scenario. Start with >> https://docs.sqlalchemy.org/en/13/_modules/examples/performance/bulk_inserts.html >> >> <https://docs.sqlalchemy.org/en/13/_modules/examples/performance/bulk_inserts.html> >> which shows you how to use dbsession.bulk_insert_mappings and several other >> approaches. The only caveat that's Pyramid-related is that since the >> dbsession is controlled by the tm, if you drop down to some of the core apis >> you'll need to use zope.sqlalchemy.mark_changed(dbsession) at the end to >> have your data committed. >> >> Of course there's always the ability to use the raw psycopg2 connection as >> well which can get you closer to COPY-like performance - kinda depends on >> your needs, but I've had success just using bulk_insert_mappings on the ORM. >> >> - Michael >> >>> On Mar 20, 2020, at 14:41, Emerson Barea <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> Hi there. >>> >>> Some times my app needs to create and save almost a million records in a >>> Postgres database. In that case, I'm looking for the best way to do this, >>> because the procedures I've used so far are very slow. >>> >>> I will present some ways that I tried that were very slow, as well the >>> workarounds that I tried to improve them and the errors that occurred: >>> >>> 1 - a for loop to generate the records and add them to the transaction >>> manager >>> >>> import argparse >>> import getopt >>> import sys >>> >>> from pyramid.paster import bootstrap, setup_logging >>> from sqlalchemy.exc import OperationalError >>> >>> >>> class Topology(object): >>> def __init__(self, dbsession): >>> self.dbsession = dbsession >>> >>> def autonomous_system(self): >>> >>> >>> # SOME PROCESS THAT RETURNS array1 AND array2 >>> >>> >>> for i in array1: >>> record = Table(field1=array1[i], >>> #other fields here >>> ) >>> self.dbsession.add(record) >>> >>> >>> def parse_args(config_file): >>> parser = argparse.ArgumentParser() >>> parser.add_argument( >>> 'config_uri', >>> help='Configuration file, e.g., pyramid.ini', >>> ) >>> return parser.parse_args(config_file.split()) >>> >>> >>> def main(argv=sys.argv[1:]): >>> try: >>> opts, args = getopt.getopt(argv, 'h:', ["config-file="]) >>> except getopt.GetoptError: >>> print('* Usage: topology --config-file=pyramid.ini') >>> sys.exit(2) >>> for opt, arg in opts: >>> if opt == '-h': >>> print('* Usage: topology --config-file=pyramid.ini') >>> sys.exit() >>> elif opt == '--config-file': >>> config_file = arg >>> >>> args = parse_args(config_file) >>> setup_logging(args.config_uri) >>> env = bootstrap(args.config_uri) >>> try: >>> t = Topology(dbsession) >>> with env['request'].tm: >>> dbsession = env['request'].dbsession >>> t.autonomous_system() >>> except OperationalError: >>> print('Database error') >>> >>> In this case, I noticed that the "dbsession.add (prefix)" inside the loop >>> made the process very slow. I tried to create an array with the records in >>> the loop and add everything to the transaction manager just once outside >>> the loop using "bulk_save_objects" >>> (https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations >>> >>> <https://docs.sqlalchemy.org/en/13/orm/persistence_techniques.html#bulk-operations>), >>> but I don't understand what "s = Session ()" is. >>> >>> s = Session() >>> objects = [ >>> models.AutonomousSystem(autonomous_system=2, id_topology=68), >>> models.AutonomousSystem(autonomous_system=3, id_topology=68), >>> models.AutonomousSystem(autonomous_system=4, id_topology=68), >>> models.AutonomousSystem(autonomous_system=5, id_topology=68) >>> ] >>> s.bulk_save_objects(objects) >>> >>> 2 - insert a pandas dataframe into the database >>> >>> Another way that I tried, and that I think it has the best performance for >>> working with almost a million records, is to use panda dataframes. It is >>> very fast to generate all records using pandas dataframes, but I'm not >>> getting success to insert all data from the dataframe into Postgres >>> database. >>> >>> I tried "to_sql" procedure >>> (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html >>> >>> <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html>), >>> but I don't know what is the "engine" in the "df.to_sql('users', >>> con=engine)" command. >>> >>> Please, can someone help me with these questions? >>> >>> Emerson >>> >>> >>> >>> -- >>> You received this message because you are subscribed to the Google Groups >>> "pylons-discuss" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected] >>> <mailto:[email protected]>. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdXJSP_mvJm3Br-_oiGx3FZBR9fcKagQtfFnRXscKWQErg%40mail.gmail.com >>> >>> <https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdXJSP_mvJm3Br-_oiGx3FZBR9fcKagQtfFnRXscKWQErg%40mail.gmail.com?utm_medium=email&utm_source=footer>. >> >> >> -- >> You received this message because you are subscribed to the Google Groups >> "pylons-discuss" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to [email protected] >> <mailto:[email protected]>. >> To view this discussion on the web visit >> https://groups.google.com/d/msgid/pylons-discuss/519F02D3-7AFF-43AB-A3BE-36729EE63A25%40gmail.com >> >> <https://groups.google.com/d/msgid/pylons-discuss/519F02D3-7AFF-43AB-A3BE-36729EE63A25%40gmail.com?utm_medium=email&utm_source=footer>. > > > -- > You received this message because you are subscribed to the Google Groups > "pylons-discuss" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected] > <mailto:[email protected]>. > To view this discussion on the web visit > https://groups.google.com/d/msgid/pylons-discuss/9E7DF195-BC81-4039-8C6C-33BA70A576CD%40luhn.com > > <https://groups.google.com/d/msgid/pylons-discuss/9E7DF195-BC81-4039-8C6C-33BA70A576CD%40luhn.com?utm_medium=email&utm_source=footer>. > > -- > You received this message because you are subscribed to the Google Groups > "pylons-discuss" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected] > <mailto:[email protected]>. > To view this discussion on the web visit > https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdUi01m2SYQk6oFj%3DVZwgT7TUHBOmGnNY94a8eSbru4aNg%40mail.gmail.com > > <https://groups.google.com/d/msgid/pylons-discuss/CAO1UhdUi01m2SYQk6oFj%3DVZwgT7TUHBOmGnNY94a8eSbru4aNg%40mail.gmail.com?utm_medium=email&utm_source=footer>. -- You received this message because you are subscribed to the Google Groups "pylons-discuss" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/pylons-discuss/2A1249B4-EF0E-47D7-8E96-048F3AA253B8%40luhn.com.
