Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe
Thank you for the insights
Re: Feature request: pg_get_tabledef(text)
On Wed, Nov 22, 2023 at 5:09 PM Laurenz Albe wrote:
> One of the problems is what should be included.
> Indexes? Policies? Constraints?

A high limit could be all objects except data: all the objects which would be deleted by a 'DROP TABLE'. Maybe including 'CASCADE'?

> No unsurmountable questions, but someone would have to come up with a
> clear design and implement it.

I gave it a try. I'm not that skilled in plpgsql, so there is probably room for improvement.

https://github.com/chlordk/pg_get_tabledef

For your convenience, here is a copy/paste of the function:

CREATE OR REPLACE FUNCTION pg_get_tabledef(TEXT) RETURNS TEXT
LANGUAGE plpgsql AS $_$
-- pg_get_tabledef ( text ) → text
-- Reconstructs the underlying CREATE command for a table and objects
-- related to a table.
-- (This is a decompiled reconstruction, not the original text of the command.)
DECLARE
  R TEXT;          -- Return result
  R_c TEXT;        -- Comments result, shown after the table definition
  rec RECORD;
  tmp_text TEXT;
  v_oid OID;       -- Table object id
  v_schema TEXT;   -- Schema
  v_table TEXT;    -- Table name
  rxrelname TEXT;
BEGIN
  rxrelname := '^(' || $1 || ')$';
  -- Get oid and schema
  SELECT c.oid, n.nspname, c.relname
    INTO v_oid, v_schema, v_table
    FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relname OPERATOR(pg_catalog.~) rxrelname COLLATE pg_catalog.default
     AND pg_catalog.pg_table_is_visible(c.oid);
  -- If the table was not found, exit
  IF NOT FOUND THEN
    -- RAISE EXCEPTION 'Table % not found', $1;
    RETURN '-- Table not found: ''' || $1 || '''';
  END IF;
  -- Table comment first, column comments second; initialize variable R_c
  SELECT obj_description(v_oid) INTO tmp_text;
  IF LENGTH(tmp_text) > 0 THEN
    R_c := 'COMMENT ON TABLE ' || v_schema || '."' || v_table || '" IS '''
        || tmp_text || ''';' || E'\n';
  ELSE
    R_c := '';
  END IF;
  R := 'CREATE TABLE ' || v_schema || '."' || v_table || '" (';
  -- Get columns
  FOR rec IN
    SELECT a.attname,
           pg_catalog.format_type(a.atttypid, a.atttypmod),
           (SELECT pg_catalog.pg_get_expr(d.adbin, d.adrelid, true)
              FROM pg_catalog.pg_attrdef d
             WHERE d.adrelid = a.attrelid AND d.adnum = a.attnum
               AND a.atthasdef),
           a.attnotnull,
           (SELECT c.collname
              FROM pg_catalog.pg_collation c, pg_catalog.pg_type t
             WHERE c.oid = a.attcollation AND t.oid = a.atttypid
               AND a.attcollation <> t.typcollation) AS attcollation,
           a.attidentity, a.attgenerated, a.attnum
      FROM pg_catalog.pg_attribute a
     WHERE a.attrelid = v_oid AND a.attnum > 0 AND NOT a.attisdropped
     ORDER BY a.attnum
  LOOP
    -- RAISE NOTICE '% % %', rec.attnum, rec.attname, rec.format_type;
    IF rec.attnum > 1 THEN
      R := R || ',';  -- no comma after the last column definition
    END IF;
    R := R || E'\n' || '"' || rec.attname || '" ' || rec.format_type;
    IF rec.attnotnull THEN
      R := R || ' NOT NULL';
    END IF;
    -- Comment on column
    SELECT col_description(v_oid, rec.attnum) INTO tmp_text;
    IF LENGTH(tmp_text) > 0 THEN
      R_c := R_c || 'COMMENT ON COLUMN ' || v_schema || '."' || v_table
          || '"."' || rec.attname || '" IS ''' || tmp_text || ''';' || E'\n';
    END IF;
  END LOOP; -- Columns
  -- Finalize the table
  R := R || E'\n' || ');' || E'\n';
  -- Add COMMENTs
  IF LENGTH(R_c) > 0 THEN
    R := R || R_c;
  END IF;
  -- Indexes
  FOR rec IN
    SELECT pg_catalog.pg_get_indexdef(i.indexrelid, 0, true) AS indexdef
      FROM pg_catalog.pg_class c,
           pg_catalog.pg_class c2,
           pg_catalog.pg_index i
      LEFT JOIN pg_catalog.pg_constraint con
        ON (conrelid = i.indrelid AND conindid = i.indexrelid
            AND contype IN ('p','u','x'))
     WHERE c.oid = v_oid AND c.oid = i.indrelid AND i.indexrelid = c2.oid
     ORDER BY i.indisprimary DESC, c2.relname
  LOOP
    R := R || rec.indexdef || ';' || E'\n';
  END LOOP; -- Indexes
  RETURN R;
END;
$_$;
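A quick usage sketch (the table name and output here are hypothetical; note that the argument is treated as a regular expression matched against visible tables):

```sql
-- Hypothetical example: reconstruct the DDL for a visible table "books"
SELECT pg_get_tabledef('books');
-- The result is one TEXT value: a CREATE TABLE statement, followed by
-- any COMMENT ON statements and the table's CREATE INDEX statements.
```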
Re: how to check if the license is expired.
> On Mar 31, 2024, at 09:59, Peter J. Holzer wrote:
> Is this an acceptable performance penalty per API call? If not, is it
> really necessary to check this on every call? Maybe it can be done just
> once per session or once per hour.

It's probably not required to check it every API call. Two places come to mind:

_PG_init -- Although I don't know the possibility or wisdom of reading from a file there.

shmem_startup_hook -- It's definitely OK to read from a file there.

Remember that you have the full ability to crash PostgreSQL in an extension, so it really needs to be bulletproof. You don't want the shared library to fail to load if the license isn't valid. Instead:

-- If the functionality is exposed as functions, return an error when one of those functions is used, if the extension is not licensed.

-- If the functionality modifies PostgreSQL behavior, disable that modification.

If you are validating the license via a network call... I would counsel against having a network call as part of PostgreSQL's startup process. It might work to make the call on demand the first time the extension is used, with the result stored locally, although I would limit that to function calls rather than, say, a hook into the planner.

You also need to decide exactly how you want to distribute this extension. Most PostgreSQL extensions are supplied as source and built against PostgreSQL.
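The advice above (never fail the library load; gate the licensed functionality and re-check only occasionally) can be sketched in a few lines. This is an illustration only, not extension C code: the JSON license format, the TTL value, and the helper names are all invented for the example.

```python
import json
import time

LICENSE_TTL_SECONDS = 3600  # re-validate at most once per hour

# Cached verdict, shared for the lifetime of the session/process.
_cache = {"checked_at": None, "valid": False}

def _validate_license_file(path):
    # Hypothetical format: JSON with a Unix-epoch expiry timestamp.
    # A real scheme would also verify a cryptographic signature.
    try:
        with open(path) as f:
            data = json.load(f)
        return time.time() < data["expires_at"]
    except (OSError, ValueError, KeyError):
        # Never crash on a bad/missing file -- just report "not licensed".
        return False

def license_is_valid(path, now=None):
    """Return the cached verdict, re-reading the file only after the TTL."""
    now = time.time() if now is None else now
    if (_cache["checked_at"] is None
            or now - _cache["checked_at"] >= LICENSE_TTL_SECONDS):
        _cache["valid"] = _validate_license_file(path)
        _cache["checked_at"] = now
    return _cache["valid"]
```

In the spirit of the message above, `license_is_valid` would be called from each licensed SQL-callable function, which raises an error on False; the library itself always loads.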
Re: how to check if the license is expired.
On 2024-03-30 09:23:15 -0400, Ron Johnson wrote:
> On Sat, Mar 30, 2024 at 9:15 AM ้ปๅฎ wrote:
> > I want to develop a postgresql paid extension, then there is a local
> > license file, how do I check if the license file is expired, check it
> > once at each api execution, will that affect the performance of the
> > api, is there any other way?
>
> What you're really asking is "how do I read a file from an extension?".

We often chide users for falling into the "XY problem"[1] trap, so I think it's nice that ้ปๅฎ asks about the bigger picture. I can't help with the extension (never wrote one), but a few thoughts:

Basically I see three ways to get at the license information:

* from a file (as mentioned)
* from a database table
* over the network (e.g. from a license server)

On my (not very fast) laptop I can open and read a small text file in about 25 µs. Selecting one row from a small database table takes about 100 µs, which is quite a bit slower, but I tested that from an external process. A stored procedure would be faster than that and possibly even faster than the file access. A query over the network is unlikely to be faster.

Plus, of course, you need to check the contents, which likely involves some cryptographic operation. Checking a 2048-bit RSA signature takes about 30 µs on my laptop; most other algorithms are slower (unless you go with a simple HMAC, which wouldn't be secure). So realistically we are somewhere in the 50 to 200 µs range.

Is this an acceptable performance penalty per API call? If not, is it really necessary to check this on every call? Maybe it can be done just once per session or once per hour.

        hp

[1] You have problem X and think that Y is part of the solution. So you ask how to achieve Y. However, Z would be better than Y for solving X, but nobody can tell you because they don't know about X.

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
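Peter's numbers are easy to reproduce in spirit: time opening and reading a small file plus one MAC check. This is a rough, machine-dependent sketch (and, as he notes, a plain HMAC would not be a secure licensing scheme, since the key has to ship with the extension; it just stands in for "some cryptographic check"):

```python
import hashlib
import hmac
import tempfile
import time

# Write a small fake "license" file once.
with tempfile.NamedTemporaryFile("w", delete=False) as f:
    f.write("licensee=example\nexpires=2025-01-01\n")
    path = f.name

key = b"not-a-real-secret"  # a real scheme would use signatures, not HMAC

def read_license(p):
    with open(p) as fh:
        return fh.read()

# Precompute the expected tag for the file's current contents.
payload = read_license(path).encode()
tag = hmac.new(key, payload, hashlib.sha256).hexdigest()

def check(p):
    """Open + read the file and verify its HMAC-SHA256 tag."""
    data = read_license(p).encode()
    return hmac.compare_digest(
        hmac.new(key, data, hashlib.sha256).hexdigest(), tag)

n = 1000
start = time.perf_counter()
for _ in range(n):
    assert check(path)
per_call_us = (time.perf_counter() - start) / n * 1e6
print(f"open+read+HMAC check: ~{per_call_us:.0f} µs per call")
```

The printed figure varies by machine and filesystem cache state, which is why Peter only gives a rough 50-200 µs range for the realistic (signature-based) case.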
Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe
On Sat, 30 Mar 2024, 10:04 Alexander Farber, wrote:
> Thank you, Justin -
>
> On Sat, Mar 30, 2024 at 4:33 AM Justin Clift wrote:
>> On 2024-03-30 05:53, Alexander Farber wrote:
>>> I use the following postgresql.conf in my Dockerfile
>>> ( the full version at https://stackoverflow.com/a/78243530/165071 ),
>>> when loading a 28 GByte large europe-latest.osm.pbf
>>
>> Not specific conf file improvements, but for an initial data load
>> have you done things like turning off fsync(), deferring index
>> creation until after the data load finishes, and that kind of thing?
>
> I will try the following commands in my Dockerfile then
> and later report back on any improvements:
>
> RUN set -eux && \
>     pg_ctl init && \
>     echo "shared_buffers = 1GB"                >> $PGDATA/postgresql.conf && \
>     echo "work_mem = 50MB"                     >> $PGDATA/postgresql.conf && \
>     echo "maintenance_work_mem = 10GB"         >> $PGDATA/postgresql.conf && \
>     echo "autovacuum_work_mem = 2GB"           >> $PGDATA/postgresql.conf && \
>     echo "wal_level = minimal"                 >> $PGDATA/postgresql.conf && \
>     echo "checkpoint_timeout = 60min"          >> $PGDATA/postgresql.conf && \
>     echo "max_wal_size = 10GB"                 >> $PGDATA/postgresql.conf && \
>     echo "checkpoint_completion_target = 0.9"  >> $PGDATA/postgresql.conf && \
>     echo "max_wal_senders = 0"                 >> $PGDATA/postgresql.conf && \
>     echo "random_page_cost = 1.0"              >> $PGDATA/postgresql.conf && \
>     echo "password_encryption = scram-sha-256" >> $PGDATA/postgresql.conf && \
>     echo "fsync = off"                         >> $PGDATA/postgresql.conf && \
>     pg_ctl start && \
>     createuser --username=postgres $PGUSER && \
>     createdb --username=postgres --encoding=UTF8 --owner=$PGUSER $PGDATABASE && \
>     psql --username=postgres $PGDATABASE --command="ALTER USER $PGUSER WITH PASSWORD '$PGPASSWORD';" && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS postgis;' && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF NOT EXISTS hstore;' && \
>     osm2pgsql --username=$PGUSER --database=$PGDATABASE --create --cache=6 --hstore --latlong /data/map.osm.pbf && \
>     rm -f /data/map.osm.pbf && \
>     pg_ctl stop && \
>     echo "fsync = on"                          >> $PGDATA/postgresql.conf && \
>     echo '# TYPE  DATABASE     USER      ADDRESS    METHOD'            >  $PGDATA/pg_hba.conf && \
>     echo "local   all          postgres             peer"              >> $PGDATA/pg_hba.conf && \
>     echo "local   $PGDATABASE  $PGUSER              scram-sha-256"     >> $PGDATA/pg_hba.conf && \
>     echo "host    $PGDATABASE  $PGUSER   0.0.0.0/0  scram-sha-256"     >> $PGDATA/pg_hba.conf
>
> The later fsync = on will override the former, right?
>
> Best regards
> Alex

2 hrs sounds reasonable for Europe; it's a big place in terms of OSM data, and osm2pgsql does processing to convert to geometry objects prior to doing anything on the PostgreSQL side. If you examine the --log-sql output for a small test country, you can see what it does in terms of the PostgreSQL work. osm2pgsql gives options to trim the output to only what you need (so if you don't want waterways, traffic features, parking places or places of worship etc., why load them?).

Hopefully you have found the excellent Geofabrik ( https://download.geofabrik.de/ ) source for OSM data. Rather than load this data afresh each update cycle, you would be better off simply loading the changes, i.e. the .osc files; osmosis will create the equivalent of a diff file for you.

It looks like you are already using osm2pgsql's recommended postgresql.conf settings, so I'd be surprised if this was way off. Getting as close to tin as possible, rather than virtual machines and containers, will also help; there is lots of IO going on here.

If you are only interested in the geography, you might consider Geofabrik's shapefiles, available for many countries; they have already done some of the work for you.

Apologies if you are already a long way down this route and just asking about the final stage of loading the osm2pgsql output into PostgreSQL, but however well you do here, I would only expect small marginal gains.
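On Alex's question in the message above ("The later fsync = on will override the former, right?"): yes, when the same parameter appears several times in postgresql.conf, the last occurrence wins. A minimal sketch of that rule (a toy parser for illustration only; real postgresql.conf handling also covers quoting and include directives):

```python
def effective_settings(conf_text):
    """Return the effective value per parameter: the last occurrence wins,
    as it does in postgresql.conf."""
    settings = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" in line:
            name, _, value = line.partition("=")
            settings[name.strip()] = value.strip()  # later lines overwrite
    return settings

conf = """\
fsync = off
max_wal_size = 10GB
fsync = on
"""
print(effective_settings(conf)["fsync"])  # -> on
```

So appending "fsync = on" at the end of the file, as the Dockerfile does after the bulk load, is enough; the earlier "fsync = off" line can stay.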
Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe
On 2024-03-31 04:07, Alexander Farber wrote:
> Turning fsync = off has resulted in no noticeable build time reduction
> for my Dockerfile with OSM Europe data, but still thank you for the
> suggestion!

No worries. :)

With this import you're doing, is it something that will be repeated a lot with the exact same data set, or is this a one-off thing?

If it's something that'll be repeated a lot (maybe as part of some automated process?), then it might be worth making a backup / snapshot / something of the database after the import has completed.

With a backup or snapshot in place (depends on the storage you're using), you could potentially load things from that backup / snapshot (etc.) instead of having to do the import all over again each time.

Regards and best wishes,

Justin Clift