When I need to load and transform a large dataset before it lands in its final tables/rows, I generally use unlogged tables for the intermediate steps. This saves a lot of disk writing (especially so if running PostgreSQL on a storage layer doing its own copy-on-write). The downside is the unlogged tables won't be saved if for whatever reason PostgreSQL unexpectedly stops, but that generally doesn't matter in this case because you can just restart the load.
On Thu, Nov 1, 2018, 11:54 Paul Ramsey <[email protected]> wrote: > Also, if you created an index during your import process, that'll also > make your update slower. > I usually skip bulk updates in favour of writing a fresh table with the > stuff I want.. > > CREATE TABLE mynewtable AS SELECT ST_SetSRID(geom, 4326) AS geom, ... FROM > myoldtable > > That generally writes as fast as is possible though still only single > threaded. > > P > > > On Thu, Nov 1, 2018 at 9:51 AM Darafei "Komяpa" Praliaskouski < > [email protected]> wrote: > >> Hi, >> >> Please feed in your WKT polygons in EWKT format: >> >> SRID=4326;POLYGON(...) >> >> Since every update in Postgres is essentially Delete+Insert, time of >> rewriting each and every row being equal to initial Insert time is expected >> thing. >> >> You can also update SRID in two columns in one go: >> >> update tablename set geom1 = ST_SetSRID(geom1, 4326), geom2 = >> ST_SetSRID(geom2, 4326); >> >> Out of curiosity, where did you learn about UpdateGeometrySRID before >> learning about ST_SetSRID? >> >> ср, 31 окт. 2018 г. в 21:57, jerry73204 <[email protected]>: >> >>> Hi all, >>> >>> I'm stuck in the low performance of UpdateGeometrySRID(). >>> >>> I get started with a 50GB polygon dataset in CSV in EPSG:4326 >>> coordinates. Since I find no way to `\copy` the csv while preserving the >>> SRID, the data is imported with null SRID and then `SELECT >>> UpdateGeometrySRID('table', 'column', 4326)`. >>> >>> The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which >>> turns out to be approx two hours. The dataset has two geometry columns >>> and thus I have to take triple time to finish this data. >>> >>> I profiled the postgresql daemon. The avg disk writing speed is 30MB/s, >>> while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is >>> capable of up to 150MB/s. The daemon does not utilize the 4-core >>> i5-7600k CPU. It seems to be a single process task with avg CPU load >>> 20%, while other workers are idle. I wonder if there's a room for >>> improving the performance. Also, I'm looking for if it's possible to >>> preserve SRID with `\copy`. >>> >>> Jerry Lin >>> >>> _______________________________________________ >>> postgis-users mailing list >>> [email protected] >>> https://lists.osgeo.org/mailman/listinfo/postgis-users >> >> -- >> Darafei Praliaskouski >> Support me: http://patreon.com/komzpa >> _______________________________________________ >> postgis-users mailing list >> [email protected] >> https://lists.osgeo.org/mailman/listinfo/postgis-users > > _______________________________________________ > postgis-users mailing list > [email protected] > https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________ postgis-users mailing list [email protected] https://lists.osgeo.org/mailman/listinfo/postgis-users
