Hi Paul,

I'm no expert on Postgres, so maybe I should stay silent?
Anyway, I know Oracle well, and I know that the physical organization of your data can be highly relevant to performance.

Maybe you could prepare all the data you need in a temporary table and, at the end, create another table from the temporary one. In my case, I load longitude/latitude and create geography and geometry fields. At the end, I create the real table and all the indexes, and finally I get rid of the temporary table. The big advantage of this process is that the table I'm creating is read-only, so the better organized the data is, the better. Below you can see an example with some useful comments. I hope it helps somehow.



-- working with a temporary table

CREATE TEMPORARY TABLE tmp
(
  location  character varying(100) NOT NULL,
  easting   integer NOT NULL,
  northing  integer NOT NULL,
  longitude double precision NOT NULL,
  latitude  double precision NOT NULL
)
WITH (
  OIDS=FALSE
);

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' CSV HEADER;

ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting || ' ' || northing || ')' , 97405 );
COMMIT;

ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );
COMMIT;
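
-- Incidentally, each UPDATE above rewrites every row of the temporary table.
-- A single combined pass (just a sketch, I haven't timed it) would touch each
-- row only once: if you add both columns first, the two UPDATEs above could be
-- replaced by:
--
--   UPDATE tmp
--      SET geometry  = ST_GeometryFromText( 'POINT(' || easting   || ' ' || northing || ')' , 97405 ),
--          geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );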


-- now create the table I really need

DROP TABLE IF EXISTS location;
CREATE TABLE location AS SELECT location, easting, northing, longitude, latitude, geometry, geography FROM tmp;

-- actually, I would like to avoid the step below; I should really get rid of it:
ALTER TABLE location ADD COLUMN id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and generate the primary key right in place.
-- Since this table "location" is read-only, I could simply generate the primary key at load time and skip this step.
-- I suppose that would be better with regard to data organization (not sure! I'm not an expert on Postgres).
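
-- For what it's worth, a minimal sketch of that idea (untested; the sequence
-- name is my own invention): generate the key while writing the final table,
-- replacing the CREATE TABLE ... AS and the ALTER TABLE above, so the heap is
-- written only once, already in its final shape:
--
--   CREATE SEQUENCE location_id_seq;
--
--   CREATE TABLE location AS
--     SELECT nextval('location_id_seq') AS id,
--            location, easting, northing, longitude, latitude, geometry, geography
--     FROM tmp;
--
--   ALTER TABLE location ADD PRIMARY KEY (id);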


VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey  ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);
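
-- And since physical organization is the whole point here: if the heavy queries
-- are spatial, it may be worth rewriting the read-only table in index order and
-- refreshing the statistics afterwards (again just a sketch, I haven't measured
-- the effect on Postgres):
CLUSTER location USING location_geometry_akey;
ANALYZE location;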


Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib


On 14/04/11 10:37, Mark Cave-Ayland wrote:
On 12/04/11 17:38, Paul & Caroline Lewis wrote:

Hi,
Thank you Mark and Richard for your replies. Having looked at this, it
seems a VACUUM FULL is the answer, however I'm not sure why. Processing
the SQL scripts as originally reported, I do get a large table from
TestSet1 and a small table from TestSet2. Once a VACUUM FULL is
performed on the large table from TestSet1, its size drops to the same
as the small table from TestSet2; however, adding a VACUUM FULL into
the TestSet1 procedure makes it slower to run than TestSet2, very much
slower especially on uploading the very large data sets (70 million
rows). This begs the question: is TestSet2 very efficient, or is it
missing something fundamental that a VACUUM FULL provides that I'm not
realising at the moment?

That's strange. Do you see the same behaviour if you swap the order of the data load, i.e. do the ordered data set first, and/or use a different table name for each load? I'm just wondering if you're seeing some kind of database bloat, given that VACUUM fixes the issue.


ATB,

Mark.

