Hi Paul,

I'm no expert on Postgres, so maybe I should stay silent?
Anyway, I know Oracle well, and I know that the physical organization of your data can be highly relevant to performance.

Maybe you could prepare all the data you need in a temporary table and, at the end, create another table from the temporary one. In my case, I load longitude/latitude and create geography and geometry fields. At the end, I create the real table and all the indexes, and finally I get rid of the temporary table. The big advantage of this process is that the table I'm creating is read-only, so the better organized the data is, the better. Below you can see an example with some useful comments. I hope it helps somehow.



-- working with a temporary table

CREATE TEMPORARY TABLE tmp
(
  location  character varying(100) NOT NULL,
  easting   integer NOT NULL,
  northing  integer NOT NULL,
  longitude double precision NOT NULL,
  latitude  double precision NOT NULL
)
WITH (
  OIDS=FALSE
);

COPY tmp FROM '/home/rgomes/tmp/gis/gaz50k_gb.csv' WITH DELIMITER ',' CSV HEADER;

ALTER TABLE tmp ADD COLUMN geometry GEOMETRY;
UPDATE tmp SET geometry = ST_GeometryFromText( 'POINT(' || easting || ' ' || northing || ')' , 97405 );
COMMIT;

ALTER TABLE tmp ADD COLUMN geography GEOGRAPHY(POINT,4326);
UPDATE tmp SET geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );
COMMIT;
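
-- Incidentally, each UPDATE above rewrites every row of the temporary table.
-- A single combined pass (just a sketch, I haven't timed it) would touch each
-- row only once: if you add both columns first, the two UPDATEs above could be
-- replaced by:
--
--   UPDATE tmp
--      SET geometry  = ST_GeometryFromText( 'POINT(' || easting   || ' ' || northing || ')' , 97405 ),
--          geography = ST_GeographyFromText( 'POINT(' || longitude || ' ' || latitude || ')' );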


-- now create the table I really need

DROP TABLE IF EXISTS location;
CREATE TABLE location AS SELECT location, easting, northing, longitude, latitude, geometry, geography FROM tmp;

-- actually, I would like to avoid the step below; I should really get rid of it:
ALTER TABLE location ADD COLUMN id SERIAL PRIMARY KEY;
-- I could change the tool which generates the CSV file and generate the primary key right in place.
-- Since this table "location" is read-only, I could simply generate the primary key at load time and skip this step.
-- I suppose that would be better with regard to data organization (not sure! I'm not an expert on Postgres).
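
-- For what it's worth, a minimal sketch of that idea (untested; the sequence
-- name is my own invention): generate the key while writing the final table,
-- replacing the CREATE TABLE ... AS and the ALTER TABLE above, so the heap is
-- written only once, already in its final shape:
--
--   CREATE SEQUENCE location_id_seq;
--
--   CREATE TABLE location AS
--     SELECT nextval('location_id_seq') AS id,
--            location, easting, northing, longitude, latitude, geometry, geography
--     FROM tmp;
--
--   ALTER TABLE location ADD PRIMARY KEY (id);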


VACUUM ANALYZE location;

CREATE UNIQUE INDEX location_akey ON location ( upper(location) );
CREATE INDEX location_geometry_akey  ON location USING gist(geometry);
CREATE INDEX location_geography_akey ON location USING gist(geography);
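
-- And since physical organization is the whole point here: if the heavy queries
-- are spatial, it may be worth rewriting the read-only table in index order and
-- refreshing the statistics afterwards (again just a sketch, I haven't measured
-- the effect on Postgres):
CLUSTER location USING location_geometry_akey;
ANALYZE location;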


Richard Gomes
http://www.jquantlib.org/index.php/User:RichardGomes
twitter: frgomes

JQuantLib is a library for Quantitative Finance written in Java.
http://www.jquantlib.com/
twitter: jquantlib


On 14/04/11 10:37, Mark Cave-Ayland wrote:
On 12/04/11 17:38, Paul & Caroline Lewis wrote:

Hi,
Thank you Mark and Richard for your replies. Having looked at this, it
seems a VACUUM FULL is the answer, however I'm not sure why. Processing
the SQL scripts as originally reported, I do get a large table from
TestSet1 and a small table from TestSet2. Once a VACUUM FULL is
performed on the large table from TestSet1, its size drops to the same
as the small table from TestSet2; however, adding a VACUUM FULL into
the TestSet1 procedure makes it slower to run than TestSet2, very much
slower especially on uploading the very large data sets (70 million
rows). This begs the question: is TestSet2 very efficient, or is it
missing something fundamental that a VACUUM FULL provides that I'm not
realising at the moment?

That's strange. Do you see the same behaviour if you swap the order of the data load, i.e. do the ordered data set first, and/or use a different table name for each load? I'm just wondering if you're seeing some kind of database bloat, given that VACUUM fixes the issue.


ATB,

Mark.

