Re: [postgis-users] Large point table

Kevin Neufeld Fri, 28 Sep 2007 09:58:43 -0700

Hi Regina,

My only point there was simply this...

If you have a table that is 10GB in physical size due primarily to ageometry column, and you add another geometry column to equal in size tothe first, you will double your table size. So the question is, what isfaster to perform a sequential or index scan through? A 10GB table or a20GB table.The same principle applies which data partitioning. I had a 10GB tablethat I was often querying against, but I was only ever interested ingeometries with a certain attribute (which accounted for about 1/2 ofthe table). Using table inheritance and partitioning, I split my tableinto two, most frequently used and least frequently used geometries, andcut my query times in half. It's takes less time to query a 5GB tablethan a 10GB table.


-- Kevin

-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7

Phone: (250) 383-3022
Email: [EMAIL PROTECTED]



Obe, Regina wrote:

Actually this is a side comment to the part the part below.

I have always stored the most commonly used transforms in my table as a
secondary column, but I was wondering how much it slows down a query if
you don't select the fields you don't need.  I imagine there must be
some slow down.  Otherwise column-oriented databases wouldn't be quite
as useful compared to row-oriented databases as they are -
http://www.databasecolumn.com/2007/09/stonebraker-comment-response.html
..
Has anyone done any benchmarks on using a view with index on transform
vs. index plus realized transform?


"If you want to squeeze a little bit more performance out of your query,
you can cache the transform as a new column, so it's alreadyprecomputed.... but I would not store this in the same table. If thegoal here is speed, then duplicating points in your table will doublethe table size, slowing your query time. Create a new table for this.The problem with this approach is that you now have duplicate data andyou may have to write triggers to deal with modifications done to youroriginal points. "
Thanks,
Regina
-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Kevin Neufeld
Sent: Friday, September 28, 2007 11:39 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Large point table

Unfortunately Shane, I totally disagree.  If one want to do transforming

or grouping in a grid, it's far, far easier to do this in right in
PostGIS.

Steve,
First, aggregating the points into a grid is a great approach. I'vebeen working with a table of ~ 170 million points and using thistechnique, I've speed up my search queries significantly by groupingthese points into a multipoints of about 150-200 points. As you'veprobably discovered, this can be done by creating a new table, andsimply grouping your points using the snaptogrid function.
CREATE TABLE new_pt_tbl AS
SELECT collect(my_pt) AS my_multi_pt FROM pt_tbl
GROUP BY ST_SnapToGrid(my_pt, <grid_size>);
Second, a view is the same as a query. A view in postgresql is just astored query. If you don't want to always type "ST_Transform...." inyour query, then using a view is the way to do this. PostGIS is prettygood in doing transforms on the fly, so using a view is a goodapproach. Keep in mind that you may have to create a functional indexon your multipoint geometry column if you want to use an index in
Albers.
CREATE INDEX new_pt_tbl_idx ON new_pt_tbl USING GIST ON(ST_Transform(my_multi_pt, <Albers SRID>));
If you want to squeeze a little bit more performance out of your query,you can cache the transform as a new column, so it's alreadyprecomputed.... but I would not store this in the same table. If thegoal here is speed, then duplicating points in your table will doublethe table size, slowing your query time. Create a new table for this.The problem with this approach is that you now have duplicate data andyou may have to write triggers to deal with modifications done to youroriginal points.
Hope this helps.
Kevin

-------------
Kevin Neufeld
Software Developer
Refractions Research Inc.
300-1207 Douglas St.
Victoria, B.C., V8W 2E7

Phone: (250) 383-3022
Email: [EMAIL PROTECTED]



Shane Spencer wrote:
stupid mouse..

As I was saying....

You would be better off pre-transforming them with ogr2ogr.. dump them
to a shapefile, write a quicky script to move the points around tomatch your grid (if that is what you are indeed doing) then insertthem into the database as the right projection.
It sounds like you will have a lot of duplicate points. I recommendyou only insert distinct points into a table, then make a referencetable with smaller data types and a primary key referencing the pointtable. Python could handle that kind of insert situation pretty
easily.
On 9/25/07, *Shane Spencer* <[EMAIL PROTECTED]<mailto:[EMAIL PROTECTED]>> wrote:
    you would be better off pre-transforming them with ogr2ogr.. dump
    them to a shapefile, write a q


    On 9/25/07, * Stephen Crawford* < [EMAIL PROTECTED]
    <mailto:[EMAIL PROTECTED]>> wrote:

        All,

        I have a table of about 30 million point observations.  The
        geometries are
        stored in lon/lat., with a gist index on the column.  For
        display and
        analysis I want to use an albers projection....mostly I'm
        aggregating the
        points to a grid.  Currently I just do the transform(geom) in
        my queries.
        Would it be better for me to perhaps create a view with the
        tranformation
        instead of in my query....or should I instead create another
        geometry column
        in the original table, with the tranformation and do another
        gist index on
        the new column?  What's the best approach?

        Thanks,
        Steve


        Stephen Crawford
        Center for Environmental Informatics
        The Pennsylvania State University

        _______________________________________________
        postgis-users mailing list
        [email protected]
        <mailto:[email protected]>
        http://postgis.refractions.net/mailman/listinfo/postgis-users
<http://postgis.refractions.net/mailman/listinfo/postgis-users>
------------------------------------------------------------------------
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] Large point table

Reply via email to