Re: [postgis-users] ST_DWithin Performance

Carl S. Yestrau Jr. Mon, 14 Feb 2011 09:39:09 -0800

Hi Brent,
This is a little out there, but what about swapping out the created_on
timestamp type with a floating point type using epoch time (ie.,
extract(epoch FROM now())) and creating a gist index on both the location
(geography type) and created_on (now floating point) columns?
Carl


On Sat, Feb 12, 2011 at 6:24 PM, <[email protected]> wrote:

> Hi Carl,
>
> If most of your data is reasonably static (which time series data often
> isn't) we have used this approach successfully with some 250,000,000 records
> in a table:
>
> use Postgres inheritance facility to partition the table in chunks by time,
> eg: one year's data per partition. If your request is for data from one
> year, it can pretty much ignore all the other partitions when searching for
> data, which speeds things up.
>
> Apply a Postgres clustered index to the timestamp column for each
> partition.This physically orders the data on disk by the column values, and
> given timeseries databases are often queried by time, this can significantly
> improve performance, as each disk read will normally grab sequential data
> needed by the query, reducing disk reads & seek times required. A clustered
> index does not need to be applied to the current year's partition unless
> really useful, as maintaining it requires the index to be rebuilt after
> every insert or update.
>
> An order by/limit query can be significantly faster with data managed in
> this way.
>
> HTH,
>
>   Brent Wood
>
> --- On *Sun, 2/13/11, Paul Ramsey <[email protected]>* wrote:
>
>
> From: Paul Ramsey <[email protected]>
> Subject: Re: [postgis-users] ST_DWithin Performance
> To: "PostGIS Users Discussion" <[email protected]>
> Date: Sunday, February 13, 2011, 11:42 AM
>
>
> How many records get returned when you drop the LIMIT? If you have to
> sort 700K things into order, I can see that taking a second or two.
>
> P.
>
> On Sat, Feb 12, 2011 at 2:16 PM, Carl S. Yestrau Jr.
> <[email protected] <http://mc/[email protected]>>
> wrote:
> > When I removed the "ORDER BY created_on DESC" the results were almost
> > instantaneous (which makes sense as it just returns the first 'x'
> > result defined by LIMIT). I don't have access to the machine at the
> > moment, but can get the plan for that if it's helpful.
> >
> > I'm pretty confused why things are so slow. The only thing I can think
> > of is possibly the data density, I have over 700K records within a 20
> > minute window.
> >
> > On Sat, Feb 12, 2011 at 12:27 AM, strk 
> > <[email protected]<http://mc/[email protected]>>
> wrote:
> >> On Fri, Feb 11, 2011 at 02:27:16PM -0800, Carl S. Yestrau Jr. wrote:
> >>
> >>> Table schema:
> >>>                                        Table "public.geobits"
> >>>     Column     |            Type             |
> >>> Modifiers
> >>>
> ---------------+-----------------------------+------------------------------------------------------
> >>>  id            | integer                     | not null default
> >>> nextval('geobits_id_seq'::regclass)
> >>>  uuid          | uuid                        | not null
> >>>  raw           | text                        | not null
> >>>  search_config | regconfig                   |
> >>>  ip            | inet                        | not null
> >>>  user_agent    | text                        |
> >>>  created_on    | timestamp without time zone | default now()
> >>>  location      | geography(Point,4326)       | not null
> >>>  parent_id     | integer                     |
> >>>  language_code | text                        |
> >>>  group_id      | integer                     |
> >>> Indexes:
> >>>     "geobits_pkey" PRIMARY KEY, btree (id)
> >>>     "geobits_uuid_key" UNIQUE, btree (uuid)
> >>>     "geobits_created_on_index" btree (created_on)
> >>>     "geobits_group_id_index" btree (group_id)
> >>>     "geobits_location_index" gist (location)
> >>>     "geobits_parent_id_index" btree (parent_id)
> >>>     "geobits_search_config_index" gin (to_tsvector(search_config, raw))
> >>> Foreign-key constraints:
> >>>     "geobits_group_id_fkey" FOREIGN KEY (group_id) REFERENCES
> groups(id)
> >>>     "geobits_language_code_fkey" FOREIGN KEY (language_code)
> >>> REFERENCES languages(code) ON DELETE SET NULL
> >>>     "geobits_parent_id_fkey" FOREIGN KEY (parent_id) REFERENCES
> >>> geobits(id) ON DELETE SET NULL
> >>> Referenced by:
> >>>     TABLE "geobits" CONSTRAINT "geobits_parent_id_fkey" FOREIGN KEY
> >>> (parent_id) REFERENCES geobits(id) ON DELETE SET NULL
> >>>
> >>>
> >>> Slow Query:
> >>> EXPLAIN ANALYZE SELECT id, created_on FROM geobits WHERE
> >>> ST_DWithin(location, ST_GeographyFromText('SRID=4326;POINT(180 90)'),
> >>> 1000) ORDER BY created_on DESC LIMIT 10;
> >>>
> >>>
> >>>                         QUERY PLAN
> >>>
> >>>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>
> --------------------------------------------------------------------------
> >>>  Limit  (cost=205352.10..205352.10 rows=1 width=12) (actual
> >>> time=4078.057..4078.090 rows=10 loops=1)
> >>>    ->  Sort  (cost=205352.10..205352.10 rows=1 width=12) (actual
> >>> time=4078.053..4078.064 rows=10 loops=1)
> >>>          Sort Key: created_on
> >>>          Sort Method:  top-N heapsort  Memory: 17kB
> >>>          ->  Seq Scan on geobits  (cost=0.00..205352.08 rows=1
> >>> width=12) (actual time=0.015..3100.471 rows=708661 loops=1)
> >>>                Filter: ((location &&
> >>> '0101000020E610000000000000008066400000000000805640'::geography) AND
> >>> ('0101000020E610000000000000008066400000000000805640'::geography &&
> >>> _st_expand(location, 1000::double precision)) AND
> >>> _st_dwithin(location, '0101000020E6100000000
> >>> 00000008066400000000000805640'::geography, 1000::double precision,
> true))
> >>>  Total runtime: 4078.127 ms
> >>> (7 rows)
> >>
> >> The "geobits_location_index" gist (location) index is not being used,
> >> for some reason. Why do you say ORDER BY is the culprit ? Does
> >> the plan come out differently w/out that ?
> >>
> >> --strk;
> >>
> >>  ()   Free GIS & Flash consultant/developer
> >>  /\   http://strk.keybit.net/services.html
> >> _______________________________________________
> >> postgis-users mailing list
> >> [email protected]<http://mc/[email protected]>
> >> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >>
> > _______________________________________________
> > postgis-users mailing list
> > [email protected]<http://mc/[email protected]>
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
> >
> _______________________________________________
> postgis-users mailing list
> [email protected]<http://mc/[email protected]>
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> [email protected]
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>

_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users

Re: [postgis-users] ST_DWithin Performance

Reply via email to