Re: [PERFORM] trouble with a join on OS X

2007-02-05 Thread Shane Ambler

Kirk Wythers wrote:

>> The 4G (32bit) limit may be where you hit the out of memory errors (or
>> does postgres get around that with its caching?).


> Any idea if postgres on OS X can truly access more than 4 gigs if the
> 64 bit version is built? I have tried building the 64 bit version of
> some other apps on OS X, and I have never been convinced that they
> behaved as true 64 bit.




I haven't tried myself.



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] optimizing a geo_distance() proximity query

2007-02-05 Thread Mark Stosberg
Bruno Wolff III wrote:
> On Sat, Feb 03, 2007 at 14:00:26 -0500,
>   Mark Stosberg [EMAIL PROTECTED] wrote:
>> I'm using geo_distance() from contrib/earthdistance and would like to
>> find a way to speed up the geo distance calculation if possible. This
>> is for a proximity search: "Show me adoptable pets within 250 miles of
>> this zipcode."
>
> If you are using the cube based part of the earth distance package,
> then you can use gist indexes to speed those searches up.

Thanks for the tip. Any idea what kind of improvement I can expect to
see, compared to using geo_distance()?

> There are functions for creating boxes that include all of the points some
> distance from a fixed point. This is lossy, so you need to recheck if you
> don't want some points a bit farther away returned. Also you would need to
> pick a point to be where the zip code is located, rather than using
> area-based zip codes.

This is also interesting. Is this approach practical if I want to index
what's near each of about 40,000 US zipcodes, or is the approach mostly
useful if there are just a small number of fixed points to address?

I'm going to start installing the cube() and earth_distance() functions
today and see where I can get with the approach.
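
For the archives: I assume "installing" just means loading the contrib
SQL scripts into the database, something like the following (path and
database name adjusted for your install; earthdistance.sql depends on
the cube type, so cube.sql goes first):

 psql -d mydb -f /usr/local/pgsql/share/contrib/cube.sql
 psql -d mydb -f /usr/local/pgsql/share/contrib/earthdistance.sql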

  Mark



Re: [PERFORM] optimizing a geo_distance() proximity query

2007-02-05 Thread Merlin Moncure

On 2/5/07, Mark Stosberg [EMAIL PROTECTED] wrote:

> Bruno Wolff III wrote:
>> On Sat, Feb 03, 2007 at 14:00:26 -0500,
>>   Mark Stosberg [EMAIL PROTECTED] wrote:
>>> I'm using geo_distance() from contrib/earthdistance and would like to
>>> find a way to speed up the geo distance calculation if possible. This
>>> is for a proximity search: "Show me adoptable pets within 250 miles of
>>> this zipcode."
>>
>> If you are using the cube based part of the earth distance package,
>> then you can use gist indexes to speed those searches up.
>
> Thanks for the tip. Any idea what kind of improvement I can expect to
> see, compared to using geo_distance()?


a lot. be aware that gist takes longer to build than btree, but is very
fast to search.  Index search and filter to box is basically an index
lookup (fast!). for mostly static datasets that involve a lot of
searching, gist is ideal.

keep in mind that the cube based gist searches out the smallest
lat/lon 'square' projected onto the earth which covers your circular
radius, so you have to do extra processing if you want exact matches.
(you can speed this up too, by doing an 'inner box' search and not
recomputing distance to those points)
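
e.g. the exact recheck is just an extra distance filter on top of the
box search. untested sketch -- assuming a zipcodes table with an
earth_coords cube column built via ll_to_earth(), a sample point, and a
10 mile radius in meters:

 select * from zipcodes
 where earth_box(ll_to_earth(45.0, -93.0), 16093.44) @ earth_coords
   and earth_distance(ll_to_earth(45.0, -93.0), earth_coords) < 16093.44;

the first condition is the lossy indexed search; the second throws out
the corner points that are inside the box but outside the circle.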

merlin



Re: [PERFORM] optimizing a geo_distance() proximity query

2007-02-05 Thread Bruno Wolff III
On Mon, Feb 05, 2007 at 14:47:25 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
> This is also interesting. Is this approach practical if I want to index
> what's near each of about 40,000 US zipcodes, or is the approach mostly
> useful if there are just a small number of fixed points to address?

I think the answer depends on what your data model is. If you treat each
zip code as having a location at a single point, the earth distance stuff
should work. If you are trying to include the shape of each zip code in
your model and measure distances to the nearest point of zip codes, then
you will probably be better off using postgis.



Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Mark Stosberg
Merlin Moncure wrote:
> On 2/5/07, Mark Stosberg [EMAIL PROTECTED] wrote:
>> Bruno Wolff III wrote:
>>> On Sat, Feb 03, 2007 at 14:00:26 -0500,
>>>   Mark Stosberg [EMAIL PROTECTED] wrote:
>>>> I'm using geo_distance() from contrib/earthdistance and would like to
>>>> find a way to speed up the geo distance calculation if possible. This
>>>> is for a proximity search: "Show me adoptable pets within 250 miles of
>>>> this zipcode."
>>>
>>> If you are using the cube based part of the earth distance package,
>>> then you can use gist indexes to speed those searches up.
>>
>> Thanks for the tip. Any idea what kind of improvement I can expect to
>> see, compared to using geo_distance()?
>
> a lot. be aware that gist takes longer to build than btree, but is very
> fast to search.  Index search and filter to box is basically an index
> lookup (fast!). for mostly static datasets that involve a lot of
> searching, gist is ideal.

The documentation in contrib/ didn't provide examples of how to create
the index or actually perform the proximity search. Here's what I figured
out to do:

I added a new column as type 'cube':

 ALTER table zipcodes add column earth_coords cube;

Next I converted the old lat/lon data I had stored in a 'point'
type to the new format:

-- Make sure to get lat/lon in the right order for your data model!
 UPDATE zipcodes set earth_coords = ll_to_earth( lon_lat[1], lon_lat[0] );

Now I added a GIST index on the field:

 CREATE index earth_coords_idx on zipcodes using gist (earth_coords);

Finally, I was able to run a query, which I could see used the index (by
checking EXPLAIN ANALYZE output):

   select * from zipcodes where earth_box(
 '(436198.322855334, 4878562.8732218, 4085386.43843934)'::cube,
 16093.44) @ earth_coords;

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number 16093.44 is: 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100 ms vs ~0.5 ms).

However, my next step was to try a more real-world query that involved
a more complex WHERE clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be because of the additional
results found that are outside of the sphere but inside the cube; these
extra rows then need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance() returns?

Any other indexing or optimization tips would be appreciated.

   Mark



[PERFORM] index scan through a subquery

2007-02-05 Thread Bill Howe

Why should these queries have different plans?


create table foo (a int PRIMARY KEY);

Q1: explain select max(a) from foo

 Result  (cost=0.04..0.05 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=4)
           ->  Index Scan Backward using foo_pkey on foo
               (cost=0.00..76.10 rows=2140 width=4)
                 Filter: (a IS NOT NULL)

Q2: explain select max(a) from (select * from foo) as f

 Aggregate  (cost=36.75..36.76 rows=1 width=4)
   ->  Seq Scan on foo  (cost=0.00..31.40 rows=2140 width=4)


I need the lovely index scan, but my table is hidden behind a view, and
all I get is the ugly sequential scan.  Any ideas on how to convince the
optimizer to unfold the subquery properly?

Bill



[PERFORM] How long should it take to insert 200,000 records?

2007-02-05 Thread Karen Hill
I have a pl/pgsql function that is inserting 200,000 records for
testing purposes.  What is the expected time frame for this operation
on a PC with half a gig of RAM and a 7200 RPM disk?  The processor is
a 2 GHz CPU.  So far I've been sitting here for about 2 million ms
(roughly 33 minutes) waiting for it to complete, and I'm not sure how
many inserts postgres is doing per second.

regards,
karen




Re: [PERFORM] How long should it take to insert 200,000 records?

2007-02-05 Thread Tom Lane
Karen Hill [EMAIL PROTECTED] writes:
> I have a pl/pgsql function that is inserting 200,000 records for
> testing purposes.  What is the expected time frame for this operation
> on a PC with half a gig of RAM and a 7200 RPM disk?

I think you have omitted a bunch of relevant facts.  Bare INSERT is
reasonably quick:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# \timing
Timing is on.
regression=# insert into foo select x from generate_series(1,200000) x;
INSERT 0 200000
Time: 5158.564 ms
regression=# 

(this on a not-very-fast machine) but if you weigh it down with a ton
of index updates, foreign key checks, etc, it could get slow ...
also you haven't mentioned what else that plpgsql function is doing.
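
An easy way to see that effect yourself: add an index (name arbitrary)
and repeat the insert; the difference in the timings is the
index-maintenance overhead.

regression=# create index foo_f1_idx on foo (f1);
CREATE INDEX
regression=# insert into foo select x from generate_series(1,200000) x;
INSERT 0 200000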

regards, tom lane



Re: [PERFORM] optimizing a geo_distance() proximity query (example and benchmark)

2007-02-05 Thread Bruno Wolff III
On Mon, Feb 05, 2007 at 18:01:05 -0500,
  Mark Stosberg [EMAIL PROTECTED] wrote:
 
> It's also notable that the units used are meters, not miles like
> geo_distance(). That's what the magic number 16093.44 is: 10 miles
> converted to meters.

You can change the earth() function in earthdistance.sql before running it
to use some unit other than meters:

-- earth() returns the radius of the earth in meters. This is the only
-- place you need to change things for the cube base distance functions
-- in order to use different units (or a better value for the Earth's radius).

CREATE OR REPLACE FUNCTION earth() RETURNS float8
LANGUAGE 'sql' IMMUTABLE
AS 'SELECT ''6378168''::float8';
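
For example, to get distances in statute miles you could instead use the
earth's radius in miles (roughly 3963.2, the mile equivalent of the
6378168 meter value above):

CREATE OR REPLACE FUNCTION earth() RETURNS float8
LANGUAGE 'sql' IMMUTABLE
AS 'SELECT ''3963.2''::float8';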

> However, my next step was to try a more real-world query that involved
> a more complex WHERE clause and a couple of table joins. So far, that
> result is coming out /slower/ with the new approach, even though the
> index is being used. I believe this may be because of the additional
> results found that are outside of the sphere but inside the cube; these
> extra rows then need processing in the joined tables.

This is unlikely to be the cause. The ratio of the area of the cube to
the circle for small radii (small compared to the radius of the earth, so
that we can consider things flat) is 4/pi, about 1.27, which shouldn't
cause that much of a change.
It might be that you are getting a bad plan. The guess on the selectivity
of the gist constraint may not be very good.
Some people here may be able to tell you more if you show us explain
analyze output.



Re: [PERFORM] index scan through a subquery

2007-02-05 Thread Tom Lane
Bill Howe [EMAIL PROTECTED] writes:
> I need the lovely index scan, but my table is hidden behind a view, and
> all I get is the ugly sequential scan.  Any ideas on how to convince the
> optimizer to unfold the subquery properly?

You should provide some context in this sort of gripe, like which PG
version you're using.  But I'm going to guess that it's 8.2.x, because
8.1.x gets it right :-(.  Try the attached.
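
If patching isn't convenient, a workaround that should get you the
indexscan without relying on the min/max transform at all is the
equivalent ORDER BY form (modulo NULL ordering, if the column is
nullable):

select a from (select * from foo) as f order by a desc limit 1;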

regards, tom lane

Index: planagg.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/plan/planagg.c,v
retrieving revision 1.25
diff -c -r1.25 planagg.c
*** planagg.c   9 Jan 2007 02:14:13 -0000   1.25
--- planagg.c   6 Feb 2007 06:30:23 -0000
***************
*** 70,75 ****
--- 70,76 ----
  optimize_minmax_aggregates(PlannerInfo *root, List *tlist, Path *best_path)
  {
      Query      *parse = root->parse;
+     FromExpr   *jtnode;
      RangeTblRef *rtr;
      RangeTblEntry *rte;
      RelOptInfo *rel;
***************
*** 102,115 ****
       * We also restrict the query to reference exactly one table, since join
       * conditions can't be handled reasonably.  (We could perhaps handle a
       * query containing cartesian-product joins, but it hardly seems worth the
!      * trouble.)
       */
!     Assert(parse->jointree != NULL && IsA(parse->jointree, FromExpr));
!     if (list_length(parse->jointree->fromlist) != 1)
!         return NULL;
!     rtr = (RangeTblRef *) linitial(parse->jointree->fromlist);
!     if (!IsA(rtr, RangeTblRef))
          return NULL;
      rte = rt_fetch(rtr->rtindex, parse->rtable);
      if (rte->rtekind != RTE_RELATION || rte->inh)
          return NULL;
--- 103,121 ----
       * We also restrict the query to reference exactly one table, since join
       * conditions can't be handled reasonably.  (We could perhaps handle a
       * query containing cartesian-product joins, but it hardly seems worth the
!      * trouble.)  However, the single real table could be buried in several
!      * levels of FromExpr.
       */
!     jtnode = parse->jointree;
!     while (IsA(jtnode, FromExpr))
!     {
!         if (list_length(jtnode->fromlist) != 1)
!             return NULL;
!         jtnode = linitial(jtnode->fromlist);
!     }
!     if (!IsA(jtnode, RangeTblRef))
          return NULL;
+     rtr = (RangeTblRef *) jtnode;
      rte = rt_fetch(rtr->rtindex, parse->rtable);
      if (rte->rtekind != RTE_RELATION || rte->inh)
          return NULL;
