Re: [GENERAL] How to get RTREE performance from GIST index?
Actually, autovacuum doesn't process temp tables at all because it cannot get to them; they might live solely in the creating process' private memory area. Does that mean that, in between creating a temporary table and actually using it in a complicate query, it is desirable to run an ANALYZE command on it? I haven't been doing that, because I didn't know. Regards -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
Thanks to all those who responded to my posting yesterday. I have now tried a simple simulation of joining tables with partly overlapping rectangular boxes using Rtrees (with GIST automatically replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the latter a bit faster. But my original data processing script still hangs (or takes 3 hours, much the same thing as far as getting work done is concerned) at various points when using v8.4.1. The identical script works fine using a 8.1.0 server, which fortunately we still have available. I have now inserted ANALYSE table commands before each SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help. It will obviously take a lot of time and effort to track this down. For the moment I shall stick to using v8.1.0, as there's really no alternative. I still think it a great pity that rather than merely deprecating R-tree indexing or making GIST the default but still allowing R-trees to be used if one really wanted them, you actually removed Rtrees from the code. No doubt some tests show GIST to work and work faster than Rtrees in test cases; clearly from my experience when using complicated real-world data that's not necessarily true. Maybe there's some magic spell that can be used to restore the earlier performance, but I really don't have time at present to do the necessary experimenting. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
Thanks to all those who responded to my posting yesterday. I have now tried a simple simulation of joining tables with partly overlapping rectangular boxes using Rtrees (with GIST automatically replacing them in 8.4.1), and this works in 8.1.0 and 8.4.1, with the latter a bit faster. But my original data processing script still hangs (or takes 3 hours, much the same thing as far as getting work done is concerned) at various points when using v8.4.1. The identical script works fine using a 8.1.0 server, which fortunately we still have available. I have now inserted ANALYSE table commands before each SELECT that depends upon an R-tree (GIST) index. This doesn't seem to help. It will obviously take a lot of time and effort to track this down. For the moment I shall stick to using v8.1.0, as there's really no alternative. I still think it a great pity that rather than merely deprecating R-tree indexing or making GIST the default but still allowing R-trees to be used if one really wanted them, you actually removed Rtrees from the code. No doubt some tests show GIST to work and work faster than Rtrees in test cases; clearly from my experience when using complicated real-world data that's not necessarily true. Maybe there's some magic spell that can be used to restore the earlier performance, but I really don't have time at present to do the necessary experimenting. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
On 22/11/2009 12:15, Martijn van Oosterhout wrote: Looking forward to your explain output. Here it is (I wrapped some of the longer lines as might not have survived the translation to email): Postgres v8.1.0 EXPLAIN SELECT a.longid AS longid, b.longid AS blongid, gcdist(a.ra, a.dec, b.ra, b.dec) AS dist FROM pos AS a, pos AS b WHERE a.errbox && b.errbox AND gcdist(a.ra, a.dec, b.ra, b.dec) < LEAST(0.9*a.dist_nn, 0.9*b.dist_nn, 7.0, 3.0 * (a.poserr + b.poserr) ) AND (a.obsid <> b.obsid OR a.longid = b.longid) ; QUERY PLAN --- Nested Loop (cost=22.16..1241963555.61 rows=205459449 width=48) Join Filter: ((gcdist("outer".ra, "outer"."dec", "inner".ra, "inner"."dec") < LEAST((0.9::double precision * "outer".dist_nn), (0.9 ::double precision * "inner".dist_nn), 7::double precision, (3::double precision * ("outer".poserr + "inner".poserr AND (("outer". obsid <> "inner".obsid) OR ("outer".longid = "inner".longid))) -> Seq Scan on pos a (cost=0.00..8213.83 rows=351983 width=68) -> Bitmap Heap Scan on pos b (cost=22.16..3469.79 rows=1760 width=68) Recheck Cond: ("outer".errbox && b.errbox) -> Bitmap Index Scan on pos_errbox (cost=0.00..22.16 rows=1760 width=0) Index Cond: ("outer".errbox && b.errbox) (7 rows) Actual timing using v8.1.0: SELECT Time: 71351.102 ms Postgres 8.4.1 EXPLAIN output: - Nested Loop (cost=0.00..235836993.78 rows=205459449 width=48) Join Filter: (((a.obsid <> b.obsid) OR (a.longid = b.longid)) AND (gcdist(a.ra, a."dec", b.ra, b."dec") < LEAST((0.9::double precision * a.dist_nn), (0.9::double precision * b.dist_nn), 7::double precision, (3::double precision * (a.poserr + b.poserr) -> Seq Scan on pos a (cost=0.00..8032.83 rows=351983 width=68) -> Index Scan using pos_errbox on pos b (cost=0.00..31.27 rows=1760 width=68) Index Cond: (a.errbox && b.errbox) (5 rows) Actual timing using v8.4.1 was 10228 seconds (sorry didn't record the milliseconds). It only worked when I left it running overnight! Regards -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
On 22/11/2009 12:09, Alban Hertroys wrote: If you expect indexes to work efficiently on temporary tables you should analyse them after filling them to update the planner's statistics on their contents. If you don't you get the default query plan that's often not efficient. Alban Thanks - I didn't know that. I'll try removing the TEMPORARY tag. Is it documented somewhere that I should have seen? Regards -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
On 22/11/2009 11:52, Thom Brown wrote: Since this is a performance issue, this should probably have been sent to the pgsql-performance mailing list. But in any case, the vast majority of performance issues require an EXPLAIN output, or preferably with ANALYZE also as there is nothing to help diagnose what the query planner it attempting to do. You may think it's a performance issue, but it stems from the decision to remove from Postgres an essential facility, that of generating and using R-trees. I'm currently trying to generate test cases, together with EXPLAIN output. Regards -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
On 22/11/2009 10:44, Martijn van Oosterhout wrote: PostgreSQL is used extensively for geometric queries, see postgis. They abandoned rtree a while back because the GiST rtree support was better, Maybe the support is better, but the performance is obviously not. And when there is a difference between under a minute and 3 hours, then performance matters. At least it does to me and my colleagues. You are AFAICR the first person to have a problem is this area, but if I find it extremely hard to believe that. All I am doing is finding whether pairs of rectangular boxes overlap or not. That is the most trivial use of R-trees possible. Surely someone thought to time that using GIST? you can't take the few minutes needed to run EXPLAIN on before and after then there is zero chance of it being fixed either. Unfortunately it isn't a "few minutes". To re-run in v8.1 I have to reload many tables into a different installation using v8.1: some of the tables have a few million rows and hundreds of columns. Then I have change some scripts to add an EXPLAIN command and log the resulting output (rather than getting the results that I actually want). This will take hours. I will try to do it soon, but cannot do it instantly. I have some data that I want to process first. I agree that this is a bug in Postgres - the bug was removing code that worked perfectly well and upon which some users depended. I simply don't understand why the Rtree code could not have been left in there, for those who found that the new-fangled GIST indexing did not work. Regards -- Clive Page -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] How to get RTREE performance from GIST index?
On 22/11/2009 05:40, Tom Lane wrote: No, because the rtree code is gone entirely. We took it out on the basis of tests showing that the gist implementation performed as well or better. I'm not sure why it's not working for you, but if you can provide a more complete test case, we could look into it. One thing to check into right away is whether the system is even trying to use the index --- what does EXPLAIN show about it? Do you by any chance have EXPLAIN output for the same query on the old system? What was the old PG version, anyway? Tom Thanks for your reply. I should have said that I was using v8.1. After I posted my question, I retried with CREATE INDEX ... USING GIST(errbox box_ops) and left it to run overnight. The query using the index, which finds overlaps between rectangular boxes using the && operator, took 10228 seconds, whereas using RTREES in v8.1 it took around 50 seconds. I have several such queries to do, and cannot afford to wait for hours. I discovered the "box_ops" syntax only by reading lots of disparate bits of documentation: it is very unsatisfactory that your indexing options are so very poorly documented. I saw that as well as GIST indexing there is something called GIN indexing but failed to find anything useful about these at all. I tried to use them, but without success. There is no point in having these facilities if they are not documented adequately. I am truly sorry that you made the decision to remove R-trees from Postgres and had no regard for backward compatibility. The availability and high performance of R-trees was one of the main reasons I switched to Postgres and have been using it for the last few years. I realise that if I take the time to experiment and use the EXPLAIN command and play around for a week or two I *might* be able to restore something like the earlier performance, but unfortunately I have a job I want to get done in the next day or two. Fortunately I have a simple work-around: Postgres v8.1 is still installed here, and I'll use it right away. For the longer term, I may have to switch to MySQL, which had R-trees but not implemented very efficiently (the last time I checked). No doubt the new owners of MySQL will have tried hard to get them working properly. I'm truly sorry that you don't take the need for R-tree indexing seriously. I would have thought that geometric queries such as the ones that I've been doing would be more and more important in the real world. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to get RTREE performance from GIST index?
I have been using Postgres for some years, in particular the RTREE indexes to perform spatial queries on astronomical datasets. I misguidedly got our system manager to install Postgres 8.4 and I find that I can no longer use rtrees - the system gives me a message substituting access method "gist" for obsolete method "rtree" The performance has dropped by at least a factor of 100 (I am not sure how much more, because the relevant bit of my SQL is still running after more than an hour, previously it took a minute or so to do this bit of the script). The relevant bits of SQL I have been using are: CREATE TEMPORARY TABLE cat4p AS SELECT longid, srcid, ra, dec, poserr, BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0), POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox FROM cat4; CREATE INDEX cat4pind ON cat4p USING RTREE(errbox); CREATE TEMPORARY TABLE apair AS SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr FROM avcatpos AS a, cat4p AS c WHERE a.errbox && c.errbox AND gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) < LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr)) AND a.srcid <> c.srcid; It is this latter query, involving the && operator to find where two rectangular boxes overlap, which seems to be taking the huge amount of time. Is there a way of forcing the use of Rtree indexing in v8.4, or any other work-around? Regards -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] How to get RTREE performance from GIST indexing?
I have been using Postgres for some years, in particular the RTREE indexes to perform spatial queries on astronomical datasets. I misguidedly got our system manager to install Postgres 8.4 and I find that I can no longer use rtrees - the system gives me a message substituting access method "gist" for obsolete method "rtree" The performance has dropped by at least a factor of 100 (I am not sure how much more, because the relevant bit of my SQL is still running after more than an hour, previously it took a minute or so to do this bit of the script). The relevant bits of SQL I have been using are: CREATE TEMPORARY TABLE cat4p AS SELECT longid, srcid, ra, dec, poserr, BOX(POINT(ra+10.0/(3600*COS(RADIANS(dec))), dec+10.0/3600.0), POINT(ra-10.0/(3600*COS(RADIANS(dec))), dec-10.0/3600.0)) AS errbox FROM cat4; CREATE INDEX cat4pind ON cat4p USING RTREE(errbox); CREATE TEMPORARY TABLE apair AS SELECT c.longid, c.srcid, c.ra, c.dec, c.poserr FROM avcatpos AS a, cat4p AS c WHERE a.errbox && c.errbox AND gcdist(a.sc_ra, a.sc_dec, c.ra, c.dec) < LEAST(7.0, 3.0 * (a.sc_poserr + c.poserr)) AND a.srcid <> c.srcid; It is this latter query, involving the && operator to find where two rectangular boxes overlap, which seems to be taking the huge amount of time. Is there a way of forcing the use of Rtree indexing in v8.4, or any other work-around? Regards -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Column descriptions - could they be propagated to new
On Thu, 6 Apr 2006, Jim Nasby wrote: I seem to recall some astronomer having created some custom types for storing astronomical data in PostgreSQL. Or perhaps he was using PostGIS. I know that other astronomers are using PostgreSQL/PostGIS so if you look around you might be able to save yourself quite a bit of work. Well I know about pgAstro and pgSphere (and helped a little in testing them) but maybe there are others. Will look, thanks. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] Column descriptions - could they be propagated to new
On Wed, 5 Apr 2006, Merlin Moncure wrote: Have you considered using domains for these types? You can comment the domain appropriately. While the domain description will not show in \d+. to get the description you can do \dT on the domain. Based on your background I also think you might appreciate domains from a design perspective. There are a couple of disadvantages to using them so I'd suggest reading about them. Thanks for the suggestion, I'll start reading up on them. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[GENERAL] Column descriptions - could they be propagated to new tables?
Since I discovered the facilities in Postgres for providing and listing column descrptions, I have found them very useful, especially for adding a string showing physical units to my columns. For example: \d+ cat Table "public.cat" Column | Type | Modifiers | Description -+--+---+-- src_num | integer | | ra | double precision | | deg decl| double precision | | deg radec_err | real | | arcsec lii | double precision | | deg bii | double precision | | deg pn_cts | real | | counts However if one performs a JOIN creating a new table, all these descriptions fail to transfer. I haven't been able to find any easy way of propagating the descriptions - would it be a useful facility to have them propagated automatically? I would have thought that things like units would be useful even in many scientific applications, e.g. to have monetary columns described as dollars/pounds/euros or whatever. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Baffled by failure to use index when WHERE uses a
On Fri, 10 Mar 2006, Martijn van Oosterhout wrote: You don't describe the exact structure of your table nor the exact declaraion of your function, but is it possible your function is marked VOLATILE rather tha STABLE or IMMUTABLE? Thanks for that hint - my function was not marked in any way, so I guess it got to be VOLATILE by default. I have just marked it as IMMUTABLE and it now uses the index as expected, with a huge performance gain. I confess that I was totally ignorant of the differences between these three types of function. Sorry I slightly messed up the cut/paste of my posting, I had been experimenting with various versions of the same table and didn't quite get the details consistent in what I posted. Thanks also to Richard Huxton and Martin van Oosterhout who gave me the same hint. What an excellent support group this is. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Baffled by failure to use index when WHERE uses a function
I have a table cov3 of about 3 million rows, with a B-tree index on an integer column called hpix. If I do a simple select on this column it works in milliseconds, using the index naturally: select * from cov3 where hpixint = 482787587; hpix| expos | hpixint ---+-+--- 482787587 | 30529.6 | 482787587 (1 row) The problem is that I want to use a user-defined function called healpix which returns a single integer value in my queries; the function details are unlikely to be relevant (it selects a pixel from a celestial position), but its definition is: \df healpix List of functions Schema | Name | Result data type |Argument data types +-+--+ public | healpix | integer | double precision, double precision So I would like to use this function to find rows, and I try for example: select * from cov3 where hpix = healpix(2.85,-11.48); but it takes ages. An EXPLAIN shows why, it insists upon a sequential scan: explain select * from cov3 where hpix = healpix(2.85,-11.48); QUERY PLAN -- Seq Scan on cov3 (cost=0.00..93046.81 rows=1 width=20) Filter: (hpix = (healpix(2.85::double precision, -11.48::double precision))::text) Does anyone have any idea why, or know how I can restore adequate performance? I am using Postgres 8.1.0 on Linux. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Postgres crashed when adding a sequence column
This is just to report success: I dropped all indices and repeated: UPDATE intwfs SET id = nextval('myseq'); and it worked fine - took 3681 secs (my estimate of an hour wasn't far out). Now doing a VACUUM FULL to remove the old tuples. -- Clive Page Dept of Physics & Astronomy, University of Leicester,Tel +44 116 252 3551 Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] Postgres crashed when adding a sequence column
On Wed, 19 Jan 2005, Richard Huxton wrote: > Hmm - wonder if there might be some memory leak in updates to the R-tree > - it gets used less than B-tree, so it could be. If you reply to this, > make sure you mention your version of PG - one of the developers might > know more. Probably also worth looking in the documentation for 8.0 and > check the release notes section to see if any changes look applicable to > your problem. That's a possibility. I've now dropped all indices, and am trying again. I forgot to post the version in use, it was 7.4.1 - will try to upgrade to v8.0 soon, as it clearly solves another problem I encountered recently. Thanks for all the help. -- Clive Page Dept of Physics & Astronomy, University of Leicester,Tel +44 116 252 3551 Leicester, LE1 7RH, U.K. Fax +44 116 252 3311 ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [GENERAL] Postgres crashed when adding a sequence column
On Wed, 19 Jan 2005, Richard Huxton wrote: > Nothing wrong with what you're doing, however, you are running a > transaction that touches 142 million rows (expiring the old rows and > adding new ones). Still, unless you are particularly short of memory, or > haven't tuned PostgreSQL it should be fine. > > Some questions: > 1. Is the table particularly wide (i.e. number/size of columns)? Well 28 columns, one varchar, one box, rest 4-byte real/integer. > 2. Do you have any foreign keys/triggers on the table? No triggers, but there's one R-tree on the box, and one B-tree on an integer column. I guess I should have dropped those and re-created afterwards, but the recreation takes some time, so I hoped to avoid the need to do that. I've now managed to restart the server, which took ~1hr to clean itself up, and am doing a VACUUM FULL ANALYZE on the table. Looks as if the data are ok, but the new column is still empty of course. Thanks very much for your help, Richard. Will try again without indices, keeping an eye on the cpu/memory consumption. -- Clive Page Dept of Physics & Astronomy, University of Leicester, Leicester, LE1 7RH, U.K. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[GENERAL] Postgres crashed when adding a sequence column
I have a largish table (71 million rows) to which I needed to add a new integer column containing a unique identifier - a simple sequence seemed to be good enough. I discovered the CREATE SEQUENCE command which looked as if it would do the job, and did the following: ALTER TABLE intwfs ADD COLUMN id int ; CREATE SEQUENCE myseq; UPDATE intwfs SET id = nextval('myseq'); I expected it to take under an hour, but the process was still running after several hours, taking ~15% cpu and a modest amount of memory. Later on other users reported the machine was almost unusable and I found that postmaster was hogging over 99% of cpu and all of memory. I was about to stop the process, but before I could do that the postmaster crashed. Obviously I'm doing something that Postgres doesn't support, but I'm not quite clear what. Any suggestions on how to achieve the same objective more easily? -- Clive Page ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [GENERAL] How to use dblink within pl/pgsql function:
Thanks to Joe Conway for pointing out a couple of typos in what I posted (the original code that I used didn't have quite as many, I edited it to try to simplify). He also pointed out that I had not initialised my 'count' variable, which was a genuine mistake. I am pleased to say that the corrected function does work as expected: CREATE OR REPLACE FUNCTION find() RETURNS INTEGER AS ' DECLARE count INTEGER := 0; myrec RECORD; BEGIN FOR myrec IN SELECT * FROM DBLINK(''hostaddr=127.0.0.1'', ''select ra, decl from twomass where errbox && box(point(120.45,0.5),point(120.50,0.75))'') as temp(x float8, y float8) LOOP count := count + 1; END LOOP; RETURN count; END; ' LANGUAGE 'plpgsql'; I can use it in a query such as: SELECT * FROM find(); And it returns the number of rows returned from the join using R-trees. Of course there are easier ways of doing what this function does, I just wanted to get that working as a basis on which to build something more advanced. -- Clive Page ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[GENERAL] How to use dblink within pl/pgsql function:
What I'd like to do is use dblink to extract a few rows from a remote database and manipulate these within a function in pl/pgsql. Something like this: CREATE OR REPLACE FUNCTION find() RETURNS INTEGER AS ' DECLARE count INTEGER: myrec RECORD; BEGIN FOR myrec IN SELECT * FROM DBLINK(''select x,y from mytab'') as temp(x integer, y real) LOOP count := count + 1; END LOOP; RETURN count; END; ' LANGUAGE 'plpgsql'; But this syntax does not work, and I cannot find a form which does work. Does anyone know how to do this? Thanks in advance. -- Clive Page ---(end of broadcast)--- TIP 8: explain analyze is your friend