Re: [HACKERS] sp-gist porting to postgreSQL
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:

Oleg, thanks for your prompt reply. Actually, I am able to create a new access method for testing and to add an operator class for the type integer using the new access method. I then created a table with two integer fields, one indexed using the new access method and the other using a btree index, and everything is OK so far. EXPLAIN even shows that queries use the indexes correctly, as they should. I am using postgresql version 8.0.0beta3 from CVS.

I was wrong, Ramy. You can have several indices for the same type as long as they support different operations. I don't know if it's possible to have them for the same operation but for different conditions.

Thanks, Ramy

-----Original Message-----
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 10, 2004 12:35 AM
To: Ramy M. Hassan; Pgsql Hackers
Cc: Teodor Sigaev; Walid G. Aref
Subject: Re: sp-gist porting to postgreSQL

Ramy, glad to hear from you! AFAIK, PostgreSQL doesn't support several indices for the same type. I think this is a problem of the optimizer. Probably other hackers know better. I am forwarding your message to the -hackers mailing list, which is the relevant place for GiST discussion. regards, Oleg

On Tue, 9 Nov 2004, Ramy M. Hassan wrote:

Dear Oleg and Teodor, thanks for offering help. I have a design question for now. In the current postgresql GiST implementation, I noticed that the way to have a GiST based index is to define an operator class for a certain type using the GiST index. There is no new index type defined from the point of view of postgresql (nothing is added to pg_am). This means that for a certain type there can be only one GiST based index: there is no way, in the same server, to use GiST to implement an xtree index and a ytree index for the same type, even if they index different fields in different relations. Is that correct?
What about doing it the other way (I am talking about SP-GiST now): provide the extension writer with an API he can use to instantiate a standalone SP-GiST based index (for example, a trie index) that has its own record in the pg_am relation. In my view this would give more flexibility, and it also would not require the extension writer to learn the postgresql API (maybe one day SP-GiST will be ported to another database engine); he would just need to learn the SP-GiST API, which will probably be less to study (and this is what GiST and SP-GiST are all about, if I understand correctly). Please let me know your opinions regarding this.

Thanks, Ramy

Regards, Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [HACKERS] [Pgsphere-dev] GIST index concurrency concern
Patrick, you didn't tell us about your setup. Have you actually seen a locking issue for reading? Are you sure you have no locks in your own code? Any test demonstrating your problem would be great. Oleg

On Tue, 9 Nov 2004, Patrick Clery wrote:

Oleg, Daniel and I have both been collaborating on this structure for a while now. We are aware that GiST reads work very fast, but won't they be paralyzed when there are writes? Both of us are working on dating sites, and the main problem that concerns us is a very heavy traffic load. At this point I am planning to queue all changes to a GiST index and commit them every 10-15 minutes. Is that really necessary? It's realistic to assume that if there is a problem with locking the table for writes, it will be a problem in this situation, because this structure is going to be hit VERY hard (and Daniel's situation is on an even larger scale). We hope that we can alleviate that with a transaction queue, but this is not a simple fix. Have you seen any projects that were under a heavy load using a GiST index, and were they able to avoid being paralyzed somehow? Thanks in advance, Patrick

On Tuesday 09 November 2004 22:08, Oleg Bartunov wrote:

Daniel, concurrency is a big issue with the current implementation of GiST, but it shouldn't bite you for READ ops! The -hackers mailing list is the relevant mailing list for GiST discussions. It's a pity that we have several times planned to work on GiST concurrency and recovery, but never got a chance :) I see Neil has become interested in GiST concurrency, though. Oleg

On Tue, 9 Nov 2004, Daniel Ceregatti wrote:

Hi, it's recently come to my attention that GiST indices suffer from concurrency issues. I have already developed a dating site using GiST for attributes (using the intarray contrib) and for Earth distance/radius calculations (using pg_sphere). I'm wondering if I haven't shot myself in the foot here.
So far, I understand that a GiST index will be locked by a backend for any DML. Basically I'm concerned that my database will not scale in the manner that I was hoping, because the sites that access the database are to be used by many concurrent users, doing some DML. I expect my site to sustain around 1000-3000 new user acquisitions per day, each of which will account for an insert into 3 GiST indices. Additionally, there will be people updating their attributes and locations as well, but this will probably account for only a small fraction of the DML. We don't allow people to delete stuff. My concern now is this concurrency issue. My question is: is there anyone out there using a GiST index on a database where there's a lot of DML? Should I be concerned with this issue at all? If so, what can be done to minimize the impact of heavy DML on a GiST index? I've pondered rolling all DML into queues via triggers and then de-queuing them in one transaction every so often, say every 15 minutes, via cron. Any other suggestions? I'm posting to this list because I understand that both Oleg and Teodor read it, and I found no other relevant list. If I've misposted, please accept my apology and direct me to the appropriate list. Thanks, Daniel

___ Pgsphere-dev mailing list [EMAIL PROTECTED] http://gborg.postgresql.org/mailman/listinfo/pgsphere-dev
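The queue-and-apply idea Daniel describes could be sketched roughly as follows. This is only an illustration with hypothetical table and column names (attr_queue, user_attrs), not the actual schema under discussion:

```sql
-- Writes land in a plain, un-GiST-indexed queue table instead of
-- touching the GiST-indexed table directly.
CREATE TABLE attr_queue (
    user_id  integer,
    attrs    integer[],                  -- intarray-style attributes
    queued   timestamp DEFAULT now()
);

-- Run periodically from cron (e.g. every 15 minutes): apply the queued
-- rows to the GiST-indexed table in one transaction, then clear them.
BEGIN;
INSERT INTO user_attrs (user_id, attrs)   -- user_attrs carries the GiST index
    SELECT user_id, attrs FROM attr_queue;
DELETE FROM attr_queue;
COMMIT;
```

This trades index freshness for one burst of GiST writes per interval; whether it actually helps depends on how long the index stays locked during the batch.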
Re: [HACKERS] sp-gist porting to postgreSQL
I believe that it is still possible to have several index access methods for the same type and the same operations, but this requires that each index access method has its own tuple in the pg_am relation, so that postgresql recognizes it by itself. This is not the case with GiST based indices: they are all recognized by postgresql as the same index access method, and from here comes the limitation. Also, I think GiST and SP-GiST are better viewed as index classes, not as indices by themselves. So maybe it is more logical to say:

CREATE INDEX index_name ON table_name USING spgist_trie(field)

where spgist_trie is an SP-GiST based index instance, than to say:

CREATE INDEX index_name ON table_name USING spgist(field)

and depend on the operator classes to define the required support methods for the trie function. I am not sure I have a complete vision, but this is what I see. I would appreciate your opinions regarding this design issue.

Thanks, Ramy
Re: [HACKERS] sp-gist porting to postgreSQL
On Wed, 10 Nov 2004, Ramy M. Hassan wrote:

I believe that it is still possible to have several index access methods for the same type and the same operations. But this requires that each index access method has its own tuple in the pg_am relation, so that postgresql recognizes it by itself. This is not the case with GiST based indices: they are all recognized by postgresql as the same index access method, and from here comes the limitation.

It's possible; see contrib/intarray, for example. You can specify the opclass in the CREATE INDEX command:

CREATE INDEX text_idx on test__int using gist ( a gist__int_ops );
CREATE INDEX text_idx2 on test__int using gist ( a gist__intbig_ops );

Here gist__int_ops and gist__intbig_ops are different opclasses for the same type, intended for use at different cardinalities. The problem is how to use these indices automatically: how the planner/optimizer can select which index to use.

Also, I think GiST and SP-GiST are better viewed as index classes, not as indices by themselves. So maybe it is more logical to say CREATE INDEX index_name ON table_name USING spgist_trie(field), where spgist_trie is an SP-GiST based index instance, than to say CREATE INDEX index_name ON table_name USING spgist(field) and depend on the operator classes to define the required support methods for the trie function.

Why not use the existing syntax?

CREATE INDEX index_name ON table_name USING spgist (field trie_ops)

I am not sure I have a complete vision, but this is what I see. I would appreciate your opinions regarding this design issue.

Teodor is rather busy right now, but he certainly knows the GiST internals better, so we'll wait for his comments.
Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable
On Tue, 10 Nov 2004, Greg Stark wrote:

Tom Lane [EMAIL PROTECTED] writes: I'd like to propose that we get rid of GUC's USERLIMIT category and convert all the variables in it to plain SUSET. In my mind, USERLIMIT is a failed experiment: it's way too complicated, and it still doesn't do quite what it was intended to do, because there are times when it can't check whether you're a superuser. The only variables in the category are log-verbosity-related.

Would that mean I wouldn't be able to change the logging level on the fly at all?

I would think you'd still be able to do it through a security definer wrapper function owned by a superuser.
Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category
Andrew McMillan [EMAIL PROTECTED] writes:

When tracking down gnarly problems in heavily multi-user applications, enabling higher log levels at selective points has the potential to help _a lot_ with diagnostic detail, without smothering you in _every_ detail.

Sure. As I pointed out in the other thread, if you want to allow an app to do this, you can make available a SECURITY DEFINER function that performs the desired SET on its behalf. By setting execute permissions on the function and/or including restrictions in the function's code, you can make this as tight or as loose a loophole as you like. So it's certainly possible to do what you want in any case. I think the issue at hand is what's appropriate to provide as hard-wired functionality. regards, tom lane
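The wrapper Tom describes might look something like this. A sketch only: the function name, the particular GUC, and the grantee are illustrative, and it assumes the function is created by a superuser:

```sql
-- Runs with the privileges of its (superuser) owner, so an ordinary
-- user calling it can change a superuser-only setting for its session.
CREATE FUNCTION set_log_verbosity(text) RETURNS text AS '
    SELECT set_config(''log_min_messages'', $1, false);
' LANGUAGE sql SECURITY DEFINER;

-- Lock it down: only explicitly chosen users may call it.
REVOKE ALL ON FUNCTION set_log_verbosity(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION set_log_verbosity(text) TO some_user;
```

The function body could equally validate its argument or restrict which values are allowed, which is the "as tight or as loose a loophole as you like" part.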
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
John Hansen [EMAIL PROTECTED] writes:

When doing CREATE OR REPLACE FUNCTION of a builtin function, it seems to have no effect if it's in the 'C' language. SQL functions seem to work, but as neilc pointed out, that may be because the SQL function is being inlined. The builtin function is still called, not the user-defined function, for 'C' language functions.

You can't override a builtin C function that way, because there is a built-in map from function OID to builtin function address, and it's consulted before trying to look in pg_proc. This behavior is not really open to negotiation, not only on grounds of speed but on grounds of circularity. (The functions used in the process of looking up entries in pg_proc itself obviously must have such a short circuit...) You'd have to build a modified backend in which the particular functions you want to replace are not listed in the builtin mapping table. regards, tom lane
Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category
Greg Stark [EMAIL PROTECTED] writes:

Tom Lane [EMAIL PROTECTED] writes: I'd like to propose that we get rid of GUC's USERLIMIT category and convert all the variables in it to plain SUSET.

Would that mean I wouldn't be able to change the logging level on the fly at all?

No, it would mean that you'd need to be superuser to change it. regards, tom lane
Re: [HACKERS] sp-gist porting to postgreSQL
Oleg Bartunov [EMAIL PROTECTED] writes: AFAIK, PostgreSQL doesn't support several indices for the same type.

I think what's really being asked for is several operator classes for the same type. This most certainly *is* possible. regards, tom lane
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
elein [EMAIL PROTECTED] writes: Isn't there a load/unload function for the .so that would work in this case?

That won't affect the fmgr_builtins table ... he wanted to replace a builtin, not a previously-dynamically-loaded function. regards, tom lane
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
The builtin function is still called, not the user-defined function, for 'C' language functions.

You can't override a builtin C function that way because there is a built-in map from function OID to builtin function address, and it's consulted before trying to look in pg_proc.

That doesn't make sense, since if I delete the entry from pg_proc and then create the function, everything works fine. E.g.:

delete from pg_proc where proname = 'funcname';
create function pg_catalog.funcname();

This behavior is not really open to negotiation; not only on grounds of speed, but on grounds of circularity. (The functions used in the process of looking up entries in pg_proc itself obviously must have such a short circuit...) You'd have to build a modified backend in which the particular functions you want to replace are not listed in the builtin mapping table. regards, tom lane

Well, as someone pointed out, if it is possible to execute REPLACE FUNCTION on a builtin, then it should work. ... John
Re: [HACKERS] Increasing the length of pg_stat_activity.current_query...
On 11/8/2004 5:32 PM, Tom Lane wrote: Another relevant question is why you are expecting to get this information through pgstats and not by looking in the postmaster log.

The pgstats were originally designed to give hints for tuning. That's why they cover cache hits vs. misses per table, and numbers that can be used to point out missing as well as obsolete indexes. That is what led to the design of the pgstats file, the UDP communication, and those fixed sizes. The goal was to let it have as little impact on server performance as possible. The whole current-query stuff was added later on request. In my opinion it is quite pointless to attempt to transmit the last byte of every single query sent to the backend, when all you can get out of that view is a random query every 500 milliseconds. Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
John Hansen [EMAIL PROTECTED] writes:

You can't override a builtin C function that way because there is a built-in map from function OID to builtin function address, and it's consulted before trying to look in pg_proc.

That doesn't make sense, since if I delete the entry from pg_proc and then create the function, everything works fine.

Sure, because then the new entry has a new OID that doesn't match any entry in the fmgr_builtin table.

Well, as someone pointed out, if it is possible to execute REPLACE FUNCTION on a builtin, then it should work.

[ shrug... ] Nobody promised that you could change any arbitrary thing without breaking your system. Superusers are allowed to do delete from pg_proc, too, but that doesn't mean you'll be pleased with the results. regards, tom lane
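A hypothetical illustration of the two behaviors Tom contrasts. The function and module names here are made up; the point is only the OID mapping:

```sql
-- Replacing a builtin in place keeps its OID. fmgr resolves that OID
-- through its builtin table before ever consulting pg_proc, so the
-- original builtin C code still runs:
CREATE OR REPLACE FUNCTION pg_catalog.some_builtin(text) RETURNS text
    AS 'my_module', 'my_replacement' LANGUAGE C STRICT;

-- Deleting the pg_proc row and re-creating the function assigns a NEW
-- OID, which is absent from the builtin table, so the new code is
-- found. But, as Tom notes, hacking pg_proc this way can break things:
DELETE FROM pg_proc WHERE proname = 'some_builtin';
CREATE FUNCTION pg_catalog.some_builtin(text) RETURNS text
    AS 'my_module', 'my_replacement' LANGUAGE C STRICT;
```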
Re: [HACKERS] Increasing the length of
On Wed, 2004-11-10 at 20:25, Jan Wieck wrote:

The pgstats were originally designed to give hints for tuning. [...] The whole current query stuff was added later on request.

OK, and thanks for writing it. Evolution is a wonderful thing... our original design point was slightly away from where we are now.

In my opinion it is quite pointless to attempt to transmit the last byte of every single query sent to the backend, when all you can get out of that view is a random query every 500 milliseconds.

If you are certain you have no queries whose text is over 1 KB, or no SQL that runs for more than a few seconds, then increasing the UDP limit would just be a painful waste, I agree. My intention was towards a data warehouse situation, and my comments are only relevant in that context. Possibly 25+% of the user base use this style of processing. In that case, I expect queries to run for minutes or hours. What are the alternatives when a query is still running when you return from lunch? Kill it? Hope? These rogue queries can be a problem, using up much of the capacity of the system for hours. Many such queries are generated by applications and are hard to recreate. The server is running it, so we should be able to access the SQL and diagnose it.

--
Best Regards, Simon Riggs
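For reference, the in-flight inspection Simon wants is what the pg_stat_activity view offers, subject to exactly the truncation and sampling limits this thread is about (column names as of the 8.0 era; requires stats_command_string to be enabled):

```sql
-- Show what each backend is currently running, oldest query first.
-- current_query is truncated to a fixed length and only sampled
-- periodically, which is the limitation under discussion.
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY query_start;
```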
Re: [HACKERS] Increasing the length of
Simon Riggs wrote:

On Mon, 2004-11-08 at 22:32, Tom Lane wrote: Another relevant question is why you are expecting to get this information through pgstats and not by looking in the postmaster log.

This is only available if you log all queries, which isn't normally done while you are in production. When you hit a long-running query, you do wish you had that enabled, and if it was, you could look there. It would be best to leave the postmaster logging turned off, then allow dynamic inspection of the query iff you have a rogue query.

Isn't that log_min_duration_statement (integer)?

Sets a minimum statement execution time (in milliseconds) for a statement to be logged. All SQL statements that run for the specified time or longer will be logged with their duration. Setting this to zero prints all queries and their durations. Minus-one (the default) disables this. For example, if you set it to 250 then all SQL statements that run 250 ms or longer will be logged. Enabling this option can be useful in tracking down unoptimized queries in your applications. Only superusers can increase this, or set it to minus-one if this option was set by the administrator.

--
Richard Huxton
Archonet Ltd
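Concretely, the setting quoted above is used like this (the 250 ms threshold is the documentation's own example):

```sql
-- Per session, as superuser: log any statement taking 250 ms or more,
-- together with its duration, once it completes.
SET log_min_duration_statement = 250;

-- Or server-wide, in postgresql.conf:
--   log_min_duration_statement = 250
```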
Re: [HACKERS] Increasing the length of
Hi Richard,

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote: Isn't that log_min_duration_statement (integer)?

That gets written when a statement completes, not during execution. It's great for finding out AFTER the fact whether you have such queries, but no good for telling what's going on in the middle of execution. (It's great, BTW; thanks to whoever wrote it.)
[HACKERS] Beta time early next week
FYI, I am going to need a few more days to apply patches submitted in the past few days, and Magnus needs a few more days to fix the Windows signal problems. So, rather than planning a beta for later this week, I think we should focus on early next week.

--
Bruce Momjian | http://candle.pha.pa.us
[EMAIL PROTECTED] | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
[HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
I am looking at implementing this TODO item, e.g. (the max case): rewrite

SELECT max(foo) FROM bar

as

SELECT foo FROM bar ORDER BY foo DESC LIMIT 1

if there is an index on bar(foo). Suggestions about the most suitable point in the parser/planner stage to perform this sort of rewrite would be most welcome (this would be my first non-trivial dirtying of hands in the code). My initial thoughts revolved around extending the existing RULE system to handle more general types of rewrite, like conditionals in SELECT rules and rewrites that change elements of the query other than the target relation. Planning for the future: I would like whatever mechanism is added for this MAX/MIN stuff to be amenable to more subtle things like aggregate navigation (see R. Kimball's article http://www.dbmsmag.com/9608d54.html). regards Mark
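Spelled out, the proposed transformation looks like this. A sketch with an invented table: note that a fully correct rewrite also has to account for NULLs, since max() ignores them while DESC ordering sorts them first by default:

```sql
CREATE INDEX bar_foo_idx ON bar (foo);

-- Original aggregate form, which has to scan every row:
SELECT max(foo) FROM bar;

-- Index-friendly equivalent: walk the index from the high end and
-- stop at the first non-NULL entry.
SELECT foo FROM bar
WHERE foo IS NOT NULL
ORDER BY foo DESC
LIMIT 1;
```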
Re: [HACKERS] Increasing the length of
On Wed, Nov 10, 2004 at 09:52:17PM +, Simon Riggs wrote:

On Wed, 2004-11-10 at 21:48, Richard Huxton wrote: Isn't that log_min_duration_statement (integer)?

That gets written when a statement completes, not during execution.

I've been following this thread, and I was thinking the same thing. I wonder how much work it'd be to have another log setting -- say log_statement_after_min_duration (integer) -- which did what Simon wants. That'd more than satisfy my need, for sure. Might the cost of that be too high, though? A

--
Andrew Sullivan | [EMAIL PROTECTED]
When my information changes, I alter my conclusions. What do you do, sir? --attr. John Maynard Keynes
Re: [HACKERS] Increasing the length of
On Wed, Nov 10, 2004 at 05:51:01PM -0500, Andrew Sullivan wrote: log_statement_after_min_duration (integer) -- which did what Simon wants.

Uh, well, not what Simon wants, of course, but it would give us a useful capability anyway. I agree that the full-bore profiling for the DBA would be awfully nice. But in its absence, if you could see your long-running query in the log after a minute, then go do an EXPLAIN, realise "uh-oh, that's gonna take 3 days to complete", and kill it, it would be a big help. A

--
Andrew Sullivan | [EMAIL PROTECTED]
A certain description of men are for getting out of debt, yet are against all taxes for raising money to pay it off. --Alexander Hamilton
[HACKERS] multiline CSV fields
Darcy Buskermolen has drawn my attention to unfortunate behaviour of COPY CSV with fields containing embedded line end chars if the embedded sequence isn't the same as those of the file containing the CSV data. In that case we error out when reading the data in. This means there are cases where we can produce a CSV data file which we can't read in, which is not at all pleasant. Possible approaches to the problem:
. make it a documented limitation
. have a csv read mode for backend/commands/copy.c:CopyReadLine() that relaxes some of the restrictions on inconsistent line endings
. escape embedded line end chars
The last really isn't an option, because the whole point of CSVs is to interoperate with other programs, and my understanding is that those that understand multiline fields (e.g. Excel) expect them not to be escaped, and do not produce them escaped. So right now I'm tossing up in my head between the first two options. Or maybe there's another solution I haven't thought of. Thoughts? cheers andrew ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
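For what it's worth, quote-aware CSV readers handle this by tracking quoting state across physical lines instead of splitting on line ends first — roughly what a relaxed csv read mode in CopyReadLine() would have to do. A small illustration using Python's csv module (not Postgres code); the data mixes a bare \n embedded in a field with \r\n line endings in the file:

```python
import csv
import io

# CSV data whose file line ending is \r\n, but one field contains a bare \n.
data = 'id,note\r\n1,"line one\nline two"\r\n2,plain\r\n'

# newline='' disables line-ending translation, as the csv docs require;
# the reader tracks quote state, so the embedded \n stays inside the field
# instead of being taken as a record separator.
rows = list(csv.reader(io.StringIO(data, newline='')))
print(rows)
# [['id', 'note'], ['1', 'line one\nline two'], ['2', 'plain']]
```

A purely line-oriented reader sees three line ends in record 2's neighbourhood and errors out (or mis-splits); the quote-state approach never does, whatever the file's own line ending convention is.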
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
On Thu, 2004-11-11 at 04:11, Tom Lane wrote: You can't override a builtin C function that way because there is a built-in map from function OID to builtin function address, and it's consulted before trying to look in pg_proc. This behavior is not really open to negotiation. Then shouldn't we disallow the CREATE OR REPLACE FUNCTION? -Neil ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] CREATE or REPLACE function pg_catalog.*
Neil Conway [EMAIL PROTECTED] writes: On Thu, 2004-11-11 at 04:11, Tom Lane wrote: You can't override a builtin C function that way because there is a built-in map from function OID to builtin function address, and it's consulted before trying to look in pg_proc. Then shouldn't we disallow the CREATE OR REPLACE FUNCTION? We shouldn't disallow it completely; for instance you could validly change the volatility or other attributes that way. There might be an argument for rejecting an attempt to replace the prolang or prosrc values of a built-in, but frankly I think it's a waste of effort to code up such a thing ... regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
Mark Kirkwood [EMAIL PROTECTED] writes: I am looking at implementing this TODO item. e.g. (max case): My initial thoughts revolved around extending the existing RULE system to be able to handle more general types of rewrite - like conditionals in SELECT rules and rewrites that change elements of the query other than the target relation. The rule rewriter is almost certainly the wrong place, because it has only the most superficial understanding of a query's semantics. Doing this processing there would require re-inventing (or at least duplicating the execution of) a lot of the planner's query analysis work. My thoughts would run towards doing this after the prepqual and prepjointree steps (probably somewhere in grouping_planner). Even there is a bit early since you'd have to duplicate plancat.c's extraction of information about related indexes; but possibly it'd be reasonable to move the add_base_rels_to_query() call out of query_planner and do it in grouping_planner. A more radical way of handling it would be to detect the relevance of an indexscan in indxpath.c and generate a special kind of Path node; this would not generalize to other sorts of things as you were hoping, but I'm unconvinced that the mechanism is going to be very general-purpose anyway. The major advantage is that this would work conveniently for comparing the cost of a rewritten query to a non-rewritten one. How are you planning to represent the association between MIN/MAX and particular index orderings in the system catalogs? regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] sp-gist porting to postgreSQL
On Wed, 10 Nov 2004, Oleg Bartunov wrote: On Wed, 10 Nov 2004, Ramy M. Hassan wrote: I believe that it is still possible to have several index access methods for the same type and the same operations. But this requires that each index access method has its own tuple in the pg_am relation and therefore postgresql recognizes it by itself. But this is not the case with GiST based indices. They are all recognized by postgresql as the same index access method, and from here comes the limitation. It's possible, see contrib/intarray, for example. You can specify the opclass in the CREATE INDEX command: CREATE INDEX text_idx on test__int using gist ( a gist__int_ops ); CREATE INDEX text_idx2 on test__int using gist ( a gist__intbig_ops ); Here gist__int_ops and gist__intbig_ops are different opclasses for the same type, intended for use with different cardinalities. The problem is how to use them (indices) automatically, how the planner/optimizer could select which indices to use. This is great. I didn't know that. Thanks. Also, I think GiST and SP-GiST are better viewed as index classes, not as indices by themselves. So maybe it is more logical to say: CREATE INDEX index_name ON table_name USING spgist_trie(field) Where spgist_trie is an spgist based index instance. Than to say: CREATE INDEX index_name ON table_name USING spgist(field) And depend on the operator classes to define the required support methods for the trie function. Why not use the existing syntax? CREATE INDEX index_name ON table_name USING spgist (field trie_ops) That's ok now. The only concern now is the portability of the extensions. Currently, are there any plans to introduce GiST to some other DBMS? If yes, then I think all GiST based indexes will have to be rewritten or at least modified to a great extent, as they depend on the postgresql API and how index access methods work in postgresql. Do you see any value in defining an SP-GiST API for the extensions to completely isolate the extensions code from postgresql?
Such isolation will require that the SP-GiST code loads the extensions itself instead of relying on postgresql to do that, so it will no longer be a matter of operator classes defining the extension. I am not sure I have a complete vision, but this is what I see. I would appreciate your opinions regarding this design issue. Teodor is rather busy right now, but he certainly knows the GiST internals better, so we'll wait for his comments. Thanks Ramy
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
On Wed, Nov 10, 2004 at 07:18:59PM -0500, Tom Lane wrote: A more radical way of handling it would be to detect the relevance of an indexscan in indxpath.c and generate a special kind of Path node; this would not generalize to other sorts of things as you were hoping, but I'm unconvinced that the mechanism is going to be very general-purpose anyway. The major advantage is that this would work conveniently for comparing the cost of a rewritten query to a non-rewritten one. What about having a new column in pg_aggregate which would point to a function that would try to optimize the aggregate's handling? There could be multiple calls to that function along the query's way to the executor, each one at a different point (with a parameter specifying which one it is), that would try to rewrite the query optimizing the aggregate. So we could optimize some aggregates at one point, and others at a different point, whichever makes the most sense. -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) There are two moments in a man's life when he should not speculate: when he can afford it and when he cannot (Mark Twain) ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Reorganization of the translation files
On Tue, Nov 09, 2004 at 12:00:46PM -0500, Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: I was thinking about organizing the translation files in a more efficient manner. (not for 8.0, obviously) [snip] And it would easily solve issues like translatable strings appearing in the pgport library, which has no makefile structure to support translations (which would be a waste for like 5 strings), and no run-time support either. Sounds like a win to me on those grounds alone; but probably the translators need to have the biggest say here, since they'll be affected the most. The biggest downside I see is the long execution time for msgmerge. I can live with that, especially if it will save me translating "out of memory" six or seven times. The other downside is that the cool status table will be reduced to one line. I can live with that too ... -- Alvaro Herrera (alvherre[a]dcc.uchile.cl) Sadness is a wall between two gardens (Khalil Gibran) ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Tom Lane wrote: A more radical way of handling it would be to detect the relevance of an indexscan in indxpath.c and generate a special kind of Path node; this would not generalize to other sorts of things as you were hoping, but I'm unconvinced that the mechanism is going to be very general-purpose anyway. The major advantage is that this would work conveniently for comparing the cost of a rewritten query to a non-rewritten one. I like this point - it makes sense to check that the rewritten query is less costly to execute than the original! How are you planning to represent the association between MIN/MAX and particular index orderings in the system catalogs? That is the next item to think on, we could have a rewrite catalog that holds possible transformations for certain functions (certain aggregates at this stage I guess). This is a bit like Alvaro's idea - however it may be better to represent it the way he suggested! regards Mark ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
On Wed, Nov 10, 2004 at 22:21:31 -0300, Alvaro Herrera [EMAIL PROTECTED] wrote: On Wed, Nov 10, 2004 at 07:18:59PM -0500, Tom Lane wrote: A more radical way of handling it would be to detect the relevance of an indexscan in indxpath.c and generate a special kind of Path node; this would not generalize to other sorts of things as you were hoping, but I'm unconvinced that the mechanism is going to be very general-purpose anyway. The major advantage is that this would work conveniently for comparing the cost of a rewritten query to a non-rewritten one. What about having a new column in pg_aggregate which would point to a function that would try to optimize the aggregate's handling? I think you want to store an operator class and a direction. This allows you to figure out what indexes might be usable. This could be used on all of the max and min aggregates and the boolean AND and OR aggregates. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
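The operator-class-plus-direction suggestion above amounts to a small per-aggregate lookup. A hypothetical sketch in Python (none of these names are real catalog entries; it only models the matching step):

```python
# Hypothetical mapping: optimizable aggregate -> (btree opclass, scan direction).
# Modeled on the suggestion above; these are NOT real pg_aggregate columns.
AGG_INDEX_HINTS = {
    "min":      ("int4_ops", "forward"),   # first entry in index order
    "max":      ("int4_ops", "backward"),  # last entry in index order
    "bool_and": ("bool_ops", "forward"),   # any 'false' sorts first
    "bool_or":  ("bool_ops", "backward"),  # any 'true' sorts last
}

def usable_index_scan(agg, index_opclass):
    """Can this aggregate be answered by one tuple from a matching index?

    Returns the scan direction to fetch that single tuple, or None if the
    aggregate is not optimizable or the index's opclass doesn't match."""
    hint = AGG_INDEX_HINTS.get(agg)
    if hint is None or hint[0] != index_opclass:
        return None
    return hint[1]

print(usable_index_scan("max", "int4_ops"))   # backward
print(usable_index_scan("count", "int4_ops")) # None
```

The point of storing an opclass rather than, say, an operator name is that the opclass is exactly what an index declares, so the match against candidate indexes is a direct catalog comparison.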
Re: [HACKERS] Reorganization of the translation files
Alvaro Herrera wrote: The biggest downside I see is the long execution time for msgmerge. I can live with that, especially if it will save me translating "out of memory" six or seven times. I'm using poedit which can replace empty translations with previously translated values for the same/similar strings. This can save you some time when translating common errors and command-line options; unfortunately, importing strings from other projects (kde, gnome, ...) doesn't help much. :( Regards, Aleksander ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Reorganization of the translation files
I'm not sure about other translators, but I'd like to see two files: one for the backend and one for the utilities. Translating strings can be a slow and tedious process and completing them all at the same time is not likely. With two files you can choose to translate the tools during one development cycle and the backend during the next one, while keeping some overview over your progress at the same time. Regards, Aleksander ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote: I am looking at implementing this TODO item. e.g. (max case): rewrite SELECT max(foo) FROM bar as SELECT foo FROM bar ORDER BY foo DESC LIMIT 1 if there is an index on bar(foo) Out of curiosity, will you be doing this in such a way that SELECT min(foo), max(foo) FROM bar will end up as SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC LIMIT 1) ? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT variable category
Stephan Szabo [EMAIL PROTECTED] writes: Would that mean I wouldn't be able to change the logging level on the fly at all? I would think you'd still be able to do it through a security definer wrapper function owned by a superuser. Oh yeah, well that would be sufficient for my purposes. I must say I thought the behaviour of being able to raise but not lower logging levels beyond what the system had set was pretty slick when I first found out about it. But it's not the most important thing in the world, as long as there's an escape hatch. -- greg ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Your example and ones like: SELECT max(foo), count(foo) FROM bar SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e. rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think? Jim C. Nasby wrote: On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote: I am looking at implementing this TODO item. e.g. (max case): rewrite SELECT max(foo) FROM bar as SELECT foo FROM bar ORDER BY foo DESC LIMIT 1 if there is an index on bar(foo) Out of curiosity, will you be doing this in such a way that SELECT min(foo), max(foo) FROM bar will end up as SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC LIMIT 1) ? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
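A first-cut guard consistent with keeping it simple might be: rewrite only when every aggregate in the target list is independently index-answerable. A toy check (illustrative Python, all names invented):

```python
# Aggregates answerable by a single tuple from an ordered index.
OPTIMIZABLE = {"min", "max"}

def can_rewrite(aggregates):
    """First-cut rule: rewrite only if every aggregate is MIN or MAX.

    SELECT max(foo), count(foo) fails the test because count needs a
    full scan anyway, so the index trick would buy nothing there."""
    return bool(aggregates) and all(a in OPTIMIZABLE for a in aggregates)

print(can_rewrite(["max"]))           # True
print(can_rewrite(["min", "max"]))    # True  (two one-tuple index scans)
print(can_rewrite(["max", "count"]))  # False (count forces a full scan)
```

The mixed-aggregate and join cases then fall out naturally as "not rewritten in the first cut", which matches keeping the initial scope limited.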
Re: [HACKERS] Increasing the length of pg_stat_activity.current_query...
Simon Riggs [EMAIL PROTECTED] writes: My intention was towards a data warehouse situation, and my comments are only relevant in that context. Possibly 25+% of the user base use this style of processing. In that case, I expect queries to run for minutes or hours. I come from the opposite angle but have also ended up with the same conclusion. In an OLTP environment you can't be trying to save every single SQL query in the log file. And saving only queries that take longer than some arbitrary amount of time might not be capturing enough to give a good picture of what's going on. I like the idea of a stats daemon that's isolated from the server by something like UDP and keeps statistics. It would let me turn off logging while still being able to peek into what queries are running, which take the longest, which are being executed the most often, and which are taking the most cumulative time (which isn't necessarily the same thing as either of the other two). The idea of tracking cache misses is great, though in the current design a postgres buffer cache miss doesn't necessarily mean a cache miss. If Postgres moves to O_DIRECT then it would be a valuable statistic, or if instrumentation to test for timing of cache hits and misses is added then it could be a good statistic to have. I can say that with Oracle it was *incredibly* useful to have the queries being executed and cached queryable in the cache. The ora_explain tool that comes with DBD::Oracle makes it extremely easy to identify queries consuming resources, experiment with rewrites, and then copy the new query into the application. It would be great to have something equivalent for Postgres. It would be extremely kludgy by comparison to have to dig through the log files for queries. Much better would be to have an interface to access the data pgstats gathers. But that only works if the entire query is there. -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
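The UDP-isolated stats daemon described above can be mocked up in a few lines to show the shape of the design — fire-and-forget datagrams on the backend side, aggregation of call counts and cumulative time on the collector side. This is a toy sketch in Python, not pgstats; the message format is invented:

```python
import json
import socket
from collections import defaultdict

# Collector socket on loopback; backends would send fire-and-forget
# datagrams, so a slow collector never blocks query execution.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
recv_sock.settimeout(5)
addr = recv_sock.getsockname()

# "Backend" side: one datagram per completed query (format invented here).
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for query, ms in [("SELECT 1", 2.0), ("SELECT 1", 3.0), ("UPDATE t", 40.0)]:
    send_sock.sendto(json.dumps({"query": query, "ms": ms}).encode(), addr)

# Collector side: executions and cumulative time per query string --
# enough to answer "most often" and "most cumulative time" separately.
stats = defaultdict(lambda: {"calls": 0, "total_ms": 0.0})
for _ in range(3):
    msg = json.loads(recv_sock.recv(65536))
    entry = stats[msg["query"]]
    entry["calls"] += 1
    entry["total_ms"] += msg["ms"]

send_sock.close()
recv_sock.close()
print(stats["SELECT 1"])
```

Keeping the collector queryable (rather than grepping log files) is then just a matter of exposing `stats` through a view-like interface — which only works, as noted, if the entire query text fits in the message.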
[HACKERS] Vacuum info clarification
The last two lines of VACUUM VERBOSE are: INFO: free space map: 49 relations, 32 pages stored; 784 total pages needed DETAIL: Allocated FSM size: 1000 relations + 2 pages = 182 kB shared memory. I think "total pages needed" should be "total pages used" or "total pages requested". I am confused about the difference between "stored" and "requested". I read the comments in freespace.c but am still confused. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
On Thu, Nov 11, 2004 at 17:57:42 +1300, Mark Kirkwood [EMAIL PROTECTED] wrote: Your example and ones like : SELECT max(foo), count(foo) FROM bar SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think? I don't think you should be rewriting queries as much as providing alternate plans and letting the rest of the optimizer decide which plan to use. If you just rewrite a query you might lock yourself into using a poor plan. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
Alvaro Herrera [EMAIL PROTECTED] writes: What about having a new column in pg_aggregate which would point to a function that would try to optimize the aggregate's handling? I can't get very excited about this, because how would you make a reasonably stable/narrow API for such a thing? The function as you propose it would have to know everything about not only the planner's data representations but the N specific places it would be called from. The existing selectivity functions are bad enough on this score ... regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)
Why not just change the function altogether to 'select $1 from $2 order by $1 desc limit 1;' Is there ANY situation where max(col) as it is, would be faster? ... John ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Certainly handling only one case is better than none. I just wanted to bring up the multiple aggregate scenario. Also, consider that SELECT min(a), max(a), min(b), max(c) FROM table could be optimized as well (into 4 index scans, assuming a, b, and c all had indexes). I don't think any other aggregates are candidates for optimization right now, though I guess I could be wrong. On Thu, Nov 11, 2004 at 05:57:42PM +1300, Mark Kirkwood wrote: Your example and ones like : SELECT max(foo), count(foo) FROM bar SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think? Jim C. Nasby wrote: On Thu, Nov 11, 2004 at 11:48:49AM +1300, Mark Kirkwood wrote: I am looking at implementing this TODO item. e.g. (max case): rewrite SELECT max(foo) FROM bar as SELECT foo FROM bar ORDER BY foo DESC LIMIT 1 if there is an index on bar(foo) Out of curiosity, will you be doing this in such a way that SELECT min(foo), max(foo) FROM bar will end up as SELECT (SELECT foo FROM bar ORDER BY foo ASC LIMIT 1), (SELECT ... DESC LIMIT 1) ? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
There seems to be (as Tom indicated) a choice of approaches: i) rewrite max/min queries and then plan 'em ii) provide alternate plans based on presence of certain aggregate types in the query When I first examined this TODO item, I was really thinking about i), but I suspect that ii) is probably the best approach. regards Mark Bruno Wolff III wrote: On Thu, Nov 11, 2004 at 17:57:42 +1300, Mark Kirkwood [EMAIL PROTECTED] wrote: Your example and ones like : SELECT max(foo), count(foo) FROM bar SELECT max(a.foo1), max(b.foo2) FROM bar1 AS a NATURAL JOIN bar2 AS b have made me realize that the scope of what should be optimized is somewhat subtle. I am inclined to keep it simple (i.e rather limited) for a first cut, and if that works well, then look at extending to more complex rewrites. What do you think? I don't think you should be rewriting queries as much as providing alternate plans and letting the rest of the optimizer decide which plan to use. If you just rewrite a query you might lock yourself into using a poor plan. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites
Probably for a small table, where the machinery of reading the index, followed by checking the table for non-visible tuples is more costly than just scanning the table! regards Mark John Hansen wrote: Why not just change the function all together to 'select $1 from $2 order by $1 desc limit 1;' Is there ANY situation where max(col) as it is, would be faster? ... John ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster