Re: [PERFORM] Caching of Queries
I've looked at PREPARE, but apparently it only lasts per-session - that's worthless in our case (web based service, one connection per data-requiring connection). You don't use persistent connections ??? Your problem might simply be the connection time overhead (also including a few TCP roudtrips). ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
Added to TODO: * Add RESET CONNECTION command to reset all session state This would include resetting of all variables (RESET ALL), dropping of all temporary tables, removal of any NOTIFYs, etc. This could be used for connection pooling. We could also change RESET ALL to have this functionality. --- Tatsuo Ishii wrote: > > Tatsuo Ishii <[EMAIL PROTECTED]> writes: > > > First, it's not a particular problem with pgpool. As far as I know any > > > connection pool solution has exactly the same problem. Second, it's > > > easy to fix if PostgreSQL provides a functionarity such as:"drop all > > > temporary tables if any". > > > > I don't like that definition exactly --- it would mean that every time > > we add more backend-local state, we expect client drivers to know to > > issue the right incantation to reset that kind of state. > > > > I'm thinking we need to invent a command like "RESET CONNECTION" that > > resets GUC variables, drops temp tables, forgets active NOTIFYs, and > > generally does whatever else needs to be done to make the session state > > appear virgin. When we add more such state, we can fix it inside the > > backend without bothering clients. > > Great. It's much better than I propose. > > > I now realize that our "RESET ALL" command for GUC variables was not > > fully thought out. We could possibly redefine it as doing the above, > > but that might break some applications ... > > > > regards, tom lane > > > > ---(end of broadcast)--- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) > -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
> Tatsuo Ishii <[EMAIL PROTECTED]> writes: > > First, it's not a particular problem with pgpool. As far as I know any > > connection pool solution has exactly the same problem. Second, it's > > easy to fix if PostgreSQL provides a functionarity such as:"drop all > > temporary tables if any". > > I don't like that definition exactly --- it would mean that every time > we add more backend-local state, we expect client drivers to know to > issue the right incantation to reset that kind of state. > > I'm thinking we need to invent a command like "RESET CONNECTION" that > resets GUC variables, drops temp tables, forgets active NOTIFYs, and > generally does whatever else needs to be done to make the session state > appear virgin. When we add more such state, we can fix it inside the > backend without bothering clients. Great. It's much better than I propose. > I now realize that our "RESET ALL" command for GUC variables was not > fully thought out. We could possibly redefine it as doing the above, > but that might break some applications ... > > regards, tom lane > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
Tatsuo Ishii <[EMAIL PROTECTED]> writes: > First, it's not a particular problem with pgpool. As far as I know any > connection pool solution has exactly the same problem. Second, it's > easy to fix if PostgreSQL provides a functionarity such as:"drop all > temporary tables if any". I don't like that definition exactly --- it would mean that every time we add more backend-local state, we expect client drivers to know to issue the right incantation to reset that kind of state. I'm thinking we need to invent a command like "RESET CONNECTION" that resets GUC variables, drops temp tables, forgets active NOTIFYs, and generally does whatever else needs to be done to make the session state appear virgin. When we add more such state, we can fix it inside the backend without bothering clients. I now realize that our "RESET ALL" command for GUC variables was not fully thought out. We could possibly redefine it as doing the above, but that might break some applications ... regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
> > I don't know what you are exactly referring to in above URL > > when you are talking about "potential pitfalls of pooling". > > Please explain more. > > Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that > some people aren't necessarily aware of them up front. For instance, pgpool > does an 'abort transaction' and a 'reset all' in lieu of a full reconnect > (of course, since a full reconnect is exactly what we are trying to avoid). > Is this is enough to guarantee that a given pooled connection behaves > exactly as a non-pooled connection would from a client perspective? For > instance, temporary tables are usually dropped at the end of a session, so a > client (badly coded perhaps) that does not already use persistent > connections might be confused when the sequence 'connect, create temp table > foo ..., disconnect, connect, create temp table foo ...' results in the > error 'Relation 'foo' already exists'. First, it's not a particular problem with pgpool. As far as I know any connection pool solution has exactly the same problem. Second, it's easy to fix if PostgreSQL provides a functionarity such as:"drop all temporary tables if any". I think we should implement it if we agree that connection pooling should be implemented outside the PostgreSQL engine itself. I think cores agree with this. -- Tatsuo Ishii ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Caching of Queries
> I don't know what you are exactly referring to in above URL > when you are talking about "potential pitfalls of pooling". > Please explain more. Sorry, I wasn't implying that pgpool doesn't deal with the issues, just that some people aren't necessarily aware of them up front. For instance, pgpool does an 'abort transaction' and a 'reset all' in lieu of a full reconnect (of course, since a full reconnect is exactly what we are trying to avoid). Is this is enough to guarantee that a given pooled connection behaves exactly as a non-pooled connection would from a client perspective? For instance, temporary tables are usually dropped at the end of a session, so a client (badly coded perhaps) that does not already use persistent connections might be confused when the sequence 'connect, create temp table foo ..., disconnect, connect, create temp table foo ...' results in the error 'Relation 'foo' already exists'. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote: > Transparent "query caching" is the "industry standard" for how these things > are handled. However, Postgres' lack of this feature has made me consider > other approaches, and I'm starting to wonder if the "standard" query caching > -- where a materialized query result, or some reduction thereof, is cached in > database memory -- isn't the best way to cache things. I'm going to > abbreviate it "SQC" for the rest of this e-mail. Not to quibble, but are you sure that's the standard? Oracle and DB2 don't do this, and I didn't think MSSQL did either. What they do do is cache query *plans*. This is a *huge* deal in Oracle; search http://asktom.oracle.com for 'soft parse'. In any case, I think a means of marking some specific queries as being cachable is an excellent idea; perfect for 'static data' scenarios. What I don't know is how much will be saved. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
> >>More to the point though, I think this is a feature that really really > >>should be in the DB, because then it's trivial for people to use. > >> > >> > > > >How does putting it into PGPool make it any less trivial for people to > >use? > > > The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html > . Specifically, it's a separate application that needs configuration, > the homepage has no real discussion of the potential pitfalls of pooling > and what this implementation does to get around them, you get the idea. I don't know what you are exactly referring to in above URL when you are talking about "potential pitfalls of pooling". Please explain more. -- Tatsuo Ishii ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Caching of Queries
If it was in pgpool or something similar, I could devote a separate machine just for caching results leaving the db server untouched. BUT you would be limited to caching complete queries. There is a more efficient strategy... ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
1) The materialized data is available in 3 different forms; a list, a detail view, and a spreadsheet. Each form as somewhat different columns and different rules about ordering, which would likely confuse an SQC planner. In this implementation, all 3 forms are able to share the same cache. See my proposal to cache function results. You can create a cached function and : SELECT your rows FROM cached_function(parameters) WHERE ... ORDER BY... GROUP BY... will only fetch the function result from the cache, and then the only additional costs are the ORDER and GROUP BY... the query parsing is very simple, it's just a select, and a "cached function scan" I think caching can be made much more powerful if it is made usable like this. I mean, not only cache a query and its result, but being able to use cached queries internally like this and manipulaing them, adds value to the cached data and allows storing less data in the cache because duplicates are avoided. Thus we could use cached results in CHECK() conditions, inside plsql functions, anywhere... ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
pgpool (which makes some rather questionable claims IMO); any decent web application language/environment will support connection pooling. That's why it should not be tied to something specific as pgpool. If you want performance, which is the case here, usually you have a webserver serving static files, and an application server serving dynamic pages. This is not necessarily a huge application server, it can be as simple as an Apache instance serving static files, with a special path mod_proxy'ed to another instance of apache acting as an application server. IMHO this is a nice way to do it, because you have a light weight static files server which can spawn many processes without using precious resources like memory and postgres connections, and a specialized server which has a lot less processes, each one having more size, a db connection, etc. The connexions are permanent, of course, so there is no connection overhead. The proxy has an extra advantage buffering the data from the "app server" and sending it back slowly to the client, so the app server can then very quickly process the next request instead of hogging a db connection while the html is slowly trickled back to the client. IMHO the standard PHP way of doing things (just one server) is wrong because every server process, even if it's serving static files, hogs a connection and thus needs an extra layer for pooling. Thus, I see query result caching as a way to pushing further architectures which are already optimized for performance, not as a band-aid for poor design solutions like the one-apache server with pooling. Now, a proposition : Here is where we are now, a typical slow query : PREPARE myquery(text,integer) EXECUTE myquery('john',2) My proposition : PREPARE myquery(text,integer) PLANNED USING ('john',2) CACHED IF $1 IS NOT NULL AND $2 IS NOT NULL DEPENDS ON $1, $2 MAXIMUM CACHE TIME '5 minute'::interval MINIMUM CACHE TIME '1 minute'::interval MAXIMUM CACHE SIZE 200 AS SELECT count(*) as number FROM mytable WHERE myname=$2 AND myfield>=$1; EXECUTE myquery('john',2) Explainations : --- PLANNED USING ('john',2) Tells the planner to compute the stored query plan using the given parameters. This is independent from caching but could be a nice feature as it would avoid the possibility of storing a bad query plan. --- CACHED IF $1 IS NOT NULL AND $2 IS NOT NULL Specifies that the result is to be cached. There is an optional condition (here, IF ...) telling postgres of when and where it should cache, or not cache. It could be useful to avoid wasting cache space. --- DEPENDS ON $1, $2 Defines the cache key. I don't know if this is useful, as the query parameters make a pretty obvious cache key so why repeat them. It could be used to add other data as a cache key, like : DEPENDS ON (SELECT somefunction($1)) Also a syntax for specifying which tables should be watched for updates, and which should be ignored, could be interesting. --- MAXIMUM CACHE TIME '5 minute'::interval Pretty obvious. --- MINIMUM CACHE TIME '1 minute'::interval This query is a count and I want a fast but imprecise count. Thus, I specify a minimum cache time of 1 minute, meaning that the result will stay in the cache even if the tables change. This is dangerous, so I'd suggest the following : MINIMUM CACHE TIME CASE WHEN result.number>10 THEN '1 minute'::interval ELSE '5 second'::interval Thus the cache time is an expression ; it is evaluated after performed the query. There needs to be a way to access the 'count' result, which I called 'result.number' because of the SELECT count() as number. The result could also be used in the CACHE IF. The idea here is that the count will vary over time, but we accept some imprecision to gain speed. SWho cares if there are 225 or 227 messages in a forum thread counter anyway ? However, if there are 2 messages, first caching the query is less necessary because it's fast, and second a variation in the count will be much easier to spot, thus we specify a shorter cache duration for small counts and a longer duration for large counts. For queries returning result sets, this is not usable of course, but a special feature for speeding count() queries would be welcome ! --- MAXIMUM CACHE SIZE 200 Pretty obvious. Size in bytes. For queries returning several rows, MIN/MAX on result rows could be useful also : MAXIMUM RESULT ROWS nnn Or maybe : CACHE IF (select count(*) from result) > nnn Thinking about it, using prepared queries seems a bad idea ; maybe the cache should act on the result of functions. This would force the application programmers to put the queries they want to optimize in fun
Re: [PERFORM] Caching of Queries
William, > Just my 2 cents on this whole issue. I would lean towards having result > caching in pgpool versus the main backend. I want every ounce of memory > on a database server devoted to the database. Caching results would > double the effect of cache flushing ... ie, now both the results and the > pages used to build the results are in memory pushing out other stuff to > disk that may be just as important. > > If it was in pgpool or something similar, I could devote a separate > machine just for caching results leaving the db server untouched. Oddly, Joe Conway just mentioned the same idea to me. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
Josh Berkus wrote: 1) Query caching is not a single problem, but rather several different problems requiring several different solutions. 2) Of these several different solutions, any particular query result caching implementation (but particularly MySQL's) is rather limited in its applicability, partly due to the tradeoffs required.Per your explanation, Oracle has improved this by offering a number of configurable options. 3) Certain other caching problems would be solved in part by the ability to construct "in-memory" tables which would be non-durable and protected from cache-flushing. This is what I'm interested in chatting about. Just my 2 cents on this whole issue. I would lean towards having result caching in pgpool versus the main backend. I want every ounce of memory on a database server devoted to the database. Caching results would double the effect of cache flushing ... ie, now both the results and the pages used to build the results are in memory pushing out other stuff to disk that may be just as important. If it was in pgpool or something similar, I could devote a separate machine just for caching results leaving the db server untouched. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
Aaron, > I'm not sure I understand your req fully. I'm not surprised.I got wrapped up in an overly involved example and completely left off the points I was illustrating. So here's the points, in brief: 1) Query caching is not a single problem, but rather several different problems requiring several different solutions. 2) Of these several different solutions, any particular query result caching implementation (but particularly MySQL's) is rather limited in its applicability, partly due to the tradeoffs required.Per your explanation, Oracle has improved this by offering a number of configurable options. 3) Certain other caching problems would be solved in part by the ability to construct "in-memory" tables which would be non-durable and protected from cache-flushing. This is what I'm interested in chatting about. BTW, I AM using a summary table. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
I'm not sure I understand your req fully. If the same request is repeatedly done with same parameters, you could implement a proxy web server with a croned script to purge stale pages. If there is substantially the same data being summarized, doing your own summary tables works; if accessed enough, they're in memory. I interleaved some notes into your posting. - Original Message - From: "Josh Berkus" <[EMAIL PROTECTED]> To: "Postgresql Performance" <[EMAIL PROTECTED]> Sent: Friday, October 01, 2004 1:10 PM Subject: Re: [PERFORM] Caching of Queries > People: > > Transparent "query caching" is the "industry standard" for how these things > are handled. However, Postgres' lack of this feature has made me consider > other approaches, and I'm starting to wonder if the "standard" query caching > -- where a materialized query result, or some reduction thereof, is cached in > database memory -- isn't the best way to cache things. I'm going to > abbreviate it "SQC" for the rest of this e-mail. > > Obviously, the draw of SQC is its transparency to developers. With it, the > Java/Perl/PHP programmers and the DBA don't have to communicate at all -- you > set it up, give it some RAM, and it "just works". As someone who frequently > has to consult based on limited knowledge, I can understand the appeal. My sense is that pg is currently unique among popular dbmses in having the majority of applications being homegrown (a chicken / egg / advocacy issue - if I install a CMS, I'm not the DBA or the PHP programmer - and I don't want to change the code; we'll see more about this when native WinPg happens). > > However, one of the problems with SQC, aside from the ones already mentioned > of stale data and/or cache-clearing, is that (at least in applications like > MySQL's) it is indiscriminate and caches, at least breifly, unique queries as > readily as common ones. Possibly Oracle's implementation is more > sophisticated; I've not had an opportunity. I'm not sure I agree here. Stale data and caching choice are optimizer/buffer manager choices and implementation can decide whether to allow stale data. These are design choices involving development effort and choices of where to spend server cycles and memory. All buffering choices cache unique objects, I'm not sure why this is bad (but sensing you want control of the choices). FWIW, this is my impression of other dbmses. In MySQL, a global cache can be specified with size and globally, locally, or through statement hints in queries to suggest caching results. I don't believe that these could be used as common subexpressions (with an exception of MERGE table component results). The optimizer knows nothing about the cached results - SQL select statements are hashed, and can be replaced by the the cached statement/results on a match. In DB2 and Oracle result sets are not cached. They have rich sets of materialized view features (that match your requirements). They allow a materialized view to be synchronous with table updates or asynchronous. Synchronous is often an unrealistic option, and asynchronous materialized views are refreshed at a specified schedule. The optimizers allow "query rewrite" (in Oracle it is a session option) so one can connect to the database and specify that the optimizer is allowed to replace subexpressions with data from (possibly stale) materialized views. SQL Server 2K has more restrictive synchronous MVs, but I've never used them. So, in your example use in Oracle, you would need to define appropriate MVs with a ½ hour refresh frequency, and hope that the planner would use them in your queries. The only change in the app is on connection you would allow use of asynchronous stale data. You're suggesting an alternative involving identifying common, but expensive, subexpressions and generating MVs for them. This is a pretty sophisticated undertaking, and probably requires some theory research to determine if it's viable. > > The other half of that problem is that an entire query is cached, rather than > just the relevant data to uniquely identify the request to the application. > This is bad in two respects; one that the entire query needs to be parsed to > see if a new query is materially equivalent, and that two materially > different queries which could utilize overlapping ranges of the same > underlying result set must instead cache their results separately, eating up > yet more memory. There are two separate issues. The cost of parse/optimization and the cost of results retrieval. Other dbmses hash statement text. This is a good thing, and probably 3 orders of magnitude faster than parse and optimization. (Oracle also has options to replace li
Re: [PERFORM] Caching of Queries
People: Transparent "query caching" is the "industry standard" for how these things are handled. However, Postgres' lack of this feature has made me consider other approaches, and I'm starting to wonder if the "standard" query caching -- where a materialized query result, or some reduction thereof, is cached in database memory -- isn't the best way to cache things. I'm going to abbreviate it "SQC" for the rest of this e-mail. Obviously, the draw of SQC is its transparency to developers. With it, the Java/Perl/PHP programmers and the DBA don't have to communicate at all -- you set it up, give it some RAM, and it "just works". As someone who frequently has to consult based on limited knowledge, I can understand the appeal. However, one of the problems with SQC, aside from the ones already mentioned of stale data and/or cache-clearing, is that (at least in applications like MySQL's) it is indiscriminate and caches, at least breifly, unique queries as readily as common ones. Possibly Oracle's implementation is more sophisticated; I've not had an opportunity. The other half of that problem is that an entire query is cached, rather than just the relevant data to uniquely identify the request to the application. This is bad in two respects; one that the entire query needs to be parsed to see if a new query is materially equivalent, and that two materially different queries which could utilize overlapping ranges of the same underlying result set must instead cache their results seperately, eating up yet more memory. To explain what I'm talking about, let me give you a counter-example of another approach. I have a data-warehousing application with a web front-end.The data in the application is quite extensive and complex, and only a summary is presented to the public users -- but that summary is a query involving about 30 lines and 16 joins. This summary information is available in 3 slightly different forms. Further, the client has indicated that an up to 1/2 hour delay in data "freshness" is acceptable. The first step is forcing that "materialized" view of the data into memory. Right now I'm working on a reliable way to do that without using Memcached, which won't install on our Solaris servers. Temporary tables have the annoying property of being per-connection, which doesn't work in a pool of 60 connections. The second step, which I completed first due to the lack of technical obstacles, is to replace all queries against this data with calls to a Set-Returning Function (SRF). This allowed me to re-direct where the data was coming from -- presumably the same thing could be done through RULES, but it would have been considerably harder to implement. The first thing the SRF does is check the criteria passed to it against a set of cached (in a table) criteria with that user's permission level which is < 1/2 hour old. If the same criteria are found, then the SRF is returned a set of row identifiers for the materialized view (MV), and looks up the rows in the MV and returns those to the web client. If no identical set of criteria are found, then the query is run to get a set of identifiers which are then cached, and the SRF returns the queried rows. Once I surmount the problem of storing all the caching information in protected memory, the advantages of this approach over SQC are several: 1) The materialized data is available in 3 different forms; a list, a detail view, and a spreadsheet. Each form as somewhat different columns and different rules about ordering, which would likely confuse an SQC planner. In this implementation, all 3 forms are able to share the same cache. 2) The application is comparing only sets of unambguous criteria rather than long queries which would need to be compared in planner form in order to determine query equivalence. 3) With the identifier sets, we are able to cache other information as well, such as a count of rows, further limiting the number of queries we must run. 4) This approach is ideally suited to the pagination and re-sorting common to a web result set. As only the identifiers are cached, the results can be re-sorted and broken in to pages after the cache read, a fast, all-in-memory operation. In conclusion, what I'm saying is that while forms of transparent query caching (plan, materialized or whatever) may be desirable for other reasons, it's quite possible to acheive a superior level of "query caching" through tight integration with the front-end application. If people are interested in this, I'd love to see some suggestions on ways to force the materialized view into dedicated memory. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
The context of the discussion was a hack to speed queries against static tables, so MVCC is not relevent. As soon as any work unit against a referenced table commits, the cache is invalid, and in fact the table shouldn't be a candidate for this caching for a while. In fact, this cache would reduce some the MVCC 'select count(*) from us_states' type of horrors. (The attraction of a server side cache is obviously that it could *with no server or app changes* dramatically improve performance. A materialized view is a specialized denormalization-ish mechanism to optimize a category of queries and requires the DBA to sweat the details. It is very hard to cache things stochastically without writing a server. Trigger managed extracts won't help you execute 1,000 programs issuing the query "select sec_level from sec where division=23" each second or a big table loaded monthly.) - Original Message - From: "Jeff" <[EMAIL PROTECTED]> To: "Mitch Pirtle" <[EMAIL PROTECTED]> Cc: "Aaron Werman" <[EMAIL PROTECTED]>; "Scott Kirkwood" <[EMAIL PROTECTED]>; "Neil Conway" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; "Tom Lane" <[EMAIL PROTECTED]> Sent: Monday, September 27, 2004 2:25 PM Subject: Re: [PERFORM] Caching of Queries > [ discussion of server side result caching ] > > and lets not forget PG's major fork it will throw into things: MVCC > The results of query A may hold true for txn 1, but not txn 2 and so on > . > That would have to be taken into account as well and would greatly > complicate things. > > It is always possible to do a "poor man"'s query cache with triggers.. > which would just leave you with basically a materialized view. > > -- > Jeff Trout <[EMAIL PROTECTED]> > http://www.jefftrout.com/ > http://www.stuarthamm.net/ > > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
> OK, that'd work too... the point is if you're re-connecting > all the time it doesn't really matter what else you do for > performance. Yeah, although there is the chap who was asking questions on the list recently who had some very long-running code on his app servers, so was best off closing the connection because he had far too many postmaster processes just sitting there idle all the time! But you're right, it's a killer usually. M ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 06:43:42AM +0100, Matt Clark wrote: > > >If you're not using a connection pool of some kind then you might as > >well forget query plan caching, because your connect overhead will swamp > >the planning cost. This does not mean you have to use something like > >pgpool (which makes some rather questionable claims IMO); any decent web > >application language/environment will support connection pooling. > > > > > Hmm, a question of definition - there's a difference between a pool and > a persistent connection. Pretty much all web apps have one connection > per process, which is persistent (i.e. not dropped and remade for each > request), but not shared between processes, therefore not pooled. OK, that'd work too... the point is if you're re-connecting all the time it doesn't really matter what else you do for performance. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
If you're not using a connection pool of some kind then you might as well forget query plan caching, because your connect overhead will swamp the planning cost. This does not mean you have to use something like pgpool (which makes some rather questionable claims IMO); any decent web application language/environment will support connection pooling. Hmm, a question of definition - there's a difference between a pool and a persistent connection. Pretty much all web apps have one connection per process, which is persistent (i.e. not dropped and remade for each request), but not shared between processes, therefore not pooled. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
On Mon, Sep 27, 2004 at 09:30:31PM +0100, Matt Clark wrote: > It's certainly the case that the typical web app (which, along with > warehouses, seems to be one half of the needy apps), could probably do > worse than use pooling as well. I'm not well up enough on pooling to > know how bulletproof it is though, which is why I included it in my list > of things that make me go 'hmm'. It would be really nice not to > have to take both things together. If you're not using a connection pool of some kind then you might as well forget query plan caching, because your connect overhead will swamp the planning cost. This does not mean you have to use something like pgpool (which makes some rather questionable claims IMO); any decent web application language/environment will support connection pooling. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
I could spend a week or two tweaking the performance of my database servers and probably make some sizeable improvements, but I'm not going to. Why? Because PostgreSQL screams as it is. I would make sure that if the consensus is to add some sort of caching that it be done only if there is no hit to current performance and stability. That being said, I think that server side caching has major buzz and there's nothing wrong with adding features that sell. I will disagree with 3 points made on the argument against caching. Specifically, the benefit of doing caching on the db server is that the benefits may be reaped by multiple clients where as caching on the client side must be done by each client and may not be as effective. So what if the caching has a slight chance of returning stale results? Just make sure people know about it in advance. There are some things where stale results are no big deal and if I can easily benefit from an aggressive caching system, I will (and I do now with the adodb caching library, but like I said, caching has to be done for each client). In fact, I'm all for using a low-tech cache expiration algorithm to keep complexity down. Finally, if the caching is not likely to help (or may even hurt) simple queries but is likely to help complex queries then fine, make sure people know about it and let them decide if they can benefit. Sorry if I'm beating a dead horse or playing the devil's advocate. Just felt compelled to chime in. -- Matthew Nuzum + "Man was born free, and everywhere www.bearfruit.org : he is in chains," Rousseau +~~+ "Then you will know the truth, and the TRUTH will set you free," Jesus Christ (John 8:32 NIV) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Monday, September 27, 2004 1:19 AM To: Neil Conway Cc: Aaron Werman; Scott Kirkwood; [EMAIL PROTECTED] Subject: Re: [PERFORM] Caching of Queries Neil Conway <[EMAIL PROTECTED]> writes: > I think the conclusion of past discussions about this feature is that > it's a bad idea. Last I checked, MySQL has to clear the *entire* query > cache when a single DML statement modifying the table in question is > issued. Do they actually make a rigorous guarantee that the cached result is still accurate when/if it is returned to the client? (That's an honest question --- I don't know how MySQL implements this.) IIRC, in our past threads on this topic, it was suggested that if you can tolerate not-necessarily-up-to-date results, you should be doing this sort of caching on the client side and not in the DB server at all. I wouldn't try that in a true "client" scenario, but when the DB client is application-server middleware, it would make some sense to cache in the application server. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED]) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
- Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> To: "Aaron Werman" <[EMAIL PROTECTED]> Cc: "Iain" <[EMAIL PROTECTED]>; "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, September 28, 2004 9:58 AM Subject: Re: [PERFORM] Caching of Queries > "Aaron Werman" <[EMAIL PROTECTED]> writes: > > I imagine a design where a shared plan cache would consist of the plans, > > indexed by a statement hash and again by dependant objects. A statement to > > be planned would be hashed and matched to the cache. DDL would need to > > synchronously destroy all dependant plans. If each plan maintains a validity > ^ > > flag, changing the cache wouldn't have to block so I don't see where there >^^ > > would be contention. > > You have contention to access a shared data structure *at all* -- for > instance readers must lock out writers. Or didn't you notice the self- > contradictions in what you just said? > > Our current scalability problems dictate reducing such contention, not > adding whole new sources of it. You're right - that seems unclear. What I meant is that there can be a global hash table that is never locked, and the hashes point to chains of plans that are only locally locked for maintenance, such as gc and chaining hash collisions. If maintenance was relatively rare and only local, my assumption is that it wouldn't have global impact. The nice thing about plan caching is that it can be sloppy, unlike block cache, because it is only an optimization tweak. So, for example, if the plan has atomic refererence times or counts there is no need to block, since overwriting is not so bad. If the multiprocessing planner chains the same plan twice, the second one would ultimately age out /Aaron > > regards, tom lane > ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
"Aaron Werman" <[EMAIL PROTECTED]> writes: > I imagine a design where a shared plan cache would consist of the plans, > indexed by a statement hash and again by dependant objects. A statement to > be planned would be hashed and matched to the cache. DDL would need to > synchronously destroy all dependant plans. If each plan maintains a validity ^ > flag, changing the cache wouldn't have to block so I don't see where there ^^ > would be contention. You have contention to access a shared data structure *at all* -- for instance readers must lock out writers. Or didn't you notice the self- contradictions in what you just said? Our current scalability problems dictate reducing such contention, not adding whole new sources of it. regards, tom lane ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
- Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> To: "Iain" <[EMAIL PROTECTED]> Cc: "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, September 27, 2004 11:17 PM Subject: Re: [PERFORM] Caching of Queries > "Iain" <[EMAIL PROTECTED]> writes: > > I can only tell you (roughly) how it works wth Oracle, > > Which unfortunately has little to do with how it works with Postgres. > This "latches" stuff is irrelevant to us. Latches are the Oracle term for semaphores. Both Oracle and pg use semaphores and spin locks to serialize activity in critical sections. I believe that the point that blocking/queuing reduces scalability is valid. > > In practice, any repetitive planning in PG is going to be consulting > catalog rows that it draws from the backend's local catalog caches. > After the first read of a given catalog row, the backend won't need > to re-read it unless the associated table has a schema update. (There > are some other cases, like a VACUUM FULL of the catalog the rows came > from, but in practice catalog cache entries don't change often in most > scenarios.) We need place only one lock per table referenced in order > to interlock against schema updates; not one per catalog row used. > > The upshot of all this is that any sort of shared plan cache is going to > create substantially more contention than exists now --- and that's not > even counting the costs of managing the cache, ie deciding when to throw > away entries. I imagine a design where a shared plan cache would consist of the plans, indexed by a statement hash and again by dependant objects. A statement to be planned would be hashed and matched to the cache. DDL would need to synchronously destroy all dependant plans. If each plan maintains a validity flag, changing the cache wouldn't have to block so I don't see where there would be contention. > > A backend-local plan cache would avoid the contention issues, but would > of course not allow amortizing planning costs across multiple backends. > > I'm personally dubious that sharing planning costs is a big deal. > Simple queries generally don't take that long to plan. Complicated > queries do, but I think the reusability odds go down with increasing > query complexity. > I think both the parse and planning are major tasks if the transaction rate is high. Simple queries can easily take much longer to plan than execute, so this is a scalability concern. Caching complicated queries is valuable - apps seem to have lots of similar queries because they are intimately related to the data model. > regards, tom lane > > ---(end of broadcast)--- > TIP 6: Have you searched our list archives? > >http://archives.postgresql.org > ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Caching of Queries
Hi Tom, (B (B> This "latches" stuff is irrelevant to us. (B (BWell, that's good to know anyway, thanks for setting me straight. Maybe (BOracle could take a leaf out of PGs book instead of the other way around. I (Brecall that you mentioned the caching of the schema before, so even though I (Bassumed PG was latching the metadata, I had begun to wonder if it was (Bactually neccessary. (B (BWhile it7s obviously not as critical as I thought, I think there may still (Bbe some potential for query caching by pg. It would be nice to have the (Boption anyway, as different applications have different needs. (B (BI think that re-use of SQL in applications (ie controlling the proliferation (Bof SQL statements that are minor variants of each other) is a good goal for (Bmaintainability, even if it doesn't have a major impact on performance as it (Bseems you are suggesting in the case of pg. Even complex queries that must (Bbe constructed dynamically typically only have a finite number of options (Band can still use bind variables, so in a well tuned system, they should (Bstill be viable candidates for caching (ie, if they aren't being bumped out (Bof the cache by thousands of little queries not using binds). (B (BI'll just finish by saying that, developing applications in a way that would (Btake advantage of any query caching still seems like good practice to me, (Beven if the target DBMS has no query caching. For now, that's what I plan to (Bdo with future PG/Oracle/Hypersonic (my 3 favourite DBMSs) application (Bdevelopment anyway. (B (BRegards (BIain (B (B (B- Original Message - (BFrom: "Tom Lane" <[EMAIL PROTECTED]> (BTo: "Iain" <[EMAIL PROTECTED]> (BCc: "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> (BSent: Tuesday, September 28, 2004 12:17 PM (BSubject: Re: [PERFORM] Caching of Queries (B (B (B> "Iain" <[EMAIL PROTECTED]> writes: (B> > I can only tell you (roughly) how it works wth Oracle, (B> (B> Which unfortunately has little to do with how it works with Postgres. (B> This "latches" stuff is irrelevant to us. (B> (B> In practice, any repetitive planning in PG is going to be consulting (B> catalog rows that it draws from the backend's local catalog caches. (B> After the first read of a given catalog row, the backend won't need (B> to re-read it unless the associated table has a schema update. (There (B> are some other cases, like a VACUUM FULL of the catalog the rows came (B> from, but in practice catalog cache entries don't change often in most (B> scenarios.) We need place only one lock per table referenced in order (B> to interlock against schema updates; not one per catalog row used. (B> (B> The upshot of all this is that any sort of shared plan cache is going to (B> create substantially more contention than exists now --- and that's not (B> even counting the costs of managing the cache, ie deciding when to throw (B> away entries. (B> (B> A backend-local plan cache would avoid the contention issues, but would (B> of course not allow amortizing planning costs across multiple backends. (B> (B> I'm personally dubious that sharing planning costs is a big deal. (B> Simple queries generally don't take that long to plan. Complicated (B> queries do, but I think the reusability odds go down with increasing (B> query complexity. (B> (B> regards, tom lane (B> (B> ---(end of broadcast)--- (B> TIP 6: Have you searched our list archives? (B> (B>http://archives.postgresql.org (B (B (B---(end of broadcast)--- (BTIP 2: you can get off all lists at once with the unregister command (B(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
"Iain" <[EMAIL PROTECTED]> writes: > I can only tell you (roughly) how it works wth Oracle, Which unfortunately has little to do with how it works with Postgres. This "latches" stuff is irrelevant to us. In practice, any repetitive planning in PG is going to be consulting catalog rows that it draws from the backend's local catalog caches. After the first read of a given catalog row, the backend won't need to re-read it unless the associated table has a schema update. (There are some other cases, like a VACUUM FULL of the catalog the rows came from, but in practice catalog cache entries don't change often in most scenarios.) We need place only one lock per table referenced in order to interlock against schema updates; not one per catalog row used. The upshot of all this is that any sort of shared plan cache is going to create substantially more contention than exists now --- and that's not even counting the costs of managing the cache, ie deciding when to throw away entries. A backend-local plan cache would avoid the contention issues, but would of course not allow amortizing planning costs across multiple backends. I'm personally dubious that sharing planning costs is a big deal. Simple queries generally don't take that long to plan. Complicated queries do, but I think the reusability odds go down with increasing query complexity. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
Jim, I can only tell you (roughly) how it works wth Oracle, and it's a very well documented and laboured point over there - it's the cornerstone of Oracle's scalability architecture, so if you don't believe me, or my explanation is just plain lacking, then it wouldn't be a bad idea to check it out. The "other Tom" aka Tomas Kyte runs the Ask Tom site which is a great source of info on this. It's also very well explained in his book "Expert one on one Oracle" I think it was called. I havn't seen any reason yet as to why the same issues shouldn't, don't or wouldn't apply to pg. Your comment is both right and wrong. Yes, metadata lookups are essentially the same as as access methods for normal queries. Any time you read data in the DB you have to place a shared lock, often called a latch - it's a lightweight type of lock. The trouble is that while a data page can have multiple latches set at any time, only 1 process can be placing a a latch on a page at a time. This doesn't sound so serious so far, latches are "lightweight" afterall, however... even in a database of a billion rows and 100+ tables, the database metadata is a very _small_ area. You must put latches on the metadata tables to do optimization, so for example, if you are optimizing a 10 table join, you must queue up 10 times to place your latchs. You then do your optimization and queue up 10 more times to remove your latches. In fact it is worse than this, because you won't queue up 10 times it's more likely to be a hundred times since it is far more complex than 1 latch per table being optimized (you will be looking up statistics and other things). As I already said, even in a huge DB of a billion rows, these latches are happening on a realatively small and concentrated data set - the metadata. Even if there is no contention for the application data, the contention for the metadata may be furious. Consider this scenario, you have a 1000 users constantly submitting queries that must not only be soft parsed (SQL statement syntax) but hard parsed (optimized) because you have no query cache. Even if they are looking at completely different data, they'll all be queuing up for latches on the same little patch of metadata. Doubling your CPU speed or throwing in a fibre channel disk array will not help here, the system smply won't scale. Tom Lane noted that since the query cache would be in shared memory the contention issue does not go away. This is true, but I don't think that it's hard to see that the amount of contention is consderably less in any system that is taking advantage of the caching facility - ie applications using bind variables to reduce hard parsing. However, badly written applications (from the point of view of query cache utilization) could very well experience a degradation in performance. This could be handled with an option to disable caching - or even better to disable caching of any sql not using binds. I don't think even the mighty Oracle has that option. As you may have guessed, my vote is for implementing a query cache that includes plans. I have no specific preference as to data caching. It doesn't seem to be so important to me. Regards Iain > On Thu, Sep 23, 2004 at 08:29:25AM -0700, Mr Pink wrote: > > Not knowing anything about the internals of pg, I don't know how this relates, but in theory, > > query plan caching is not just about saving time re-planning queries, it's about scalability. > > Optimizing queries requires shared locks on the database metadata, which, as I understand it > > causes contention and serialization, which kills scalability. > > One of the guru's can correct me if I'm wrong here, but AFAIK metadata > lookups use essentially the same access methods as normal queries. This > means MVCC is used and no locking is required. Even if locks were > required, they would be shared read locks which wouldn't block each > other. > -- > Jim C. Nasby, Database Consultant [EMAIL PROTECTED] > Give your computer some brain candy! www.distributed.net Team #1828 > > Windows: "Where do you want to go today?" > Linux: "Where do you want to go tomorrow?" > FreeBSD: "Are you guys coming, or what?" > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
Any competently written application where caching results would be a suitable performance boost can already implement application or middleware caching fairly easily, and increase performance much more than putting result caching into the database would. I guess the performance increase is that you can spend $10,000 on a developer, or $10,000 on hardware, and for the most part get a more reliable result the second way. MemcacheD is fine(ish), but it's not a panacea, and it's more than easy to shoot yourself in the foot with it. Caching is hard enough that lots of people do it badly - I'd rather use an implementation from the PG team than almost anywhere else. I don't see caching results in the database as much of a win for most well written applications. Toy benchmarks, sure, but for real apps it seems it would add a lot of complexity, and violate the whole point of using an ACID database. Well the point surely is to _remove_ complexity from the application, which is written by God Knows Who, and put it in the DB, which is written by God And You. And you can still have ACID (cached data is not the same as stale data, although once you have the former, the latter can begin to look tempting sometimes). M ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? The answers are at http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html . Specifically, it's a separate application that needs configuration, the homepage has no real discussion of the potential pitfalls of pooling and what this implementation does to get around them, you get the idea. I'm sure it's great software, but it doesn't come as part of the DB server, so 95% of people who would benefit from query caching being implemented in it never will. If it shipped with and was turned on by default in SUSE or RedHat that would be a different matter. Which I realise makes me look like one of those people who doesn't appreciate code unless it's 'popular', but I hope I'm not *that* bad... Oh OK, I'll say it, this is a perfect example of why My*** has so much more mindshare. It's not better, but it sure makes the average Joe _feel_ better. Sorry, I've got my corporate hat on today, I'm sure I'll feel a little less cynical tomorrow. M
Re: [PERFORM] Caching of Queries
On Mon, Sep 27, 2004 at 09:19:12PM +0100, Matt Clark wrote: > >Basically you set a default in seconds for the HTML results to be > >cached, and then have triggers set that force the cache to regenerate > >(whenever CRUD happens to the content, for example). > > > >Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a > >believer out of me! > > > > > > > Nice to have it in a library, but if you want to be that simplistic then > it's easy in any language. What if a process on server B modifies a n > important value that server A has cached though? Coherency (albeit that > the client may choose to not use it) is a must for a general solution. memcached is one solution designed for that situation. Easy to use from most languages. Works. Lets you use memory on systems where you have it, rather than using up valuable database server RAM that's better spent caching disk sectors. Any competently written application where caching results would be a suitable performance boost can already implement application or middleware caching fairly easily, and increase performance much more than putting result caching into the database would. I don't see caching results in the database as much of a win for most well written applications. Toy benchmarks, sure, but for real apps it seems it would add a lot of complexity, and violate the whole point of using an ACID database. (Caching parse trees or query plans, though? It'd be interesting to model what effect that'd have.) Cheers, Steve ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
> More to the point though, I think this is a feature that really really > should be in the DB, because then it's trivial for people to use. How does putting it into PGPool make it any less trivial for people to use? ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
It might be easiest to shove the caching logic into pgpool instead. ... When pg_pool is told to cache a query, it can get a table list and monitor for changes. When it gets changes, simply dumps the cache. It's certainly the case that the typical web app (which, along with warehouses, seems to be one half of the needy apps), could probably do worse than use pooling as well. I'm not well up enough on pooling to know how bulletproof it is though, which is why I included it in my list of things that make me go 'hmm'. It would be really nice not to have to take both things together. More to the point though, I think this is a feature that really really should be in the DB, because then it's trivial for people to use. Taking an existing production app and justifying a switch to an extra layer of pooling software is relatively hard compared with grabbing data from a view instead of a table (or setting a variable, or adding a tweak to a query, or however else it might be implemented). Eminiently doable in pgpool though, and just the right thing for anyone already using it. M ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
Basically you set a default in seconds for the HTML results to be cached, and then have triggers set that force the cache to regenerate (whenever CRUD happens to the content, for example). Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a believer out of me! Nice to have it in a library, but if you want to be that simplistic then it's easy in any language. What if a process on server B modifies a n important value that server A has cached though? Coherency (albeit that the client may choose to not use it) is a must for a general solution. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
On Thu, Sep 23, 2004 at 08:29:25AM -0700, Mr Pink wrote: > Not knowing anything about the internals of pg, I don't know how this relates, but > in theory, > query plan caching is not just about saving time re-planning queries, it's about > scalability. > Optimizing queries requires shared locks on the database metadata, which, as I > understand it > causes contention and serialization, which kills scalability. One of the guru's can correct me if I'm wrong here, but AFAIK metadata lookups use essentially the same access methods as normal queries. This means MVCC is used and no locking is required. Even if locks were required, they would be shared read locks which wouldn't block each other. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
On Mon, 27 Sep 2004 18:20:48 +0100, Matt Clark <[EMAIL PROTECTED]> wrote: > This is very true. Client side caching is an enormous win for apps, but it > requires quite a lot of logic, triggers to update last-modified fields on > relevant tables, etc etc. Moving some of this logic to the DB would perhaps > not usually be quite as efficient as a bespoke client caching solution, but > it will above all be a lot easier for the application developer! In the world of PHP it is trivial thanks to PEAR's Cache_Lite. The project lead for Mambo implemented page-level caching in a day, and had all the triggers for clearing the cache included in the content management interface - not difficult at all. Basically you set a default in seconds for the HTML results to be cached, and then have triggers set that force the cache to regenerate (whenever CRUD happens to the content, for example). Can't speak for Perl/Python/Ruby/.Net/Java, but Cache_Lite sure made a believer out of me! -- Mitch ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Caching of Queries
[ discussion of server side result caching ] and lets not forget PG's major fork it will throw into things: MVCC The results of query A may hold true for txn 1, but not txn 2 and so on . That would have to be taken into account as well and would greatly complicate things. It is always possible to do a "poor man"'s query cache with triggers.. which would just leave you with basically a materialized view. -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
There is a difference between MySQL and Oracle here. Oracle, to reduce parse/planner costs, hashes statements to see if it can match an existing optimizer plan. This is optional and there are a few flavors that range from a characher to characyter match through parse tree matches through replacing of literals in the statements with parameters. This dramatically improves performance in almost all high transaction rate systems. MySQL stores a statement with its results. This is optional and when a client allows this type of processing, the SQL is hashed and matched to the statement - and the stored *result* is returned. The point is that a lot of systems do lots of static queries, such as a pick list on a web page - but if the data changes the prior result is returned. This (plus a stable jdbc driver) was the reason MySQL did well in the eWeek database comparison. /Aaron - Original Message - From: "Scott Kirkwood" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]> Sent: Wednesday, September 22, 2004 3:50 PM Subject: [PERFORM] Caching of Queries > I couldn't find anything in the docs or in the mailing list on this, > but it is something that Oracle appears to do as does MySQL. > The idea, I believe, is to do a quick (hash) string lookup of the > query and if it's exactly the same as another query that has been done > recently to re-use the old parse tree. > It should save the time of doing the parsing of the SQL and looking up > the object in the system tables. > It should probably go through the planner again because values passed > as parameters may have changed. Although, for extra points it could > look at the previous query plan as a hint. > On the surface it looks like an easy enhancement, but what do I know? > I suppose it would benefit mostly those programs that use a lot of > PQexecParams() with simple queries where a greater percentage of the > time is spent parsing the SQL rather than building the execute plan. > What do you think? > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries (now with pgpool)
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Michael Adler wrote: | On Thu, Sep 23, 2004 at 09:23:51PM -0400, Jason Coene wrote: | |>I ran some previous queries to get pgpool to pre-establish all the |>connections, and ab ran for a few minutes (with one query per page, eek!). |>It was still exhibiting the same problems as before. While so many new |>connections at once can surely make the problem worse (and pgpool will |>surely help there), shouldn't this prove that it's not the only issue? | | |>Any ideas? | | | Now that your connections are persistent, you may benefit from using | PREPAREd queries. | | -Mike With his load will not change anything. Regards Gaetano Mendola -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBVany7UpzwH2SGd4RAj9UAJ0SO3VE7zMbwrgdwPQc+HP5PHClMACgtTvn KIp1TK2lVbmXZ+s62fpJ46U= =sjT0 -END PGP SIGNATURE- ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Joshua D. Drake wrote: | |>>Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 |>>connections |>>/ second is realistic, if not underestimated. As is the nature of |>>our site |>>(realtime information about online gaming), there's a huge fan base |>>and as a |>>big upset happens, we'll do 50,000 page views in a span of 3-5 |>>minutes. |>> |>> |> |>First, your posts show no evidences of the CS storm bug. |> |>Second, 300 *new* connections a second is a lot. Each new connection |>requires a significant amount of both database and OS overhead. This |>is why all the other web developers use a connection pool. |> |> |> | I would second this. You need to be running a connection pool and | probably multiple web servers in | front of that. You are talking about a huge amount of connections in | that amount of time. | | Josh Drake | | | |>In fact, I wouldn't be surprised if your lockups are on the OS level, |>even; I don't recall that you cited what OS you're using, but I can |>imagine locking up Linux 2.4 trying to spawn 300 new processes a |>second. Not to mention that a proxy squid mounted in reverse proxy mode will help a lot. Regards Gaetano Mendola -BEGIN PGP SIGNATURE- Version: GnuPG v1.2.5 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFBVaOg7UpzwH2SGd4RAnW4AJ9TYV0oSjYcv8Oxt4Ot/T/nJikoRgCg1Egx r4KKm14ziu/KWFb3SnTK/U8= =xgmw -END PGP SIGNATURE- ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Caching of Queries (now with pgpool)
On Thu, Sep 23, 2004 at 09:23:51PM -0400, Jason Coene wrote: > I ran some previous queries to get pgpool to pre-establish all the > connections, and ab ran for a few minutes (with one query per page, eek!). > It was still exhibiting the same problems as before. While so many new > connections at once can surely make the problem worse (and pgpool will > surely help there), shouldn't this prove that it's not the only issue? > Any ideas? Now that your connections are persistent, you may benefit from using PREPAREd queries. -Mike ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Caching of Queries (now with pgpool)
Hi Josh, I just tried using pgpool to pool the connections, and ran: ab -n 1000 -c 50 http://wstg.int/portal/news/ I ran some previous queries to get pgpool to pre-establish all the connections, and ab ran for a few minutes (with one query per page, eek!). It was still exhibiting the same problems as before. While so many new connections at once can surely make the problem worse (and pgpool will surely help there), shouldn't this prove that it's not the only issue? We're running FreeBSD 5.2.1 I've attached open locks, running queries, query plans, top output and vmstat 5 output for while ab was running, from start to finish. Any ideas? Jason > -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-performance- > [EMAIL PROTECTED] On Behalf Of Josh Berkus > Sent: Thursday, September 23, 2004 8:06 PM > To: Jason Coene; [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] > Subject: Re: [PERFORM] Caching of Queries > > Jason, > > > Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 > > connections > > / second is realistic, if not underestimated. As is the nature of > > our site > > (realtime information about online gaming), there's a huge fan base > > and as a > > big upset happens, we'll do 50,000 page views in a span of 3-5 > > minutes. > > First, your posts show no evidences of the CS storm bug. > > Second, 300 *new* connections a second is a lot. Each new connection > requires a significant amount of both database and OS overhead. This > is why all the other web developers use a connection pool. > > In fact, I wouldn't be surprised if your lockups are on the OS level, > even; I don't recall that you cited what OS you're using, but I can > imagine locking up Linux 2.4 trying to spawn 300 new processes a > second. > > --Josh > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend OPEN LOCKS: gf=# SELECT r.relname, l."mode", count(*) AS numlocks FROM pg_locks l, pg_class r WHERE r.oid = l.relation GROUP BY r.relname, l."mode" ORDER BY count(*) DESC; relname | mode | numlocks --+ threads | AccessShareLock | 43 threads_ix_nuked_lastpost| AccessShareLock | 35 threads_ix_nuked_gameid_lastpost | AccessShareLock |7 pg_attribute | AccessShareLock |1 v_locks | AccessShareLock |1 pg_class | AccessShareLock |1 usersessions | AccessShareLock |1 countries| AccessShareLock |1 users| AccessShareLock |1 userstats_ix_id | AccessShareLock |1 pg_statistic_relid_att_index | AccessShareLock |1 pg_attribute_relid_attnum_index | AccessShareLock |1 userstats| AccessShareLock |1 demos| AccessShareLock |1 pg_cast_source_target_index | AccessShareLock |1 pg_locks | AccessShareLock |1 users_ix_id | AccessShareLock |1 buddies | AccessShareLock |1 buddies_ix_userid| AccessShareLock |1 pg_cast | AccessShareLock |1 pg_statistic | AccessShareLock |1 (21 rows) RUNNING QUERIES (AND HOW LONG FOR): gf=# SELECT pg_stat_activity.usename, round(date_part('epoch'::text, now() - pg_stat_activity.query_start)) AS duration, pg_stat_activity.current_query FROM pg_stat_activity ORDER BY round(date_part('epoch'::text, now() - pg_stat_activity.query_start)) DESC; usename | duration | current_query -+--+--- -- gf |4 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 AND gameid = 1 ORDER BY nuked DESC, gameid DESC, lastpost DESC LIMIT 8 gf |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 ORDER BY nuked DESC, lastpost DESC LIMIT 8 gf |3 | SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 ORDER BY nuked DESC, lastpost DESC LIMIT 8 gf |3 | SELECT id, gameid, forumid, subject FROM threads WHERE
Re: [PERFORM] Caching of Queries
Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 connections / second is realistic, if not underestimated. As is the nature of our site (realtime information about online gaming), there's a huge fan base and as a big upset happens, we'll do 50,000 page views in a span of 3-5 minutes. First, your posts show no evidences of the CS storm bug. Second, 300 *new* connections a second is a lot. Each new connection requires a significant amount of both database and OS overhead. This is why all the other web developers use a connection pool. I would second this. You need to be running a connection pool and probably multiple web servers in front of that. You are talking about a huge amount of connections in that amount of time. Josh Drake In fact, I wouldn't be surprised if your lockups are on the OS level, even; I don't recall that you cited what OS you're using, but I can imagine locking up Linux 2.4 trying to spawn 300 new processes a second. --Josh ---(end of broadcast)--- TIP 8: explain analyze is your friend -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [PERFORM] Caching of Queries
Jason, > Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 > connections > / second is realistic, if not underestimated. As is the nature of > our site > (realtime information about online gaming), there's a huge fan base > and as a > big upset happens, we'll do 50,000 page views in a span of 3-5 > minutes. First, your posts show no evidences of the CS storm bug. Second, 300 *new* connections a second is a lot. Each new connection requires a significant amount of both database and OS overhead. This is why all the other web developers use a connection pool. In fact, I wouldn't be surprised if your lockups are on the OS level, even; I don't recall that you cited what OS you're using, but I can imagine locking up Linux 2.4 trying to spawn 300 new processes a second. --Josh ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
Update: I just tried running the same test (ab with 150 concurrent connections) while connecting to postgres through 35 persistent connections (PHP library), and had roughly the same type of results. This should eliminate the "new connection" overhead. I've attached top and vmstat. I let it run until it had completed 800 requests. Unless I'm missing something, there's more than the "new connection" IO load here. Jason > -Original Message- > From: Jason Coene [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 3:08 PM > To: [EMAIL PROTECTED] > Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] > Subject: RE: Caching of Queries > > Hi All, > > It does sound like we should be pooling connections somehow. I'll be > looking at implementing that shortly. I'd really like to understand what > the actual problem is, though. > > Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 > connections > / second is realistic, if not underestimated. As is the nature of our > site > (realtime information about online gaming), there's a huge fan base and as > a > big upset happens, we'll do 50,000 page views in a span of 3-5 minutes. > > I get the same results with: > > ab -n 1 -c 150 http://www.gotfrag.com/portal/news/ > > I've attached results from the above test, showing open locks, top output, > and vmstat 5. > > Tom, I've run the test described in: > > http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php > > Results attached in mptest.txt. The box did experience the same problems > as > we've seen before. I ran it under a separate database (test), and it > still > caused our other queries to slow significantly from our production > database > (gf) - semwait again. > > It does look like the "cs" column under CPU (which I'd assume is Context > Swap) does bump up significantly (10x or better) during both my ab test, > and > the test you suggested in that archived message. > > Reading the first thread you pointed out (2004-04/msg00249.php), Josh > Berkus > was questioning the ServerWorks chipsets. We're running on the Intel > E7501 > Chipset (MSI board). Our CPU's are 2.66 GHz with 533MHz FSB, > Hyperthreading > enabled. Unfortunately, I don't have physical access to the machine to > turn > HT off. > > > Thanks, > > Jason > > > > > -Original Message- > > From: Gaetano Mendola [mailto:[EMAIL PROTECTED] > > Sent: Thursday, September 23, 2004 1:41 PM > > To: Jason Coene > > Subject: Re: Caching of Queries > > > > Jason Coene wrote: > > > Hi Tom, > > > > > > Easily recreated with Apache benchmark, "ab -n 3 -c 3000 > > > http://webserver ". This runs 1 query per page, everything else is > > cached > > > on webserver. > > > > That test require 3 access with 3000 connections that is not a > normal > > load. Describe us your HW. > > > > 3000 connections means a very huge load, may you provide also the result > > of > > "vmstat 5" my webserver trash already with -c 120 ! > > > > how many connection your postgres can manage ? > > > > You have to consider to use a connection pool with that ammount of > > connections. > > > > > > Regards > > Gaetano Mendola last pid: 48239; load averages: 5.83, 2.43, 1.50 up 19+22:59:04 15:19:12 127 processes: 16 running, 111 sleeping CPU states: 17.7% user, 0.0% nice, 20.0% system, 1.0% interrupt, 61.3% idle Mem: 125M Active, 1456M Inact, 193M Wired, 96M Cache, 112M Buf, 54M Free Swap: 4096M Total, 4096M Free PID USERNAME PRI NICE SIZERES STATE C TIME WCPUCPU COMMAND 48190 pgsql -40 97408K 86416K semwai 1 0:01 3.35% 1.32% postgres 47761 pgsql -40 96816K 56708K semwai 2 0:01 0.90% 0.88% postgres 47765 pgsql 40 96816K 56708K sbwait 3 0:01 0.90% 0.88% postgres 47754 pgsql -40 96816K 56708K semwai 2 0:01 0.85% 0.83% postgres 47763 pgsql -40 96816K 56708K semwai 0 0:01 0.85% 0.83% postgres 47741 pgsql 40 96816K 56708K sbwait 3 0:01 0.75% 0.73% postgres 47674 pgsql -40 96264K 38992K semwai 1 0:01 0.74% 0.73% postgres 47753 pgsql -40 96816K 56708K semwai 1 0:00 0.65% 0.63% postgres 48204 pgsql -40 96856K 46752K semwai 0 0:00 2.15% 0.63% postgres 47698 pgsql 40 96240K 37792K sbwait 3 0:01 0.59% 0.59% postgres 47757 pgsql -40 96816K 56708K semwai 3 0:01 0.60% 0.59% postgres 47740 pgsql 40 96240K 37768K sbwait 0 0:01 0.55% 0.54% postgres 47759 pgsql 40 96816K 56708K sbwait 0 0:01 0.50% 0.49% postgres 47735 pgsql -40 96240K 37772K semwai 0 0:00 0.50% 0.49% postgres 48223 pgsql -40 96984K 55980K semwai 1 0:00 2.69% 0.49% postgres 48102 pgsql 40 96136K 54956K sbwait 0 0:00 0.69% 0.44% postgres 47718 pgsql 40 96816K 56716K sbwait 1 0:01 0.40% 0.39% postgres 48225 pgsql1230 96272K 57156K RUN0 0:00 2.80% 0.39% postgres 48053 pgsql -40 96136K 55040K sem
Re: [PERFORM] Caching of Queries
Hi All, It does sound like we should be pooling connections somehow. I'll be looking at implementing that shortly. I'd really like to understand what the actual problem is, though. Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 connections / second is realistic, if not underestimated. As is the nature of our site (realtime information about online gaming), there's a huge fan base and as a big upset happens, we'll do 50,000 page views in a span of 3-5 minutes. I get the same results with: ab -n 1 -c 150 http://www.gotfrag.com/portal/news/ I've attached results from the above test, showing open locks, top output, and vmstat 5. Tom, I've run the test described in: http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php Results attached in mptest.txt. The box did experience the same problems as we've seen before. I ran it under a separate database (test), and it still caused our other queries to slow significantly from our production database (gf) - semwait again. It does look like the "cs" column under CPU (which I'd assume is Context Swap) does bump up significantly (10x or better) during both my ab test, and the test you suggested in that archived message. Reading the first thread you pointed out (2004-04/msg00249.php), Josh Berkus was questioning the ServerWorks chipsets. We're running on the Intel E7501 Chipset (MSI board). Our CPU's are 2.66 GHz with 533MHz FSB, Hyperthreading enabled. Unfortunately, I don't have physical access to the machine to turn HT off. Thanks, Jason > -Original Message- > From: Gaetano Mendola [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 1:41 PM > To: Jason Coene > Subject: Re: Caching of Queries > > Jason Coene wrote: > > Hi Tom, > > > > Easily recreated with Apache benchmark, "ab -n 3 -c 3000 > > http://webserver ". This runs 1 query per page, everything else is > cached > > on webserver. > > That test require 3 access with 3000 connections that is not a normal > load. Describe us your HW. > > 3000 connections means a very huge load, may you provide also the result > of > "vmstat 5" my webserver trash already with -c 120 ! > > how many connection your postgres can manage ? > > You have to consider to use a connection pool with that ammount of > connections. > > > Regards > Gaetano Mendola gf=# SELECT r.relname, l."mode", count(*) AS numlocks FROM pg_locks l, pg_class r WHERE r.oid = l.relation GROUP BY r.relname, l."mode" ORDER BY count(*) DESC; relname | mode | numlocks --+-+-- threads | AccessShareLock | 63 threads_ix_nuked_lastpost| AccessShareLock | 47 threads_ix_nuked_gameid_lastpost | AccessShareLock |7 pg_class | AccessShareLock |5 pg_opclass_am_name_nsp_index | AccessShareLock |3 pg_opclass | AccessShareLock |3 pg_class_oid_index | AccessShareLock |3 pg_type | AccessShareLock |2 pg_statistic | AccessShareLock |2 pg_attribute | AccessShareLock |2 pg_amop_opc_strategy_index | AccessShareLock |2 pg_attrdef | AccessShareLock |2 pg_trigger_tgrelid_tgname_index | AccessShareLock |2 pg_trigger | AccessShareLock |2 users| AccessShareLock |2 pg_statistic_relid_att_index | AccessShareLock |2 pg_type_oid_index| AccessShareLock |2 pg_amop | AccessShareLock |2 pg_attribute_relid_attnum_index | AccessShareLock |2 comments | AccessShareLock |2 pg_shadow| AccessShareLock |2 acls | AccessShareLock |1 pg_index_indexrelid_index| AccessShareLock |1 pg_attrdef_adrelid_adnum_index | AccessShareLock |1 surveyresults_ix_userid | AccessShareLock |1 pg_cast | AccessShareLock |1 pg_shadow_usesysid_index | AccessShareLock |1 pg_index | AccessShareLock |1 games| AccessShareLock |1 usersessions | AccessShareLock |1 surveyoptions| AccessShareLock |1 countries| AccessShareLock |1 surveyresults| AccessShareLock |1 vopenlocks | AccessShareLock |1 pg_class_relname_nsp_index | AccessShareLock |1 pg_namespace | AccessShareLock |1 pg_database | AccessShareLock |1 surveys
Re: [PERFORM] Caching of Queries
Josh Berkus <[EMAIL PROTECTED]> writes: >> I think you are probably looking at the same problem previously reported >> by Josh Berkus among others. > That would be interesting. Previously we've only demonstrated the > problem on long-running queries, but I suppose it could also affect > massive concurrent query access. Well, the test cases we used were designed to get the system into a tight loop of grabbing and releasing shared buffers --- a long-running index scan is certainly one of the best ways to do that, but there are others. I hadn't focused before on the point that Jason is launching a new connection for every query. In that scenario I think the bulk of the cycles are going to go into loading the per-backend catalog caches with the system catalog rows that are needed to parse and plan the query. The catalog fetches to get those rows are effectively mini-queries with preset indexscan plans, so it's not hard to believe that they'd be hitting the BufMgrLock nearly as hard as a tight indexscan loop. Once all the pages needed are cached in shared buffers, there's no I/O delays to break the loop, and so you could indeed get into the context swap storm regime we saw before. I concur with the thought that using persistent connections might go a long way towards alleviating his problem. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
Tom, > I think you are probably looking at the same problem previously reported > by Josh Berkus among others. Does the rate of context swaps shown by > vmstat go through the roof when this happens? If you strace or ktrace > one of the backends, do you see lots of semop()s and little else? That would be interesting. Previously we've only demonstrated the problem on long-running queries, but I suppose it could also affect massive concurrent query access. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
Hi, Jason, On Thu, 23 Sep 2004 12:53:25 -0400 "Jason Coene" <[EMAIL PROTECTED]> wrote: > I've looked at PREPARE, but apparently it only lasts per-session - that's > worthless in our case (web based service, one connection per data-requiring > connection). This sounds like the loads of connection init and close may be the reason for the slowdown. Can you use connection pooling in your service? HTH, Markus -- markus schaber | dipl. informatiker logi-track ag | rennweg 14-16 | ch 8001 zürich phone +41-43-888 62 52 | fax +41-43-888 62 53 mailto:[EMAIL PROTECTED] | www.logi-track.com ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Caching of Queries
Jason Coene wrote: Hi Tom, Easily recreated with Apache benchmark, "ab -n 3 -c 3000 http://webserver ". This runs 1 query per page, everything else is cached on webserver. That test require 3 access with 3000 connections that is not a normal load. Describe us your HW. 3000 connections means a very huge load, may you provide also the result of "vmstat 5" my webserver trash already with -c 120 ! how many connection your postgres can manage ? You have to consider to use a connection pool with that ammount of connections. Regards Gaetano Mendola ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
> I've looked at PREPARE, but apparently it only lasts > per-session - that's worthless in our case (web based > service, one connection per data-requiring connection). That's a non-sequitur. Most 'normal' high volume web apps have persistent DB connections, one per http server process. Are you really dropping DB connections and reconnecting each time a new HTTP request comes in? M ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Caching of Queries
"Jason Coene" <[EMAIL PROTECTED]> writes: > I'm not sure how I go about getting the stack traceback you need. Any info > on this? Results of "ps" below. System is dual xeon 2.6, 2gb ram, hardware > raid 10 running FreeBSD 5.2.1. Hmm. Dual Xeon sets off alarm bells ... I think you are probably looking at the same problem previously reported by Josh Berkus among others. Does the rate of context swaps shown by vmstat go through the roof when this happens? If you strace or ktrace one of the backends, do you see lots of semop()s and little else? Check the archives for this thread among others: http://archives.postgresql.org/pgsql-performance/2004-04/msg00249.php The test case you are talking about is a tight indexscan loop, which is pretty much the same scenario as here: http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php The fundamental problem is heavy contention for access to a shared data structure. We're still looking for good solutions, but in the context of this thread it's worth pointing out that a shared query-plan cache would itself be subject to heavy contention, and arguably would make this sort of problem worse not better. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Caching of Queries
"Jason Coene" <[EMAIL PROTECTED]> writes: > All of our "postgres" processes end up in the "semwai" state - seemingly > waiting on other queries to complete. If the system isn't taxed in CPU or > disk, I have a good feeling that this may be the cause. Well, it's possible contention of some sort is an issue but it's not clear that it's planning related contention. > We're running on SELECT's, and the number of locks on our "high traffic" > tables grows to the hundreds. Where are you seeing this? What information do you have about these locks? > I've looked at PREPARE, but apparently it only lasts per-session - that's > worthless in our case (web based service, one connection per data-requiring > connection). Well the connection time in postgres is pretty quick. But a lot of other things, including prepared queries but also including other factors are a lot more effective if you have long-lived sessions. I would strongly recommend you consider some sort of persistent database connection for your application. Most web based services run queries from a single source base where all the queries are written in-house. In that situation you can ensure that one request never leaves the session in an unusual state (like setting guc variables strangely, or leaving a transaction open, or whatever). That saves you the reconnect time, which as I said is actually small, but could still be contributing to your problem. I think it also makes the buffer cache more effective as well. And It also means you can prepare all your queries and reuse them on subsequent requests. The nice thing about web based services is that while each page only executes each query once, you tend to get the same pages over and over thousands of times. So if they prepare their queries the first time around they can reuse those prepared queries thousands of times. Using a text cache of the query string on the server side is just a work-around for failing to do that on the client side. It's much more efficient and more flexible to do it on the client-side. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
Hi Tom, Easily recreated with Apache benchmark, "ab -n 3 -c 3000 http://webserver ". This runs 1 query per page, everything else is cached on webserver. The lone query: SELECT id, gameid, forumid, subject FROM threads WHERE nuked = 0 ORDER BY nuked DESC, lastpost DESC LIMIT 8 Limit (cost=0.00..1.99 rows=8 width=39) (actual time=27.865..28.027 rows=8 loops=1) -> Index Scan Backward using threads_ix_nuked_lastpost on threads (cost=0.0 0..16824.36 rows=67511 width=39) (actual time=27.856..27.989 rows=8 loops=1) Filter: (nuked = 0) Total runtime: 28.175 ms I'm not sure how I go about getting the stack traceback you need. Any info on this? Results of "ps" below. System is dual xeon 2.6, 2gb ram, hardware raid 10 running FreeBSD 5.2.1. Jason last pid: 96094; load averages: 0.22, 0.35, 0.38 up 19+20:50:37 13:10:45 161 processes: 2 running, 151 sleeping, 8 lock CPU states: 12.2% user, 0.0% nice, 16.9% system, 1.6% interrupt, 69.4% idle Mem: 120M Active, 1544M Inact, 194M Wired, 62M Cache, 112M Buf, 2996K Free Swap: 4096M Total, 4096M Free PID USERNAME PRI NICE SIZERES STATE C TIME WCPUCPU COMMAND 50557 pgsql 980 95276K 4860K select 0 24:00 0.59% 0.59% postgres 95969 pgsql 40 96048K 34272K sbwait 0 0:00 2.10% 0.29% postgres 95977 pgsql -40 96048K 29620K semwai 2 0:00 1.40% 0.20% postgres 96017 pgsql 40 96048K 34280K sbwait 0 0:00 2.05% 0.20% postgres 95976 pgsql -40 96048K 30564K semwai 3 0:00 1.05% 0.15% postgres 95970 pgsql -40 96048K 24404K semwai 1 0:00 1.05% 0.15% postgres 95972 pgsql -40 96048K 21060K semwai 1 0:00 1.05% 0.15% postgres 96053 pgsql -40 96048K 24140K semwai 3 0:00 1.54% 0.15% postgres 96024 pgsql -40 96048K 22192K semwai 3 0:00 1.54% 0.15% postgres 95985 pgsql -40 96048K 15208K semwai 3 0:00 1.54% 0.15% postgres 96033 pgsql 980 95992K 7812K *Giant 2 0:00 1.54% 0.15% postgres 95973 pgsql -40 96048K 30936K semwai 3 0:00 0.70% 0.10% postgres 95966 pgsql 40 96048K 34272K sbwait 0 0:00 0.70% 0.10% postgres 95983 pgsql 40 96048K 34272K sbwait 2 0:00 1.03% 0.10% postgres 95962 pgsql 40 96048K 34268K sbwait 2 0:00 0.70% 0.10% postgres 95968 pgsql -40 96048K 26232K semwai 2 0:00 0.70% 0.10% postgres 95959 pgsql 40 96048K 34268K sbwait 2 0:00 0.70% 0.10% postgres > -Original Message- > From: Tom Lane [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 1:06 PM > To: Jason Coene > Cc: 'Mr Pink'; 'Scott Kirkwood'; [EMAIL PROTECTED] > Subject: Re: [PERFORM] Caching of Queries > > "Jason Coene" <[EMAIL PROTECTED]> writes: > > All of our "postgres" processes end up in the "semwai" state - seemingly > > waiting on other queries to complete. If the system isn't taxed in CPU > or > > disk, I have a good feeling that this may be the cause. > > Whatever that is, I'll bet lunch that it's got 0 to do with caching > query plans. Can you get stack tracebacks from some of the stuck > processes? What do they show in "ps"? > > regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
Scott: We have seen similar issues when we have had massive load on our web server. My determination was that simply the act of spawning and stopping postgres sessions was very heavy on the box, and by implementing connection pooling (sqlrelay), we got much higher throughput, and better response on the server then we would get any other way. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jason Coene Sent: Thursday, September 23, 2004 10:53 AM To: 'Mr Pink'; 'Scott Kirkwood' Cc: [EMAIL PROTECTED] Subject: Re: [PERFORM] Caching of Queries I'm not an expert, but I've been hunting down a killer performance problem for a while now. It seems this may be the cause. At peak load, our database slows to a trickle. The CPU and disk utilization are normal - 20-30% used CPU and disk performance good. All of our "postgres" processes end up in the "semwai" state - seemingly waiting on other queries to complete. If the system isn't taxed in CPU or disk, I have a good feeling that this may be the cause. I didn't know that planning queries could create such a gridlock, but based on Mr Pink's explanation, it sounds like a very real possibility. We're running on SELECT's, and the number of locks on our "high traffic" tables grows to the hundreds. If it's not the SELECT locking (and we don't get that many INSERT/UPDATE on these tables), could the planner be doing it? At peak load (~ 1000 queries/sec on highest traffic table, all very similar), the serialized queries pile up and essentially create a DoS on our service - requiring a restart of the PG daemon. Upon stop & start, it's back to normal. I've looked at PREPARE, but apparently it only lasts per-session - that's worthless in our case (web based service, one connection per data-requiring connection). Does this sound plausible? Is there an alternative way to do this that I don't know about? Additionally, in our case, I personally don't see any downside to caching and using the same query plan when the only thing substituted are variables. In fact, I'd imagine it would help performance significantly in high-volume web applications. Thanks, Jason > -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-performance- > [EMAIL PROTECTED] On Behalf Of Mr Pink > Sent: Thursday, September 23, 2004 11:29 AM > To: Scott Kirkwood; [EMAIL PROTECTED] > Subject: Re: [PERFORM] Caching of Queries > > Not knowing anything about the internals of pg, I don't know how this > relates, but in theory, > query plan caching is not just about saving time re-planning queries, it's > about scalability. > Optimizing queries requires shared locks on the database metadata, which, > as I understand it > causes contention and serialization, which kills scalability. > > I read this thread from last to first, and I'm not sure if I missed > something, but if pg isnt > caching plans, then I would say plan caching should be a top priority for > future enhancements. It > needn't be complex either: if the SQL string is the same, and none of the > tables involved in the > query have changed (in structure), then re-use the cached plan. Basically, > DDL and updated > statistics would have to invalidate plans for affected tables. > > Preferably, it should work equally for prepared statements and those not > pre-prepared. If you're > not using prepare (and bind variables) though, your plan caching down the > drain anyway... > > I don't think that re-optimizing based on values of bind variables is > needed. It seems like it > could actually be counter-productive and difficult to asses it's impact. > > That's the way I see it anyway. > > :) > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
Jason Coene wrote: I'm not an expert, but I've been hunting down a killer performance problem for a while now. It seems this may be the cause. At peak load, our database slows to a trickle. The CPU and disk utilization are normal - 20-30% used CPU and disk performance good. For a peak load 20-30% used CPU this mean you reached your IO bottleneck. All of our "postgres" processes end up in the "semwai" state - seemingly waiting on other queries to complete. If the system isn't taxed in CPU or disk, I have a good feeling that this may be the cause. I didn't know that planning queries could create such a gridlock, but based on Mr Pink's explanation, it sounds like a very real possibility. We're running on SELECT's, and the number of locks on our "high traffic" tables grows to the hundreds. If it's not the SELECT locking (and we don't get that many INSERT/UPDATE on these tables), could the planner be doing it? At peak load (~ 1000 queries/sec on highest traffic table, all very similar), the serialized queries pile up and essentially create a DoS on our service - requiring a restart of the PG daemon. Upon stop & start, it's back to normal. Give us informations on this queries, a explain analyze could be a good start point. I've looked at PREPARE, but apparently it only lasts per-session - that's worthless in our case (web based service, one connection per data-requiring connection). Trust me the PREPARE is not doing miracle in shenarios like yours . If you use postgres in a web service environment what you can use is a connection pool ( look for pgpoll IIRC ), if you use a CMS then try to enable the cache in order to avoid to hit the DB for each request. Regards Gaetano Mendola ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Caching of Queries
"Jason Coene" <[EMAIL PROTECTED]> writes: > All of our "postgres" processes end up in the "semwai" state - seemingly > waiting on other queries to complete. If the system isn't taxed in CPU or > disk, I have a good feeling that this may be the cause. Whatever that is, I'll bet lunch that it's got 0 to do with caching query plans. Can you get stack tracebacks from some of the stuck processes? What do they show in "ps"? regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Caching of Queries
I'm not an expert, but I've been hunting down a killer performance problem for a while now. It seems this may be the cause. At peak load, our database slows to a trickle. The CPU and disk utilization are normal - 20-30% used CPU and disk performance good. All of our "postgres" processes end up in the "semwai" state - seemingly waiting on other queries to complete. If the system isn't taxed in CPU or disk, I have a good feeling that this may be the cause. I didn't know that planning queries could create such a gridlock, but based on Mr Pink's explanation, it sounds like a very real possibility. We're running on SELECT's, and the number of locks on our "high traffic" tables grows to the hundreds. If it's not the SELECT locking (and we don't get that many INSERT/UPDATE on these tables), could the planner be doing it? At peak load (~ 1000 queries/sec on highest traffic table, all very similar), the serialized queries pile up and essentially create a DoS on our service - requiring a restart of the PG daemon. Upon stop & start, it's back to normal. I've looked at PREPARE, but apparently it only lasts per-session - that's worthless in our case (web based service, one connection per data-requiring connection). Does this sound plausible? Is there an alternative way to do this that I don't know about? Additionally, in our case, I personally don't see any downside to caching and using the same query plan when the only thing substituted are variables. In fact, I'd imagine it would help performance significantly in high-volume web applications. Thanks, Jason > -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-performance- > [EMAIL PROTECTED] On Behalf Of Mr Pink > Sent: Thursday, September 23, 2004 11:29 AM > To: Scott Kirkwood; [EMAIL PROTECTED] > Subject: Re: [PERFORM] Caching of Queries > > Not knowing anything about the internals of pg, I don't know how this > relates, but in theory, > query plan caching is not just about saving time re-planning queries, it's > about scalability. > Optimizing queries requires shared locks on the database metadata, which, > as I understand it > causes contention and serialization, which kills scalability. > > I read this thread from last to first, and I'm not sure if I missed > something, but if pg isnt > caching plans, then I would say plan caching should be a top priority for > future enhancements. It > needn't be complex either: if the SQL string is the same, and none of the > tables involved in the > query have changed (in structure), then re-use the cached plan. Basically, > DDL and updated > statistics would have to invalidate plans for affected tables. > > Preferably, it should work equally for prepared statements and those not > pre-prepared. If you're > not using prepare (and bind variables) though, your plan caching down the > drain anyway... > > I don't think that re-optimizing based on values of bind variables is > needed. It seems like it > could actually be counter-productive and difficult to asses it's impact. > > That's the way I see it anyway. > > :) > ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Caching of Queries
Not knowing anything about the internals of pg, I don't know how this relates, but in theory, query plan caching is not just about saving time re-planning queries, it's about scalability. Optimizing queries requires shared locks on the database metadata, which, as I understand it causes contention and serialization, which kills scalability. I read this thread from last to first, and I'm not sure if I missed something, but if pg isnt caching plans, then I would say plan caching should be a top priority for future enhancements. It needn't be complex either: if the SQL string is the same, and none of the tables involved in the query have changed (in structure), then re-use the cached plan. Basically, DDL and updated statistics would have to invalidate plans for affected tables. Preferably, it should work equally for prepared statements and those not pre-prepared. If you're not using prepare (and bind variables) though, your plan caching down the drain anyway... I don't think that re-optimizing based on values of bind variables is needed. It seems like it could actually be counter-productive and difficult to asses it's impact. That's the way I see it anyway. :) --- Scott Kirkwood <[EMAIL PROTECTED]> wrote: > I couldn't find anything in the docs or in the mailing list on this, > but it is something that Oracle appears to do as does MySQL. > The idea, I believe, is to do a quick (hash) string lookup of the > query and if it's exactly the same as another query that has been done > recently to re-use the old parse tree. > It should save the time of doing the parsing of the SQL and looking up > the object in the system tables. > It should probably go through the planner again because values passed > as parameters may have changed. Although, for extra points it could > look at the previous query plan as a hint. > On the surface it looks like an easy enhancement, but what do I know? > I suppose it would benefit mostly those programs that use a lot of > PQexecParams() with simple queries where a greater percentage of the > time is spent parsing the SQL rather than building the execute plan. > What do you think? > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings > ___ Do you Yahoo!? Declare Yourself - Register online to vote today! http://vote.yahoo.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Caching of Queries
In article <[EMAIL PROTECTED]>, Scott Kirkwood <[EMAIL PROTECTED]> writes: > I couldn't find anything in the docs or in the mailing list on this, > but it is something that Oracle appears to do as does MySQL. > The idea, I believe, is to do a quick (hash) string lookup of the > query and if it's exactly the same as another query that has been done > recently to re-use the old parse tree. That's not was MySQL is doing. MySQL caches not the query plan, but the result set for the (hashed) query string. If the same query comes again, it is not executed at all (unless one of the tables involved have been changed meanwhile). ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Caching of Queries
Neil Conway wrote: Another idea would be to improve the quality of the plan we generate at PREPARE time: for instance you could generate 'n' plans for various combinations of input parameters, and then choose the best query plan at EXECUTE time. It's a difficult problem to solve, however (consider multiple parameters to PREPARE, for example). Do you mean store different plans for each different histogram segment ? Regards Gaetano Mendola ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Caching of Queries
On 22 Sep 2004 at 15:59, Tom Lane wrote: > Scott Kirkwood <[EMAIL PROTECTED]> writes: > > What do you think? > > I think this would allow the problems of cached plans to bite > applications that were previously not subject to them :-(. > An app that wants plan re-use can use PREPARE to identify the > queries that are going to be re-executed. > > regards, tom lane > And then there are the people that would like to upgrade and get a performance gain without having to change their programs. A simple conf flag could turn query/plan caching off for all those that rely on each statement being re-planned. This is where SQLServer etc. tend to get big wins. I know from direct comparisons that SQLServer often takes quite a bit longer to parse/plan a select statement than Postgres, but wins out overall from its query/plan caching. Regards, Gary. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Caching of Queries
On Thu, 2004-09-23 at 05:59, Tom Lane wrote: > I think this would allow the problems of cached plans to bite > applications that were previously not subject to them :-(. > An app that wants plan re-use can use PREPARE to identify the > queries that are going to be re-executed. I agree; if you want to do some work in this area, making improvements to PREPARE would IMHO be the best bet. For example, some people have talked about having PREPARE store queries in shared memory. Another idea would be to improve the quality of the plan we generate at PREPARE time: for instance you could generate 'n' plans for various combinations of input parameters, and then choose the best query plan at EXECUTE time. It's a difficult problem to solve, however (consider multiple parameters to PREPARE, for example). -Neil ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching of Queries
Scott Kirkwood <[EMAIL PROTECTED]> writes: > What do you think? I think this would allow the problems of cached plans to bite applications that were previously not subject to them :-(. An app that wants plan re-use can use PREPARE to identify the queries that are going to be re-executed. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Caching of Queries
I couldn't find anything in the docs or in the mailing list on this, but it is something that Oracle appears to do as does MySQL. The idea, I believe, is to do a quick (hash) string lookup of the query and if it's exactly the same as another query that has been done recently to re-use the old parse tree. It should save the time of doing the parsing of the SQL and looking up the object in the system tables. It should probably go through the planner again because values passed as parameters may have changed. Although, for extra points it could look at the previous query plan as a hint. On the surface it looks like an easy enhancement, but what do I know? I suppose it would benefit mostly those programs that use a lot of PQexecParams() with simple queries where a greater percentage of the time is spent parsing the SQL rather than building the execute plan. What do you think? ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings