Re: [PERFORM] Problem analyzing explain analyze output
Steinar wrote: "...which seems to make sense; you have one run of about 257ms, plus 514 runs taking about 0.035ms each (i.e. about 18ms), which should add up to about 275ms (which is close enough to the reality of 281ms)."

Yep. The line that disturbed me was the bitmap index scan with an actual time of 254.143..254.143. I was expecting something more like actual time=0..254.143, which is what I usually see for a plain index scan. So I suppose the bitmap index scan only returns rows once it has been completely computed. Thanks for your help. Regards.

-- Guillaume

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
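The per-node accounting above is easy to check with a quick sketch (the figures are the ones quoted in the plan discussion above):

```python
# Rough accounting for the EXPLAIN ANALYZE output discussed above:
# one run of the outer node, plus 514 executions of the inner loop.
outer_ms = 257.0       # single run of the outer node
loops = 514            # number of inner-loop executions
per_loop_ms = 0.035    # actual time per inner-loop execution

total_ms = outer_ms + loops * per_loop_ms
print(f"estimated total: {total_ms:.1f} ms")  # ~275 ms, close to the reported 281 ms
```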
Re: [PERFORM] What gets cached?
Just to play devil's advocate here for a second: if we have an algorithm that is substantially better than plain old LRU, which is what I believe the kernel uses to cache pages (I'm no kernel hacker), then why don't we apply that and use a significantly larger page cache a la Oracle?

Alex

On 10/21/05, Neil Conway [EMAIL PROTECTED] wrote:

On Fri, 2005-21-10 at 07:34 -0500, Martin Nickel wrote: "Let's say I do the same thing in Postgres. I'm likely to have my very fastest performance for the first few queries until memory gets filled up."

No, you're not: if a query doesn't hit the cache (both the OS cache and the Postgres userspace cache), it will run slower. If the caches are empty when Postgres starts up (which is true for the userspace cache and might be true of the OS cache), the first queries that are run should be slower, not faster.

"The only time Postgres seems to take advantage of cached data is when I repeat the same (or substantially the same) query."

Caching is done on a page-by-page basis -- the source text of the query itself is not relevant. If two different queries happen to hit a similar set of pages, they will probably both benefit from the same set of cached pages.

"I don't know of any way to view what is actually cached at any point in time, but it seems like most recently used rather than most frequently used."

The cache replacement policy in 7.4 and older releases is simple LRU. The policy in 8.0 is ARC (essentially a version of LRU modified to try to retain hot pages more accurately). The policy in 8.1 is a clock-based algorithm.

-Neil
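For readers unfamiliar with the policies Neil lists, plain LRU is easy to sketch (this is a toy illustration of the general LRU idea, not PostgreSQL's actual buffer-manager code):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used page cache: on overflow, evict the page
    that has gone longest without being touched."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # key -> page data, oldest first

    def get(self, key):
        if key not in self.pages:
            return None              # cache miss
        self.pages.move_to_end(key)  # mark as most recently used
        return self.pages[key]

    def put(self, key, value):
        if key in self.pages:
            self.pages.move_to_end(key)
        self.pages[key] = value
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the LRU page

cache = LRUCache(2)
cache.put("A", 1)
cache.put("B", 2)
cache.get("A")        # touch A, so B is now least recently used
cache.put("C", 3)     # evicts B
```

ARC and clock-sweep are refinements of this basic structure that try to keep "hot" pages resident even when a large scan passes through.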
Re: [PERFORM] What gets cached?
On Mon, Oct 24, 2005 at 11:09:55AM -0400, Alex Turner wrote: "Just to play devil's advocate here for a second: if we have an algorithm that is substantially better than plain old LRU, which is what I believe the kernel uses to cache pages (I'm no kernel hacker), then why don't we apply that and use a significantly larger page cache a la Oracle?"

There have (AFAIK) been reports of very large shared_buffers settings (close to the total amount of RAM) performing much better in 8.1 than in earlier versions, so this might actually be okay these days. I haven't personally heard anybody report such an increase, though.

/* Steinar */
-- Homepage: http://www.sesse.net/
Re: [PERFORM] Used Memory
In addition to what Mark pointed out, there is the possibility that a query is running which is scanning a large table or otherwise bringing in a large number of pages from disk. That would first use up all available unused cache space, and then may start replacing some of your frequently used data. This can cause slowness for some time after the process which flushed the cache, as pages are reread and recached. Keep in mind that the cache could be flushed by some external process, such as copying disk files. The use of free memory for caching is not slowing you down; but if it coincides with slowness, it could be a useful clue.

-Kevin
Re: [PERFORM] Used Memory
Kevin Grittner wrote: "In addition to what Mark pointed out, there is the possibility that a query is running which is scanning a large table or otherwise bringing in a large number of pages from disk. That would first use up all available unused cache space, and then may start replacing some of your frequently used data."

An LRU cache is often a bad strategy for database applications. There are two illustrations that show why.

1. You have an index that's used for EVERY search, but each search returns a large and unique set of rows. If it happens that the rows returned exceed the system's cache size, then part or all of your index will be flushed with EVERY query.

2. You do a sequential scan of a table that's one block bigger than the file system cache, then you do it again. At the beginning of the second scan, the first block of the table will have just been swapped out because it was the oldest, so the file system brings it back in, replacing the second block, which is now the oldest. As you scan the table, each block is swapped out JUST BEFORE you get to it. At the start of your query, the file system might have had 99.9% of the relevant data in memory, but it swaps out 100% of it as your query progresses.

Scenario 2 above is interesting because a system that is performing very well can suddenly experience a catastrophic performance decline when the size of the data exceeds a critical limit - the file system's available cache.

LRU works well if your frequently-used data is used often enough to keep it in memory. But many applications don't have that luxury. It's often the case that a single query will exceed the file system's cache size. The file system cache is dumb -- its strategy is too simple for a relational database. What's needed is a way for the application developer to explicitly say, "This object is frequently used, and I want it kept in memory."

Craig
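Scenario 2 can be demonstrated with a toy simulation (a sketch, assuming a strict-LRU file system cache exactly one block smaller than the table):

```python
from collections import OrderedDict

def scan(table_blocks, cache, capacity):
    """Sequentially read every block; return the number of cache hits."""
    hits = 0
    for block in table_blocks:
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # touched: now most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # LRU eviction
    return hits

capacity = 100
table = list(range(capacity + 1))   # table is one block bigger than the cache
cache = OrderedDict()

first = scan(table, cache, capacity)    # cold cache: all misses, as expected
second = scan(table, cache, capacity)   # every block is evicted just before we reach it
print(first, second)
```

Even though the cache holds 100 of the table's 101 blocks at the start of the second scan, the scan gets zero hits: exactly the catastrophic cliff described above.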
Re: [PERFORM] Inefficient escape codes.
Now this interests me a lot. Please clarify this: I have 5000 tables, one for each city: City1_Photos, City2_Photos, ... City5000_Photos. Each of these tables is:

CREATE TABLE CityN_Photos (location text, lo_id largeobectypeiforgot)

So, what's the limit for these large objects? I heard I could only have 4 billion records for the whole database (not for each table). Is this true? If it isn't, would Postgres manage to create all the large objects I ask it to? Also, this would be a performance penalty, wouldn't it? Much thanks for the knowledge shared,

Rodrigo
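The 4 billion figure comes from large objects being identified by OIDs, which are unsigned 32-bit integers, so the ceiling applies cluster-wide rather than per table (to the best of my knowledge of PostgreSQL of that era). The arithmetic, with a purely hypothetical even split across the 5000 city tables:

```python
# Large object identifiers (OIDs) are unsigned 32-bit integers, so the
# theoretical ceiling is 2**32 distinct large objects per cluster.
oid_ceiling = 2 ** 32          # 4,294,967,296
cities = 5000                  # one photo table per city, as described above

# Hypothetical: if photos were spread evenly, this is the average number
# each city table could hold before exhausting the OID space.
photos_per_city = oid_ceiling // cities
print(oid_ceiling, photos_per_city)
```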
Re: [PERFORM] Used Memory
On Mon, 2005-10-24 at 12:00, Craig A. James wrote: [snip] "The file system cache is dumb -- its strategy is too simple for a relational database. What's needed is a way for the application developer to explicitly say, 'This object is frequently used, and I want it kept in memory.'"
There's an interesting conversation happening on the linux kernel hackers mailing list right about now that applies: http://www.gossamer-threads.com/lists/linux/kernel/580789
Re: [PERFORM] Used Memory
Scott Marlowe wrote: "There's an interesting conversation happening on the linux kernel hackers mailing list right about now that applies: http://www.gossamer-threads.com/lists/linux/kernel/580789"

Thanks for the pointer. If you're a participant in that mailing list, maybe you could forward this comment...

A fundamental flaw in the kernel, which goes WAY back to early UNIX implementations, is that the nice(1) setting of a program only applies to CPU usage, not to other resources. In this case, the file-system cache has no priority, so even if I set postmaster's nice(1) value to a very high priority, any pissant process with the lowest priority possible can come along with a `cat some-big-file > /dev/null` and trash my cached file-system pages. It's essentially a denial-of-service mechanism that's built into the kernel.

The kernel group's discussion on the heuristics of how and when to toss stale cache pages should have a strong nice(1) component to it. A process with a low priority should not be allowed to toss memory from a higher-priority process unless there is no other source of memory.

Getting back to Postgres, the same points that the linux kernel group are discussing apply to Postgres. There is simply no way to devise a heuristic that comes even close to what the app developer can tell you. A mechanism that allowed an application to say, "Keep this table in memory," is the only way. App developers should be advised to use it sparingly, because most of the time the system is pretty good at memory management, and such a mechanism hobbles the system's ability to manage. But when it's needed, there is no substitute.

Craig
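For what it's worth, Linux does offer a partial remedy on the reader's side: a bulk-reading process can advise the kernel that its pages are disposable via posix_fadvise(POSIX_FADV_DONTNEED), so its scan does not push everyone else's data out. A sketch of a "cache-polite" file reader (assumes Linux; on platforms without posix_fadvise the advice step is simply skipped):

```python
import os

def read_file_politely(path, chunk_size=1 << 20):
    """Read a file in chunks, advising the kernel after each chunk that
    we will not need those pages again (so they can be evicted first).
    Returns the number of bytes read."""
    fadvise = getattr(os, "posix_fadvise", None)   # available on Linux
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            if fadvise is not None:
                # Mark the pages we just read as disposable, unlike `cat`.
                fadvise(fd, total, len(chunk), os.POSIX_FADV_DONTNEED)
            total += len(chunk)
    finally:
        os.close(fd)
    return total
```

Unlike `cat some-big-file > /dev/null`, a reader written this way leaves the rest of the page cache (e.g. a database's hot index pages) mostly alone, though it requires the low-priority process to cooperate voluntarily, which is exactly Craig's complaint.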
Re: [PERFORM] Is There Any Way ....
Jim C. Nasby jnasby ( at ) pervasive ( dot ) com wrote: "Stefan Weiss wrote: '... IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as cheap, even when querying on columns that are not indexed.' Why do you think you'll know better than the database how frequently something is used? At best, your guess will be correct and PostgreSQL (or the kernel) will keep the table in memory. Or, your guess is wrong and you end up wasting memory that could have been used for something else. It would probably be better if you describe why you want to force this table (or tables) into memory, so we can point you at more appropriate solutions."

Or perhaps we could explain why we NEED to force these tables into memory, so we can point you at a more appropriate implementation. ;-)

Ok, wittiness aside, here's a concrete example. I have an application with one critical index that MUST remain in memory at all times. The index's tablespace is about 2 GB. As long as it's in memory, performance is excellent - a user's query takes a fraction of a second. But if it gets swapped out, the user's query might take up to five minutes as the index is re-read from disk.

Now here's the rub. The only performance I care about is response to queries from the web application. Everything else is low priority. But there is other activity going on. Suppose, for example, that I'm updating tables, performing queries, doing administration, etc., etc., for a period of an hour, during which no customer visits the site. Then another customer comes along and performs a query. At this point, no heuristic in the world could have guessed that I DON'T CARE ABOUT PERFORMANCE for anything except my web application. The performance of all the other stuff, the administration, the updates, etc., is utterly irrelevant compared to the performance of the customer's query.
What actually happens is that the other activities have swapped out the critical index, and my customer waits, and waits, and waits... and goes away after a minute or two. To solve this, we've been forced to purchase two computers and mirror the database on both. All administration and modification happens on the offline database, and the web application only uses the online database. At some point, we swap the two servers, sync the two databases, and carry on. It's a very unsatisfactory solution.

There is ONLY one way to convey this sort of information to Postgres, which is to provide the application developer a mechanism to explicitly name the tables that should be locked in memory. Look at tsearchd, which Oleg is working on. It's a direct response to this problem.

It's been recognized for decades that, as kernel developers (whether a Linux kernel or a database kernel), our ability to predict the behavior of an application is woefully inadequate compared with the application developer's knowledge of the application. Computer Science simply isn't a match for the human brain yet, not even close.

To give you perspective: since I posted a question about this problem (regarding tsearch2/GiST indexes), half of the responses I received told me that the posters had encountered this problem too, and their solution was to use an external full-text engine. They all confirmed that Postgres can't deal with this problem yet, primarily for the reasons outlined above.

Craig
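The mechanism Craig is asking for ("lock this index in memory") can be illustrated as a small variation on LRU: pinned entries are simply excluded from eviction. This is purely a sketch of the proposed behavior, not anything PostgreSQL supported at the time:

```python
from collections import OrderedDict

class PinnableCache:
    """LRU cache in which pinned keys are never eviction candidates."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()   # key -> data, oldest first
        self.pinned = set()

    def pin(self, key):
        self.pinned.add(key)         # e.g. the critical 2 GB index

    def put(self, key, value):
        self.pages[key] = value
        self.pages.move_to_end(key)
        while len(self.pages) > self.capacity:
            # Evict the least recently used *unpinned* entry.
            victim = next((k for k in self.pages if k not in self.pinned), None)
            if victim is None:
                break                # everything is pinned; allow overflow
            del self.pages[victim]

    def get(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)
            return self.pages[key]
        return None

cache = PinnableCache(2)
cache.pin("critical_index")
cache.put("critical_index", "idx")
cache.put("admin_scan_1", "blk")
cache.put("admin_scan_2", "blk")   # evicts admin_scan_1, never the index
```

In this sketch the hour of administrative traffic churns through the unpinned slots while the pinned index stays resident, which is precisely the guarantee the mirrored-server workaround is substituting for.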
Re: [PERFORM] Is There Any Way ....
This is possible with Oracle using the keep pool:

alter table t_name storage (buffer_pool keep);

If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort).

Alex

On 10/24/05, Craig A. James [EMAIL PROTECTED] wrote: [snip - full message quoted above]
Re: [PERFORM] Is There Any Way ....
Alex Turner wrote: "This is possible with Oracle using the keep pool: alter table t_name storage (buffer_pool keep); If Postgres were to implement its own caching system, this seems like it would be easy to implement (beyond the initial caching effort)."

On 10/24/05, Craig A. James [EMAIL PROTECTED] wrote: [snip - full message quoted above]

We have a similar problem, with vacuum being the equivalent of "continuously flush all system caches for a long time". Our database is about 200GB in size and vacuums take hours and hours. The performance is acceptable still, but only because we've hidden the latency in our application. I've occasionally thought it would be good to have the backend doing a vacuum or analyze also call priocntl() prior to doing any real work, to lower its priority. We'll be switching to the 8.1 release ASAP, just because the direct IO capabilities are appearing to be a win on our development system.

-- Alan
Re: [PERFORM] Used Memory
Hi to all those who replied. Thank you.

I monitored my database server a while ago and found out that memory is used extensively when I am fetching records from the database. I use the command "fetch all" in my VB code and put the result in a recordset. During this command the CPU utilization is also heavy. Is there something wrong with my code, or is it just the way PostgreSQL behaves and something I cannot do anything about? Just one workstation connecting to the database server is already eating up about 20% of the database server's CPU, which I think will not work for our system, since we have a target of 25 PCs connecting to the database server most of the time.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Craig A. James
Sent: Monday, October 24, 2005 9:47 PM
To: Scott Marlowe
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Used Memory

[snip - full message quoted above]
[PERFORM] Need help in setting optimal configuration for a huge database.
Hi all,

I am Kishore, doing freelance development of J2EE applications. We switched to PostgreSQL recently because of the advantages it has over other commercial databases. All went well until recently, when we began working on an application that needs to maintain a huge database. I am describing the problem we are facing below. Can you please take a look at the case and help me in configuring PostgreSQL?

We have only two tables, one of which contains 97% of the data and the other of which contains 2.8% of the data. All other tables contain only the remaining 0.2% of the data and are designed to support these two big tables. Currently we have 9 million records in the first table and 0.2 million records in the second table. We need to insert into the bigger table almost every second, throughout its life time. In addition, we receive at least 200,000 records a day at a fixed time. We are facing a critical situation because of the performance of the database. Even a basic query like select count(*) from bigger_table is taking about 4 minutes to return.

The system configuration is as follows:
Database: PostgreSQL 7.3
OS: Red Hat Linux
Processor: Athlon
Memory: 2 GB

We are expecting that at least 200 active connections need to be maintained throughout the day. I am also attaching the configuration file that we are currently using. Can you please suggest the best configuration to satisfy the above requirements? Thanks in advance.

Thank you,
Kishore

[attachment: postgresql.conf]
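A side note on the select count(*) complaint: because of MVCC, PostgreSQL (7.3 included) must scan the whole table to count visible rows, so count(*) on a 9-million-row table is inherently expensive. When an approximate figure is acceptable, the planner's own estimate can be read from the system catalog instead. A sketch that only constructs the catalog query (actually running it requires a database driver, and a real application should use bind parameters rather than string interpolation):

```python
def approximate_row_count_sql(table_name):
    """Return SQL that reads the planner's row estimate for a table
    from pg_class instead of scanning the table itself."""
    # reltuples is maintained by VACUUM/ANALYZE, so the estimate is
    # only as fresh as the last time those ran.
    return (
        "SELECT reltuples FROM pg_class "
        f"WHERE relname = '{table_name}'"
    )

sql = approximate_row_count_sql("bigger_table")
print(sql)
```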
[PERFORM] impact of stats_command_string
If I turn on stats_command_string, how much impact would it have on the PostgreSQL server's performance during a period of massive data INSERTs? I know that the answer will largely depend upon different factors, so I would like to know in which situations it would be negligible or would have a significant impact.
Re: [PERFORM] Used Memory
Christian Paul B. Cosinas wrote: [snip - full message quoted above]

Could you post the query and the output of EXPLAIN ANALYZE? In addition, have you run ANALYZE on all the tables in that database? (sorry, have to ask :-) ).

cheers

Mark
Re: [PERFORM] Used Memory
Hi Mark,

I have so many functions, more than 100 functions in the database :) And I am dealing with about 3 million records in one database. And about 100 databases :)

-----Original Message-----
From: Mark Kirkwood [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 25, 2005 3:07 AM
To: Christian Paul B. Cosinas
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Used Memory

[snip - full message quoted above]
Re: [PERFORM] Used Memory
Christian Paul B. Cosinas wrote: "Hi Mark, I have so many functions, more than 100 functions in the database :) And I am dealing with about 3 million records in one database. And about 100 databases :)"

LOL - sorry, I misunderstood your previous message to mean you had identified *one* query where 'fetch all' was causing the problem! Having said that, to make much more progress, you probably want to identify those queries that are consuming your resources, pick one or two of the particularly bad ones, and post 'em. There are a number of ways to perform said identification; enabling stats collection might be worth a try.

regards

Mark