Re: [NOVICE] [PERFORM] Extreme high load averages
Dennis Björklund wrote:
> On Sun, 6 Jul 2003, Martin Foster wrote:
>> The processor seems to be purposely sitting there twiddling its
>> thumbs. Which leads me to believe that perhaps the nice levels have
>> to be changed on the server itself?
>
> It could also be all the usual things that affect performance. Are
> your queries using indexes where they should? Do you vacuum analyze
> after you have updated/inserted a lot of data? It could be that some
> of your queries are not as efficient as they should be, like doing a
> sequential scan over a table instead of an index scan. That translates
> into more IO needed and slower response times. Especially when you
> have more connections fighting for the available IO.

I actually gained a bit more respect for PostgreSQL tonight. It seems that one of my scripts was not committing changes after maintenance was conducted, meaning that rows that would normally be removed after offline archiving completed were in fact still around.

Normally this table grows by about 50K rows during a day, is archived that night, and then loses the rows that are no longer needed. This process is what allowed MySQL to maintain any stability, as the size of this table can balloon significantly.

PostgreSQL, with tweaking, was handling a table with nearly 300K rows. That size alone would have dragged the MySQL system down to a near grind, and since most of those rows are not needed, one can imagine that queries were needlessly processing rows that should be outright ignored. This probably explains why row-numbering-based searches greatly accelerated the overall process.

By fixing the script and doing the appropriate full vacuum and re-index, the system is behaving much more like it should, even if the process may seem a bit odd to some. The reason for removing rows on a daily basis is the perishable nature of the information: since this is a chat site, posts over a day old are rarely needed for any reason.
Which is why they are archived into dumps, in case we really need to retrieve the information itself; this gives us the added bonus of smaller backup sizes and smaller database sizes.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
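[Editor's note: the nightly maintenance described above can be sketched as a short SQL script. Table and column names here are invented for illustration; the thread does not show the real schema. The point is that the DELETE must be committed before a vacuum can reclaim the dead rows, which is exactly the step the broken script was missing.]

```sql
-- Hypothetical nightly maintenance for a perishable chat-posts table.
-- "posts" and "posted_at" are placeholder names.
BEGIN;
DELETE FROM posts WHERE posted_at < CURRENT_DATE - 1;  -- perishable rows
COMMIT;  -- without this commit, the dead rows linger, as in the bug above
VACUUM FULL ANALYZE posts;   -- reclaim space and refresh statistics
REINDEX TABLE posts;         -- shrink indexes bloated by the churn
```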
[PERFORM] plpgsql vs. SQL performance (again)
About a month ago I asked the general list about plpgsql functions that occasionally significantly underperform their straight SQL equivalents. Tom noted that a different query plan was almost certainly being chosen by the plpgsql function:

http://archives.postgresql.org/pgsql-general/2003-05/msg00966.php
http://archives.postgresql.org/pgsql-general/2003-05/msg00998.php

Tom suggested checking for sloppy datatype declarations in the plpgsql functions. Double-checked, a-ok. Tom also suggested that indexscans might not get picked by the plpgsql function if I have some very skewed statistics.

Is there a way to verify the plpgsql function's planner choices? My casual observations are that this problem occurs with aggregates, and that the big performance hit is not consistent. I'd like advice on more formal troubleshooting. I can provide examples (my latest problem function is currently taking over 4 seconds vs. 0.04 seconds for its straight SQL equivalent), table schema, explain output for the straight SQL, etc., if anyone cares to work through this with me.

thanks,
michael
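[Editor's note: one way to approximate the plan a plpgsql function will use is to PREPARE the query with the same parameter types the function declares, since plpgsql runs its queries as prepared statements planned without seeing the actual parameter values. This is a sketch: the table and column names are invented, and it assumes a PostgreSQL version that supports EXPLAIN EXECUTE (7.3 or later).]

```sql
-- Mimic the generic plan plpgsql would use for a parameterized query.
-- "orders" and "customer_id" are hypothetical names.
PREPARE probe(integer) AS
    SELECT count(*) FROM orders WHERE customer_id = $1;
EXPLAIN EXECUTE probe(42);   -- plan chosen without a concrete constant

-- Compare with the plan chosen when the planner can see the value:
EXPLAIN SELECT count(*) FROM orders WHERE customer_id = 42;
```

A large gap between the two plans would point at the skewed-statistics problem Tom described.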
[PERFORM] Can you help me?
Hello all!

I'm new to PostgreSQL; I have never used it before. I am having an issue with configuring the postgresql.conf file. The machine itself is a 2.66GHz P4 with 2G of memory. Would you mind sending me an example postgresql.conf? Maybe you can tell me how to configure these parameters.

Thanks

Sincerely,
Chris.Wu
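[Editor's note: the thread does not include a sample answer, but a hedged starting point for a 2 GB machine of that era (PostgreSQL 7.3, 8 kB pages) might look like the fragment below. The values are illustrative guesses, not recommendations from this list, and would need benchmarking against the actual workload.]

```conf
# postgresql.conf -- illustrative starting values for a 2 GB machine
shared_buffers = 8192          # 64 MB of shared buffer cache
sort_mem = 8192                # 8 MB per sort, per backend
effective_cache_size = 131072  # ~1 GB of expected OS cache, in 8 kB pages
max_connections = 100
```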
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Sean Chittenden wrote:
> I looked through src/doc/runtime.sgml for a good place to stick this
> and couldn't find a place that seemed appropriate, but on FreeBSD,
> this can be determined with a great deal of precision in a
> programmatic manner:
>
>   echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
>
> The same OID is available via C too. It'd be slick if PostgreSQL could
> tune itself (on FreeBSD) at initdb time with the above code. If Linux
> exports this info via /proc and can whip out the appropriate magic,
> even better. A good guess that shows up uncommented in postgresql.conf
> would be stellar and quite possible with the use of sed.
>
> Maybe an initdb switch could be added to have initdb tune the config
> it generates? If a -n is added, have it generate a config and toss it
> to stdout?
>
>   case `uname` in
>   "FreeBSD")
>       echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
>       ;;
>   *)
>       echo "Unable to automatically determine the effective cache size" >> /dev/stderr
>       ;;
>   esac
>
> -sc

The simplest way may be to create an 'auto-tune' directory with scripts for configured platforms. When postgres installs the databases, it would check for 'tune.xxx' and, if found, use that to generate the configuration itself. This would allow for defaults on platforms that do not have such a script, and optimization for those that do.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
> I don't have much to add because I'm pretty new to Postgres and have
> been soliciting advice here recently, but I totally agree with
> everything you said. I don't mind if it's in the postgres.conf file
> or in a faq that is easy to find, I just would like it to be in one
> place. A good example of the need for this is when I was tuning
> "effective_cache" I thought that was creating a cache for Postgres
> when in fact, as it was pointed out to me, it's just hinting to
> postgres the size of the OS cache. Lots of ways for people to get
> really confused here.

I looked through src/doc/runtime.sgml for a good place to stick this and couldn't find a place that seemed appropriate, but on FreeBSD, this can be determined with a great deal of precision in a programmatic manner:

  echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"

The same OID is available via C too. It'd be slick if PostgreSQL could tune itself (on FreeBSD) at initdb time with the above code. If Linux exports this info via /proc and can whip out the appropriate magic, even better. A good guess that shows up uncommented in postgresql.conf would be stellar and quite possible with the use of sed.

Maybe an initdb switch could be added to have initdb tune the config it generates? If a -n is added, have it generate a config and toss it to stdout?

  case `uname` in
  "FreeBSD")
      echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
      ;;
  *)
      echo "Unable to automatically determine the effective cache size" >> /dev/stderr
      ;;
  esac

-sc

--
Sean Chittenden
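[Editor's note: the arithmetic behind Sean's one-liner can be seen in isolation by substituting a hard-coded byte count for the sysctl output. This sketch runs on any platform; on FreeBSD the sample value would instead come from `sysctl -n vfs.hibufspace`.]

```shell
# hibufspace -> effective_cache_size, with the sysctl output replaced by
# a sample value so the computation is visible anywhere.
# On FreeBSD: hibufspace=$(sysctl -n vfs.hibufspace)
hibufspace=209715200                 # sample: 200 MB of buffer space
pages=$(( hibufspace / 8192 ))       # effective_cache_size counts 8 kB pages
echo "effective_cache_size = $pages"
```

With the sample value this prints `effective_cache_size = 25600`, i.e. 25600 pages of 8 kB.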
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
Scott Marlowe wrote:
> It would be nice to have a program that could run on any OS postgresql
> runs on and could report on the current limits of the kernel, and make
> recommendations for changes the admin might want to make. One could
> probably make a good stab at effective cache size during install.
> Anything reasonably close would probably help. Report what % of said
> resources could be consumed by postgresql under various
> circumstances...

One of the issues that automating the process would encounter is kernel limits that are too low for PostgreSQL to handle. The BSDs come to mind, where values need to be manually increased in the kernel before you can reach a reasonable maximum connection count.

Another example is how OpenBSD will outright crash when trying to test the database during install time. It seems that most of the tests fail because the maximum number of processes allowed is too low for the tests to succeed, while FreeBSD will work just fine on those same tests.

If PostgreSQL automates the configuration, that would be a plus. But it should also detect the platform and inform the person what changes should be made to the kernel, sysctl or whatever, in order for that configuration to run. Perl may be useful here for a few reasons: it is portable enough to run on multiple Unix variants, and its tools are fairly standard, so the code would require fewer considerations for more exotic implementations.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
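[Editor's note: the per-platform 'auto-tune' dispatch being proposed could be sketched as below. The tuning hints are invented placeholders; a real script would run each platform's probe (sysctl, /proc, etc.) rather than just printing advice.]

```shell
# Sketch of a per-platform auto-tune dispatch. Hints are illustrative.
tune_hint() {
    case "$1" in
        FreeBSD) echo 'effective_cache_size = $(($(sysctl -n vfs.hibufspace) / 8192))' ;;
        Linux)   echo 'derive effective_cache_size from Cached in /proc/meminfo' ;;
        *)       echo "no auto-tune rule for $1; edit postgresql.conf by hand" ;;
    esac
}

tune_hint "$(uname)"   # dispatch on the running platform
```

Unknown platforms fall through to the default branch, which matches the thread's point that the tool should at least tell the admin what it could not determine.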
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
On Wed, 9 Jul 2003, Kaarel wrote:
> >> Are you willing to say that the PostgreSQL database system should
> >> only be used by DBAs? I believe that Postgres is such a good and
> >> useful tool that anyone should be able to start using it with
> >> little or no barrier to entry.
> >
> > I quite agree. But there is a difference between saying "you should
> > get decent performance with no effort" and "you should get optimal
> > performance with no effort". I think we can get to the first with
> > relatively little trouble (like boosting the default shared_buffers
> > to 1000), but the second is an impractical goal.
>
> Just wanted to repeat some of the thoughts already expressed. There is
> no reason why PostgreSQL shouldn't be reasonably well configured for a
> particular platform out of the box: not for maximum performance, but
> for good enough performance. The many complaints by new users about
> PostgreSQL being surprisingly slow, and all the standard answers
> (vacuum, pump up the memory settings), imho prove that the default
> installation can be improved. Already mentioned on the mailing lists:
> using multiple standard conf files, querying system info and
> dynamically generating all or some parts of the conf file, automating
> the vacuum process...

It would be nice to have a program that could run on any OS postgresql runs on and could report on the current limits of the kernel, and make recommendations for changes the admin might want to make. One could probably make a good stab at effective cache size during install. Anything reasonably close would probably help. Report what % of said resources could be consumed by postgresql under various circumstances...
Re: [PERFORM] Moving postgresql.conf tunables into 2003...
>> Are you willing to say that the PostgreSQL database system should
>> only be used by DBAs? I believe that Postgres is such a good and
>> useful tool that anyone should be able to start using it with little
>> or no barrier to entry.
>
> I quite agree. But there is a difference between saying "you should
> get decent performance with no effort" and "you should get optimal
> performance with no effort". I think we can get to the first with
> relatively little trouble (like boosting the default shared_buffers
> to 1000), but the second is an impractical goal.

Just wanted to repeat some of the thoughts already expressed. There is no reason why PostgreSQL shouldn't be reasonably well configured for a particular platform out of the box: not for maximum performance, but for good enough performance. The many complaints by new users about PostgreSQL being surprisingly slow, and all the standard answers (vacuum, pump up the memory settings), imho prove that the default installation can be improved. Already mentioned on the mailing lists: using multiple standard conf files, querying system info and dynamically generating all or some parts of the conf file, automating the vacuum process...

Kaarel
Re: [PERFORM] Some very weird behaviour....
> To give you some perspective on the size of the dataset and the
> performance level we are hitting, here are some "good" results based
> on some explains:

Before Tom jumps in taking all the fun out of trying to solve it...

The estimates in the slow queries seem perfectly reasonable. In fact, the cost estimates of both the slow and fast queries are the same, which is what would be expected if all of the data were distributed evenly throughout the table. Given that it's a date, I would guess that the data is generally inserted into the table in an order following the date, but for some reason those 'high' dates have their data distributed more evenly throughout the table.

Clustered data will have fewer disk seeks and deal with fewer pages of information in general, which makes for a much faster query. Distributed data will have to pull significantly more information from the disk, throwing most of it away.

I would guess that sometime on 2003-05-25 someone did a bit of data cleaning (deleting records). The next day the free space map had entries available in various locations within the table, and inserts used them rather than appending to the end. With 89 million records where the date is significant, I'm guessing there aren't very many modifications or deletes on it.

So, how to solve the problem? If this is the type of query that occurs most often, you do primarily inserts, and the inserts generally follow the date, cluster the table by index "some_table_ix_0". The clustering won't degrade very much, since that is how you naturally insert the data.
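[Editor's note: the clustering suggestion above translates to the commands below. This is a sketch using the anonymized names from the explain output in this thread, and the syntax shown is for PostgreSQL 7.3, where CLUSTER takes the index name first.]

```sql
-- Physically reorder the table to match the date index, then refresh
-- statistics so the planner knows about the improved correlation.
CLUSTER some_table_ix_0 ON some_table;
ANALYZE some_table;
```

Note that on a 26 GB table CLUSTER rewrites the whole heap, so it needs roughly the table's size in free disk space and an exclusive lock for the duration.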
[PERFORM] Some very weird behaviour....
Hi All,

I'm sure some of you know me from previous questions on other lists, but this one has myself and Marc completely stumped. We've got a database that has about 89 million rows, under PostgreSQL 7.3.3 on a dual PIII 1.2 with 4 GBytes of RAM on a 5 disk RAID 5 array. The dataset itself is about 26+ GBytes in size, all of it in the one table.

To give you some perspective on the size of the dataset and the performance level we are hitting, here are some "good" results based on some explains:

jnlstats=# explain analyze select count(*) from some_table where some_time::date='2003-05-21';
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=3013.55..3013.55 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=48.40..2721.26 rows=249837 loops=1)
         Index Cond: ((some_time)::date = '2003-05-21'::date)
 Total runtime: 3015.02 msec
(4 rows)

jnlstats=# explain analyze select count(*) from stats_raw where some_time::date='2003-05-21';
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=1401.23..1401.23 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=0.50..1118.92 rows=249837 loops=1)
         Index Cond: ((some_time)::date = '2003-05-21'::date)
 Total runtime: 1401.42 msec

There are about 249837 items that the query identifies as valid results, and the results range between 1 and 1.4 seconds over ten runs, with the initial query taking 3 seconds. This average is how 90% of the queries respond, but we've got several peaks that we cannot explain in any way.
For instance:

jnlstats=# explain analyze select count(*) from some_table where some_time::date='2003-05-26';
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=307025.65..307025.65 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=51.05..306256.93 rows=374540 loops=1)
         Index Cond: ((some_time)::date = '2003-05-26'::date)
 Total runtime: 307025.81 msec
(4 rows)

jnlstats=# explain analyze select count(*) from some_table where some_time::date='2003-05-26';
                                  QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=1794562.35..1794562.35 rows=1 width=0) (actual time=10837.86..10837.86 rows=1 loops=1)
   ->  Index Scan using some_table_ix_0 on some_table  (cost=0.00..1793446.02 rows=446531 width=0) (actual time=1.01..10304.78 rows=374540 loops=1)
         Index Cond: ((some_time)::date = '2003-05-26'::date)
 Total runtime: 10838.04 msec

The total number of items counted is 374540, so not too many more than the previous query, but the 300 second runtime was unexpected (we were expecting ~4-5 seconds for the first run and then ~1-2 seconds for the cached results). I have 5 other dates that all exhibit this behaviour, but it's ONLY those dates that run that slow, and the one presented above is the largest of them all.

The database server is configured with a 5 MByte shared memory buffer, but even a larger shared memory buffer does not help (we have had it set to 800 MBytes before). The disk is getting hit the heaviest during that last query, with iostat results being:

      tty             da0              da1              da2             cpu
 tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   25 12.22   1  0.02   7.68   0  0.00   7.68   0  0.00   0  0  0  0 99
   4 3758  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   6  0 38  0 56
   0  151  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00   9  0 43  0 48
   0  148  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  10  0 40  0 49
   0  153  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00  10  0 40  0 49
   0  152  0.00   0  0.00   0.00   0  0.00   0.00   0  0.00