Re: [PERFORM] Wierd context-switching issue on Xeon
Hi,

Dual Xeon P4 2.8, Linux Red Hat AS 3, kernel 2.4.21-4-EL-smp, 2 GB RAM. I can see the same problem:

procs                      memory     swap         io      system          cpu
 r  b   swpd   free   buff    cache   si  so   bi   bo   in     cs  us sy id wa
 1  0      0  96212  61056  1720240    0   0    0    0  101     11  25  0 75  0
 1  0      0  96212  61056  1720240    0   0    0    0  108    139  25  0 75  0
 1  0      0  96212  61056  1720240    0   0    0    0  104    173  25  0 75  0
 1  0      0  96212  61056  1720240    0   0    0    0  102     11  25  0 75  0
 1  0      0  96212  61056  1720240    0   0    0    0  101     11  25  0 75  0
 2  0      0  96204  61056  1720240    0   0    0    0  110  53866  31  4 65  0
 2  0      0  96204  61056  1720240    0   0    0    0  101  83176  41  5 54  0
 2  0      0  96204  61056  1720240    0   0    0    0  102  86050  39  6 55  0
 2  0      0  96204  61056  1720240    0   0    0   49  113  73642  41  5 54  0
 2  0      0  96204  61056  1720240    0   0    0    0  102  84211  40  5 55  0
 2  0      0  96204  61056  1720240    0   0    0    0  101 105165  39  7 54  0
 2  0      0  96204  61056  1720240    0   0    0    0  103  97754  38  6 56  0
 2  0      0  96204  61056  1720240    0   0    0    0  103 113668  36  7 57  0
 2  0      0  96204  61056  1720240    0   0    0    0  103 112003  37  7 56  0

regards, ivan.

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] slow seqscan
My first post to this list :)

Scenario: I have a database used only for search queries, with a single table that holds about 450,000-500,000 records. The table is well indexed, so most of the queries are executed with an index scan, but since there is a big text field in the table (360 chars) some search operations (with certain filters) end up as seq scans. The table is not written to during normal operation: twice per week a batch program inserts about 35,000 records and updates another 40,000.

Last Friday morning, after that batch had been executed, the database started responding really slowly to queries (especially seq scans); after a VACUUM FULL ANALYZE things got somewhat better. Yesterday the same: before the batch everything was perfect, afterwards every query was really slow; I vacuumed again and now it's ok. Until now the db had been working fine; it's 4 months old, with two updates per week, and I vacuum about once per month.

I am using version 7.3 - do I need to upgrade to 7.4? Also, I was thinking about setting this table to a kind of read-only mode to improve performance; is this possible?

Thank you for your help
Edoardo Ceccarelli
Re: [PERFORM] slow seqscan
Hi Edoardo,

> The table is well indexed so that most of the queries are executed with
> index scan but since there is a big text field in the table (360 chars)
> some search operations (with certain filters) end up with seq scans.

Please paste the exact SELECT query that uses a seqscan, plus the EXPLAIN ANALYZE of the SELECT, and the psql output of \d table.

> This table is not written during normal operation: twice per week there is
> a batch program that inserts about 35,000 records and updates another 40,000.

After such an update, you need to run VACUUM ANALYZE table; Run it before the update as well, if it doesn't take that long.

> Last Friday morning, after that batch had been executed, the database
> started responding really slowly to queries [...] I vacuum about once per
> month.

You need to vacuum analyze (NOT full) once an HOUR, not once a month. Add this command to your crontab to run once an hour and verify that it's working:

vacuumdb -a -z -q

Otherwise, install the autovacuum utility found in contrib/pg_autovacuum in the postgres source. Set this up. It will monitor postgres and run vacuums and analyzes when necessary. You can then remove your cron job.

> I am using version 7.3 - do I need to upgrade to 7.4? Also, I was thinking
> about setting this table to a kind of read-only mode to improve
> performance; is this possible?

There's no read-only mode to improve performance. Upgrading to 7.4 will more than likely improve the performance of your database in general. Be careful to read the upgrade notes because there were a few incompatibilities.
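The hourly vacuum advice above translates directly into a crontab entry; a sketch, assuming it lives in the postgres user's crontab and that vacuumdb is on that user's PATH:

```cron
# Vacuum (-a: all databases) with analyze (-z), quietly (-q),
# at the top of every hour.
0 * * * * vacuumdb -a -z -q
```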
Chris
Re: [PERFORM] slow seqscan
Edoardo Ceccarelli wrote:
> I have a database used only for search queries, with a single table that
> holds about 450,000-500,000 records. [...] I am using version 7.3 - do I
> need to upgrade to 7.4? Also, I was thinking about setting this table to a
> kind of read-only mode to improve performance; is this possible?

In general we are going to need more information, like what kind of search filters you are using on the text field and an EXPLAIN ANALYZE. But can you try running the following, bearing in mind it will take a while to complete:

REINDEX TABLE table_name

From what I remember there were issues with index space not being reclaimed by a vacuum. I believe this was fixed in 7.4. By not reclaiming the space, the indexes grow larger and larger over time, causing PG to prefer a sequential scan over an index scan (I think).

Hope that helps

Nick
[PERFORM] MySQL vs PG TPC-H benchmarks
Hi,

Has anyone had a look at: http://people.ac.upc.es/zgomez/

I realize that MySQL and PG cannot really be compared (especially when you consider the issues that MySQL has with things like data integrity), but still, surely PG would perform better than the stats show (i.e. #7: 31.28 seconds versus 42 minutes!!!).

On a side note, it certainly looks like Linux kernel 2.6 is quite a bit faster in comparison to 2.4.

Nick
Re: [PERFORM] slow seqscan
> In general we are going to need more information, like what kind of search
> filters you are using on the text field and an EXPLAIN ANALYZE. But can
> you try running the following, bearing in mind it will take a while to
> complete:
>
> REINDEX TABLE table_name

The query is this:

SELECT *, oid FROM annuncio400
WHERE rubric = 'DD' AND LOWER(testo) LIKE LOWER('cbr%')
OFFSET 0 LIMIT 11

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric = 'DD' AND LOWER(testo) LIKE LOWER('cbr%') OFFSET 0 LIMIT 11;

                              QUERY PLAN
 Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 rows=11 loops=1)
   ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) (actual time=51.47..56.40 rows=12 loops=1)
         Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 'cbr%'::text))
 Total runtime: 56.53 msec
(4 rows)

But the strangest thing ever is that if I change the filter to another one that represents a smaller amount of data, it uses the index scan!!! Check this (same table, same query, different rubric='MA' value):

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric = 'MA' AND LOWER(testo) LIKE LOWER('cbr%') OFFSET 0 LIMIT 11;

                              QUERY PLAN
 Limit  (cost=0.00..6630.72 rows=9 width=546) (actual time=42.74..42.74 rows=0 loops=1)
   ->  Index Scan using rubric on annuncio400  (cost=0.00..6968.48 rows=9 width=546) (actual time=42.73..42.73 rows=0 loops=1)
         Index Cond: (rubric = 'MA'::bpchar)
         Filter: (lower((testo)::text) ~~ 'cbr%'::text)
 Total runtime: 42.81 msec
(5 rows)

Thanks for your help
Edoardo
Re: [PERFORM] slow seqscan
> dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE rubric =
> 'DD' AND LOWER(testo) LIKE LOWER('cbr%') OFFSET 0 LIMIT 11;
> [...]
>  Total runtime: 56.53 msec

What happens if you go:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, LOWER(testo));

or even just:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

> But the strangest thing ever is that if I change the filter to another one
> that represents a smaller amount of data, it uses the index scan!!!

What's strange about that? The less data is going to be retrieved, the more likely postgres is to use the index.

I suggest maybe increasing the amount of stats recorded for your rubric column:

ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
ANALYZE annuncio400;

You could also try reducing the random_page_cost value in your postgresql.conf a little, say to 3 (if it's currently 4). That will make postgres more likely to use index scans over seq scans.

Chris
Re: [PERFORM] slow seqscan
> What happens if you go:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, LOWER(testo));
>
> or even just:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

I wasn't able to make this 2-field index with lower:

dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, LOWER(testo));
ERROR:  parser: parse error at or near "(" at character 71

It seems impossible to create 2-field indexes with the lower function. The other one does not make it use the index.

> What's strange about that? The less data is going to be retrieved, the
> more likely postgres is to use the index.

I can't understand this policy:

dba400=# SELECT count(*) from annuncio400 where rubric='DD';
 count
-------
  6753
(1 row)

dba400=# SELECT count(*) from annuncio400 where rubric='MA';
 count
-------
  2165
(1 row)

So it's using the index for 2000 rows and not for 6000? That's not such a big difference, is it?

> I suggest maybe increasing the amount of stats recorded for your rubric
> column:
>
> ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
> ANALYZE annuncio400;

Done; almost the same, still not using the index.

> You could also try reducing the random_page_cost value in your
> postgresql.conf a little, say to 3 (if it's currently 4). That will make
> postgres more likely to use index scans over seq scans.

Changed the setting in postgresql.conf and restarted the server; nothing has changed. What about setting this to false?

#enable_seqscan = true

Thanks again
Edoardo
Re: [PERFORM] slow seqscan
I tried enable_seqscan = false and now I'm getting all index scans; timing has improved from 600 ms to 18 ms. I'm wondering what other implications I might expect.

Edoardo Ceccarelli wrote:
> [previous message snipped]
Re: [PERFORM] slow seqscan
> I tried enable_seqscan = false and now I'm getting all index scans; timing
> has improved from 600 ms to 18 ms. I'm wondering what other implications I
> might expect.

Lots of really bad implications... it's really not a good idea.

Chris
Re: [PERFORM] Wierd context-switching issue on Xeon
How long is this test supposed to run? I've launched just one for testing; the plan seems horrible. The test is CPU bound and hasn't finished yet after 17:02 min of CPU time, on a dual Xeon 2.6G running UnixWare 713. The machine is a Fujitsu-Siemens TX 200 server.

On Mon, 19 Apr 2004, Tom Lane wrote:
> Here is a test case. To set up, run the test_setup.sql script once; then
> launch two copies of the test_run.sql script. (For those of you with more
> than two CPUs, see whether you need one per CPU to make trouble, or
> whether two test_runs are enough.) Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
>
> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up. On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> vmstat 1 shows CS rates around 170K/sec. strace'ing the backends shows a
> corresponding rate of semop() syscalls, with a few delaying select()s
> sprinkled in. top(1) shows system CPU percent of 25-30 and idle CPU
> percent of 16-20.
>
> I haven't bothered to check how long the test_run query takes, but if it
> ends while you're still examining the behavior, just start it again.
>
> Note the test case assumes you've got shared_buffers set to at least 1000;
> with smaller values, you may get some I/O syscalls, which will probably
> skew the results.
> regards, tom lane

-- 
Olivier PRENANT              Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou        +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                    +33-6-07-63-80-64 (GSM)
FRANCE                       Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
On Wed, 2004-04-21 at 08:19, Rod Taylor wrote:
> > I realize that MySQL and PG cannot really be compared (especially when
> > you consider the issues that MySQL has with things like data integrity)
> > but still surely PG would perform better than the stats show (i.e. #7:
> > 31.28 seconds versus 42 minutes!!!).
>
> We know that PostgreSQL 7.5 will perform much better than 7.4 did due to
> the efforts of OSDN and Tom.

OSDL, not OSDN.

I've enquired as to whether they ran ANALYZE after the data load. They don't explicitly mention it, and given they mention it took 2.5 days to load 1 GB of data, they're not regular PostgreSQL users.
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
On 21/04/2004 09:31 Nick Barr wrote:
> Has anyone had a look at: http://people.ac.upc.es/zgomez/
>
> I realize that MySQL and PG cannot really be compared (especially when you
> consider the issues that MySQL has with things like data integrity) but
> still surely PG would perform better than the stats show (i.e. #7: 31.28
> seconds versus 42 minutes!!!).

Looks like he's using the default postgresql.conf settings, in which case I'm not surprised at pg looking so slow. His stated use of foreign keys invalidates the tests anyway, as MyISAM tables don't support FKs, so we're probably seeing FK check overheads in pg that are simply ignored by MySQL. In an honest test, MySQL should be reported as failing those tests. Perhaps one of the advocacy team will pick up the baton?

> On a side note it certainly looks like Linux kernel 2.6 is quite a bit
> faster in comparison to 2.4.

Yes, I've seen other benchmarks which also show that.

-- 
Paul Thomas
Thomas Micro Systems Limited | Software Solutions for Business
Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
> Looks like he's using the default postgresql.conf settings, in which case
> I'm not surprised at pg looking so slow.

The question also is, IMHO, why the hell PostgreSQL still comes out of the box with such poor configuration defaults, totally underestimated for today's average hardware configuration (1+ GHz, 0.5+ GB RAM, fast FSB, fast HDD). It seems to me a better strategy to force that 1% of users to downgrade their config than vice versa.

regards
ch
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
On 21/04/2004 14:31 Cestmir Hybl wrote:
> The question also is, IMHO, why the hell PostgreSQL still comes out of the
> box with such poor configuration defaults [...]

This has been discussed many times before. Check the archives.

-- 
Paul Thomas
Thomas Micro Systems Limited | Software Solutions for Business
Computer Consultants | http://www.thomas-micro-systems-ltd.co.uk
Re: [PERFORM] slow seqscan
On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:
> I wasn't able to make this 2-field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature, I think (and I think the version with two columns may need extra parens around the lower()). I think the only way to do something equivalent in 7.3 is to make a function that concatenates the two in some fashion after having applied lower to the one part, and then use that in the queries as well. Plus, if you're not in C locale, I'm not sure that it'd help in 7.3 anyway.

> > What's strange about that? The less data is going to be retrieved, the
> > more likely postgres is to use the index.
>
> So it's using the index for 2000 rows and not for 6000? That's not such a
> big difference, is it?

It's a question of how many pages it thinks it's going to have to retrieve in order to handle the request. If it, say, needs (or thinks it needs) to retrieve 50% of the pages, then given a random_page_cost of 4, it's going to expect the index scan to be about twice the cost.

Generally speaking, one good way to compare is to try the query with EXPLAIN ANALYZE, then change parameters like enable_seqscan, try the query with EXPLAIN ANALYZE again, and compare the estimated rows and costs. That'll give an idea of how it expects the two versions of the query to compare speed-wise.
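The page-count arithmetic above can be sketched as a toy model. This is a hypothetical simplification for illustration, not the planner's actual costing code; the page counts are made up, and only the sequential-vs-random page cost terms are modeled:

```python
# Toy model of the planner's seq-vs-index decision (illustration only;
# the real planner also charges CPU costs, correlation effects, etc.).
SEQ_PAGE_COST = 1.0     # cost charged per sequentially read page
RANDOM_PAGE_COST = 4.0  # the default random_page_cost of this era

def seqscan_cost(total_pages):
    # A sequential scan always reads every page of the table, in order.
    return total_pages * SEQ_PAGE_COST

def indexscan_cost(pages_fetched):
    # An index scan reads only the pages holding matching rows,
    # but each fetch is charged as a random access.
    return pages_fetched * RANDOM_PAGE_COST

total_pages = 10_000  # hypothetical table size

# Fetching 50% of the pages: the index scan costs twice the seq scan,
# exactly the 2x figure described above.
assert indexscan_cost(total_pages // 2) == 2 * seqscan_cost(total_pages)

# A highly selective predicate (say 2% of pages) flips the decision.
assert indexscan_cost(total_pages * 2 // 100) < seqscan_cost(total_pages)
```

Lowering random_page_cost in postgresql.conf shrinks the multiplier, which is why the earlier suggestion of reducing it to 3 makes index scans win more often.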
Re: [PERFORM] Wierd context-switching issue on Xeon
After some testing: if you use the current HEAD code for s_lock.c, which has some mods in it to alleviate this situation, and change SPINS_PER_DELAY to 10, you can drastically reduce the context switches with Tom's test. I am seeing a slight degradation in throughput using pgbench -c 10 -t 1000, but it might be liveable, considering the alternative is unbearable in some situations.

Can anyone else replicate my results?

Dave

On Wed, 2004-04-21 at 08:10, Dirk Lutzebäck wrote:
> It is intended to run indefinitely.
>
> [EMAIL PROTECTED] wrote:
> > How long is this test supposed to run? I've launched just one for
> > testing; the plan seems horrible. The test is CPU bound and hasn't
> > finished yet after 17:02 min of CPU time, on a dual Xeon 2.6G running
> > UnixWare 713. The machine is a Fujitsu-Siemens TX 200 server.

-- 
Dave Cramer
519 939 0336
ICQ # 14675561
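The SPINS_PER_DELAY mechanism being tuned here can be sketched as control flow. This is an illustrative Python sketch, not the real s_lock.c (which is C built around an atomic test-and-set instruction and a ~10 ms select() sleep); the tas and delay callables are stand-ins:

```python
def spinlock_acquire(tas, delay, spins_per_delay=100):
    """Spin on the test-and-set up to spins_per_delay times, then back off.

    tas() stands in for the atomic test-and-set: it returns True while the
    lock is still held by someone else, False once we acquire it.
    delay() stands in for the ~10 ms select() sleep in the real code.
    Returns the number of delay sleeps taken, for illustration.
    """
    spins = 0
    delays = 0
    while tas():
        spins += 1
        if spins >= spins_per_delay:
            delay()          # give up the CPU instead of burning it
            delays += 1
            spins = 0
    return delays

# Simulate a lock that stays busy for 250 TAS attempts, then frees up:
# with the default of 100 spins per delay, that costs two delay sleeps.
busy = iter([True] * 250 + [False])
assert spinlock_acquire(lambda: next(busy), lambda: None) == 2

# Dave's SPINS_PER_DELAY=1 experiment degenerates to "sleep on every miss".
busy = iter([True] * 5 + [False])
assert spinlock_acquire(lambda: next(busy), lambda: None, spins_per_delay=1) == 5
```

The tuning trade-off is visible here: fewer spins per delay means fewer cache-line-thrashing TAS attempts but more sleeps, which is why throughput dips slightly.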
[PERFORM] Help understanding stat tables
I just want to make sure that I am interpreting this data correctly. From pg_statio_user_tables, I have pulled relname, heap_blks_read, and heap_blks_hit. I get several rows like this:

 relname | heap_blks_read | heap_blks_hit
 clmhdr  |        8607161 |     196547165

So this means that I am getting over a 100% cache hit ratio for this table, right? If not, please help me understand what these numbers mean.

Thanks,
Chris
[PERFORM] Shared buffers, Sort memory, Effective Cache Size
Hello,

I have a dual-PIII server with 2 GB of RAM running Debian and PostgreSQL 7.4. What are the best settings for shared buffers, sort memory, and effective cache size? My main database is of small/mid-range size: some tables may have 1 or 2 million records.

Thanks

Frédéric Robinet
[EMAIL PROTECTED]
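For reference, list advice of this era for a 2 GB dedicated server usually landed somewhere near the following postgresql.conf values. These are illustrative starting points only, not tested recommendations; note that in 7.4 shared_buffers and effective_cache_size are measured in 8 kB pages and sort_mem in kB:

```conf
# Illustrative 7.4 settings for a dedicated 2 GB machine (starting points only)
shared_buffers = 16384          # 16384 x 8 kB = 128 MB
sort_mem = 8192                 # 8 MB per sort operation, per backend
effective_cache_size = 131072   # 131072 x 8 kB = 1 GB, a guess at the OS cache
```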
Re: [PERFORM] Help understanding stat tables
I think I have figured my problem out. I was taking heap_blks_hit / heap_blks_read for my hit pct. It should be heap_blks_hit / (heap_blks_read + heap_blks_hit), correct?

Thanks

On Wednesday 21 April 2004 11:34, Chris Hoover wrote:
> I just want to make sure that I am interpreting this data correctly. From
> pg_statio_user_tables, I have pulled relname, heap_blks_read, and
> heap_blks_hit. [...] So this means that I am getting over a 100% cache hit
> ratio for this table, right?
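Plugging the clmhdr numbers from the earlier message into the corrected formula; a quick sanity check, using the figures from the posted pg_statio_user_tables row:

```python
# Buffer cache hit ratio for clmhdr, using the corrected formula:
# hits / (disk reads + hits), which by construction can never exceed 100%.
heap_blks_read = 8_607_161    # blocks actually read from disk
heap_blks_hit = 196_547_165   # blocks found already in shared buffers

hit_pct = heap_blks_hit / (heap_blks_read + heap_blks_hit)
assert 0 <= hit_pct <= 1      # the original >100% result is now impossible
print(f"cache hit ratio: {hit_pct:.1%}")
```

So the table is being served from the buffer cache roughly 96% of the time.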
Re: [PERFORM] Wierd context-switching issue on Xeon
Dave,

> After some testing: if you use the current HEAD code for s_lock.c, which
> has some mods in it to alleviate this situation, and change
> SPINS_PER_DELAY to 10, you can drastically reduce the cs with Tom's test.
> [...] Can anyone else replicate my results?

Can you produce a patch against 7.4.1? I'd like to test your fix against a real-world database.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Folks,

I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody flame him!

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] slow seqscan
Edoardo Ceccarelli [EMAIL PROTECTED] writes:
> I wasn't able to make this 2-field index with lower:
>
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71
>
> It seems impossible to create 2-field indexes with the lower function.

You need 7.4 to do that; previous releases don't support multi-column functional indexes.

regards, tom lane
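For anyone on 7.4 who hits the same error: the expression column needs its own set of parentheses inside the column list. A sketch, reusing the table and column names from this thread:

```sql
-- PostgreSQL 7.4+: a multi-column index mixing a plain column and an
-- expression; note the extra parentheses around lower(testo).
CREATE INDEX annuncio400_rubric_testo_idx
    ON annuncio400 (rubric, (lower(testo)));
```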
Re: [PERFORM] Wierd context-switching issue on Xeon
Dave:

Why would test-and-set increase context switches? Note that it *does not increase* context switches when the two threads are on the two cores of a single Xeon processor. (Use taskset to force affinity on Linux.)

Scenario: if the two test-and-set processes are testing and setting the same bit as each other, then they'll see worst-case cache coherency misses. They'll ping a cache line back and forth between CPUs. Another case might be that they're testing and setting different bits or words, but those bits or words are always in the same cache line, again causing worst-case cache coherency misses. The fact that this doesn't happen when the threads are bound to the two cores of a single Xeon suggests it's because they're now sharing L1 cache: no pings/bounces.

I wonder: do the threads stall so badly when pinging cache lines back and forth that the kernel sees it as an opportunity to put the process to sleep, or do these worst-case misses cause an interrupt?

My question is: what are the two threads waiting for when they spin? Is it exactly the same resource, or two resources that happen to have test-and-set flags in the same cache line?

On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:
> I modified the code in s_lock.c to remove the spins
>
> #define SPINS_PER_DELAY 1
>
> and it doesn't exhibit the behaviour. This effectively changes the code to
>
> while(TAS(lock))
>     select(1); // 10ms
>
> Can anyone explain why executing TAS 100 times would increase context
> switches?
Re: [PERFORM] Wierd context-switching issue on Xeon
Paul Tuckfield [EMAIL PROTECTED] writes:
> I wonder: do the threads stall so badly when pinging cache lines back and
> forth that the kernel sees it as an opportunity to put the process to
> sleep, or do these worst-case misses cause an interrupt?

No; AFAICS the kernel could not even be aware of that behavior. The context swap storm is happening because of contention at the next level up (LWLocks rather than spinlocks). It could be an independent issue that just happens to be triggered by the same sort of access pattern. I put forward a hypothesis that the cache miss storm caused by the test-and-set ops induces the context swap storm by making the code more likely to be executing in certain places at certain times ... but it's only a hypothesis.

Yesterday evening I had pretty well convinced myself that they were indeed independent issues: profiling on a single-CPU machine was telling me that the test case I proposed spends over 10% of its time inside ReadBuffer, which certainly seems like enough to explain a high rate of contention on the BufMgrLock, without any assumptions about funny behavior at the hardware level. However, your report and Dave's suggest that there really is some linkage. So I'm still confused.

regards, tom lane
Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Paul Thomas wrote:
> Looks like he's using the default postgresql.conf settings, in which case
> I'm not surprised at pg looking so slow. His stated use of foreign keys
> invalidates the tests anyway, as MyISAM tables don't support FKs, so we're
> probably seeing FK check overheads in pg that are simply ignored by MySQL.
> In an honest test, MySQL should be reported as failing those tests.

Either failures, or they should not have been using MyISAM; they should have used the table format that supports FKs. This is just not apples to apples.
Re: [PERFORM] Help understanding stat tables
Chris Hoover [EMAIL PROTECTED] writes:
> I was taking heap_blks_hit / heap_blks_read for my hit pct. It should be
> heap_blks_hit / (heap_blks_read + heap_blks_hit), correct?

Right.

regards, tom lane
Re: [PERFORM] Wierd context-switching issue on Xeon
FYI, I am doing my testing on non hyperthreading dual athlons. Also, the test and set is attempting to set the same resource, and not simply a bit. It's really an lock;xchg in assemblelr. Also we are using the PAUSE mnemonic, so we should not be seeing any cache coherency issues, as the cache is being taken out of the picture AFAICS ? Dave On Wed, 2004-04-21 at 14:19, Paul Tuckfield wrote: Dave: Why would test and set increase context swtches: Note that it *does not increase* context swtiches when the two threads are on the two cores of a single Xeon processor. (use taskset to force affinity on linux) Scenario: If the two test and set processes are testing and setting the same bit as each other, then they'll see worst case cache coherency misses. They'll ping a cache line back and forth between CPUs. Another case, might be that they're tesing and setting different bits or words, but those bits or words are always in the same cache line, again causing worst case cache coherency and misses. The fact that tis doesn't happen when the threads are bound to the 2 cores of a single Xeon suggests it's because they're now sharing L1 cache. No pings/bounces. I wonder do the threads stall so badly when pinging cache lines back and forth, that the kernel sees it as an opportunity to put the process to sleep? or do these worst case misses cause an interrupt? My question is: What is it that the two threads waiting for when they spin? Is it exactly the same resource, or two resources that happen to have test-and-set flags in the same cache line? On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote: I modified the code in s_lock.c to remove the spins #define SPINS_PER_DELAY 1 and it doesn't exhibit the behaviour This effectively changes the code to while(TAS(lock)) select(1); // 10ms Can anyone explain why executing TAS 100 times would increase context switches ? 
Dave

On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
> Anjan,
>
>> Quad 2.0GHz XEON with highest load we have seen on the applications, DB performing great -
>
> Can you run Tom's test? It takes a particular pattern of data access to reproduce the issue.

-- Dave Cramer 519 939 0336 ICQ # 14675561

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [pgsql-advocacy] [PERFORM] MySQL vs PG TPC-H benchmarks
Josh Berkus wrote:
> Folks,
>
> I've sent a polite e-mail to Mr. Gomez offering our help. Please, nobody flame him!

Please keep in mind that the entire test has, other than a similar database schema and query types maybe, nothing to do with a TPC-H. I don't see any kind of SUT. Foreign key support on the DB level is not required by any of the TPC benchmarks. But the System Under Test, which is the combination of middleware application and database together with all computers and network components these parts are running on, must implement all the required semantics, like ACID properties, referential integrity, etc. One could implement a TPC-H with flat files; it's just a major pain in the middleware.

A proper TPC benchmark implementation would for example be a complete PHP+DB application, where the user interaction is done by an emulated browser and what is measured is the HTTP response times, not anything going on between PHP and the DB. Assuming that all requirements of the TPC specification are implemented by either using available DB features or including appropriate workarounds in the PHP code, that would very well lead to something that can compare PHP+MySQL vs. PHP+PostgreSQL.

All TPC benchmarks I have seen are performed by timing such a system after a considerable ramp-up time, giving the DB system a chance to properly populate caches and so forth. Rebooting the machine just before the test is the wrong thing here and will especially kill any advanced cache algorithms like ARC.

Jan

--
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
# [EMAIL PROTECTED]                                                    #

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
attached.

-- Dave Cramer 519 939 0336 ICQ # 14675561

Index: backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /usr/local/cvs/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c	8 Aug 2003 21:42:00 -0000	1.16
--- backend/storage/lmgr/s_lock.c	21 Apr 2004 20:27:34 -0000
***************
*** 76,82 ****
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
! #define SPINS_PER_DELAY		100
  #define NUM_DELAYS			1000
  #define MIN_DELAY_CSEC		1
  #define MAX_DELAY_CSEC		100
--- 76,82 ----
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
! #define SPINS_PER_DELAY		10
  #define NUM_DELAYS			1000
  #define MIN_DELAY_CSEC		1
  #define MAX_DELAY_CSEC		100
***************
*** 88,93 ****
--- 88,94 ----
  	while (TAS(lock))
  	{
+ 		__asm__ __volatile__ ("rep;nop": : :"memory");
  		if (++spins > SPINS_PER_DELAY)
  		{
  			if (++delays > NUM_DELAYS)
Index: include/storage/s_lock.h
===================================================================
RCS file: /usr/local/cvs/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.115.2.1
diff -c -r1.115.2.1 s_lock.h
*** include/storage/s_lock.h	4 Nov 2003 09:43:56 -0000	1.115.2.1
--- include/storage/s_lock.h	21 Apr 2004 20:26:25 -0000
***************
*** 103,110 ****
  	register slock_t _res = 1;

  	__asm__ __volatile__(
! 		"	lock			\n"
  		"	xchgb	%0,%1	\n"
  :		"=q"(_res), "=m"(*lock)
  :		"0"(_res));
  	return (int) _res;
--- 103,113 ----
  	register slock_t _res = 1;

  	__asm__ __volatile__(
! 		"	cmpb	$0,%1	\n"
! 		"	jne		1f		\n"
! 		"	lock			\n"
  		"	xchgb	%0,%1	\n"
+ 		"1:	\n"
  :		"=q"(_res), "=m"(*lock)
  :		"0"(_res));
  	return (int) _res;

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Moving postgres to FC disks
On Tue, 2004-04-20 at 17:27, Joshua D. Drake wrote:
>> Currently, the internal RAID volume is ext3 filesystem. Any recommendations for the filesystem on the new FC volume? ReiserFS?
>
> XFS

What Linux distributions are popular in here for PG+XFS? I'm very disappointed that Red Hat Enterprise 3 doesn't appear to support XFS/JFS, or anything else. SuSE Server 8 seems very dated, at least from the eval I downloaded. I'm curious as to where other people have gone with the death of RH9. I'd have gone on to Red Hat 3 if I wasn't interested in getting some of the benefits of XFS at the same time ...

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Wierd context-switching issue on Xeon
Kenneth Marshall [EMAIL PROTECTED] writes:
> If the context swap storm derives from LWLock contention, maybe using a random order to assign buffer locks in buf_init.c would prevent simple adjacency of buffer allocation to cause the storm.

Good try, but no cigar ;-). The test cases I've been looking at take only shared locks on the per-buffer locks, so that's not where the context swaps are coming from. The swaps have to be caused by the BufMgrLock, because that's the only exclusive lock being taken.

I did try increasing the allocated size of the spinlocks to 128 bytes to see if it would do anything. It didn't ...

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave Cramer [EMAIL PROTECTED] writes:
> diff -c -r1.16 s_lock.c
> *** backend/storage/lmgr/s_lock.c	8 Aug 2003 21:42:00 -0000	1.16
> --- backend/storage/lmgr/s_lock.c	21 Apr 2004 20:27:34 -0000
> ***************
> *** 76,82 ****
>    * The select() delays are measured in centiseconds (0.01 sec) because 10
>    * msec is a common resolution limit at the OS level.
>    */
> ! #define SPINS_PER_DELAY		100
>   #define NUM_DELAYS			1000
>   #define MIN_DELAY_CSEC		1
>   #define MAX_DELAY_CSEC		100
> --- 76,82 ----
>    * The select() delays are measured in centiseconds (0.01 sec) because 10
>    * msec is a common resolution limit at the OS level.
>    */
> ! #define SPINS_PER_DELAY		10
>   #define NUM_DELAYS			1000
>   #define MIN_DELAY_CSEC		1
>   #define MAX_DELAY_CSEC		100

As far as I can tell, this does reduce the rate of semop's significantly, but it does so by bringing the overall processing rate to a crawl :-(. I see 97% CPU idle time when using this patch. I believe what is happening is that the select() delay in s_lock.c is being hit frequently because the spin loop isn't allowed to run long enough to let the other processor get out of the spinlock.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Bruce Momjian [EMAIL PROTECTED] writes:
> For BSDOS it has:
>
> #if (CLIENT_OS == OS_FREEBSD) || (CLIENT_OS == OS_BSDOS) || \
>     (CLIENT_OS == OS_OPENBSD) || (CLIENT_OS == OS_NETBSD)
> { /* comment out if inappropriate for your *bsd - cyp (25/may/1999) */
>   int ncpus; size_t len = sizeof(ncpus);
>   int mib[2]; mib[0] = CTL_HW; mib[1] = HW_NCPU;
>   if (sysctl( &mib[0], 2, &ncpus, &len, NULL, 0 ) == 0)
>   //if (sysctlbyname("hw.ncpu", &ncpus, &len, NULL, 0 ) == 0)
>     cpucount = ncpus;
> }

Multiplied by how many platforms? Ewww...

I was wondering about some sort of dynamic adaptation, roughly along the lines of: whenever a spin loop successfully gets the lock after spinning, decrease the allowed loop count by one; whenever we fail to get the lock after spinning, increase by 100; if the loop count reaches, say, 10000, decide we are on a uniprocessor and irreversibly set it to 1. As written this would tend to incur a select() delay once per hundred spinlock acquisitions, which is way too much, but I think we could make it work with a sufficiently slow adaptation rate.

The tricky part is that a slow adaptation rate means we can't have every backend figuring this out for itself --- the right value would have to be maintained globally, and I'm not sure how to do that without adding a lot of overhead.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend