Re: [PERFORM] [ADMIN] Index not used. WHY?
There are around 700 rows in this table. If I set enable_seqscan=off
then the index is used, and I have also run VACUUM ANALYZE recently.

I find it strange because the values of id_user and id_modull have
roughly the same distribution, yet when I search the table the id_user
index is used but the id_modull index is not. Does Postgres somehow
know that a seq scan runs faster in this case than an index scan?
Should I drop this index?

I should say that the data in this table is not changed often, but
there are a LOT of joins made with this table.

Best regards,
Andy

----- Original Message -----
From: "Stephan Szabo" <[EMAIL PROTECTED]>
To: "Andrei Bintintan" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, December 04, 2003 5:19 PM
Subject: Re: [ADMIN] Index not used. WHY?

> On Thu, 4 Dec 2003, Andrei Bintintan wrote:
>
> > Hi,
> >
> > I have the following table:
> > CREATE TABLE public.rights (
> >     id int4 DEFAULT nextval('"rights_id_seq"'::text) NOT NULL,
> >     id_user int4 NOT NULL,
> >     id_modull int4 NOT NULL,
> >     CONSTRAINT rights_pkey PRIMARY KEY (id)
> > )
> >
> > and I created the following indexes:
> >
> > CREATE INDEX right_id_modull_idx ON rights USING btree (id_modull);
> > CREATE INDEX right_id_user_idx ON rights USING btree (id_user);
> >
> > Now the problem:
> >
> > EXPLAIN SELECT * FROM rights r WHERE r.id_modull = 15
> > returns:
> >     Seq Scan on rights r (cost=0.00..12.30 rows=42 width=12)
> >       Filter: (id_modull = 15)
> >
> > EXPLAIN SELECT * FROM rights r WHERE r.id_user = 15
> > returns:
> >     Index Scan using right_id_user_idx on rights r (cost=0.00..8.35 rows=11 width=12)
> >       Index Cond: (id_user = 15)
> >
> > Question: Why is right_id_modull_idx NOT used for the first query,
> > while right_id_user_idx IS used for the second?
>
> As a note, pgsql-performance is a better list for these questions.
>
> So, standard questions:
>
> How many rows are in the table, what does EXPLAIN ANALYZE show for the
> queries, if you force index usage (set enable_seqscan=off) on the first,
> what does EXPLAIN ANALYZE show then, and have you used ANALYZE / VACUUM
> ANALYZE recently?
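For reference, the experiment Stephan is suggesting can be run directly in psql; a minimal sketch against the table from the post (the plans and timings you get depend on your own data and statistics):

    -- compare the forced index scan against the planner's choice:
    SET enable_seqscan = off;
    EXPLAIN ANALYZE SELECT * FROM rights r WHERE r.id_modull = 15;
    SET enable_seqscan = on;
    EXPLAIN ANALYZE SELECT * FROM rights r WHERE r.id_modull = 15;
    -- comparing the two "actual time" figures shows whether the
    -- planner's preference for the seq scan is justified here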
Re: [PERFORM] [ADMIN] Index not used. WHY?
Andrei Bintintan wrote:
> There are around 700 rows in this table. If I set enable_seqscan=off
> then the index is used, and I have also run VACUUM ANALYZE recently.

For 700 rows I think a seq scan would work best.

> I find it strange because the values of id_user and id_modull have
> roughly the same distribution, yet when I search the table the id_user
> index is used but the id_modull index is not. Does Postgres somehow
> know that a seq scan runs faster in this case than an index scan?
> Should I drop this index? I should say that the data in this table is
> not changed often, but there are a LOT of joins made with this table.

If the table is cached then it does not matter. Unless it grows
substantially, say to around a hundred thousand rows (note that your
table is small), an index wouldn't be that useful.

Shridhar
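A quick way to see how small the table looks from the planner's point of view is to check its statistics in pg_class (a sketch; 'rights' is the table from the thread):

    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'rights';
    -- at ~700 rows the whole table is only a handful of 8 KB pages,
    -- so a sequential scan reads less data than an index scan would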
Re: [PERFORM] tuning questions
Jack Coates wrote:
>
> latest changes:
> shared_buffers = 35642
> max_fsm_relations = 1000
> max_fsm_pages = 1
> wal_buffers = 64
> sort_mem = 32768
> vacuum_mem = 32768
> effective_cache_size = 1
>
> /proc/sys/kernel/shmmax = 5
>
> IO is active, but hardly saturated. CPU load is hefty though, load
> average is at 4 now.
>
>  procs                     memory     swap         io     system        cpu
>  r b w  swpd  free  buff   cache    si so   bi  bo  in  cs  us sy id
>  0 2 1  2808 11436 39616 1902988    0  0  240 896 765 469   2 11 87
>  0 2 1  2808 11432 39616 1902988    0  0  244 848 768 540   4  3 93
>  0 2 1  2808 11432 39616 1902984    0  0  204 876 788 507   3  4 93
>  0 2 1  2808 11432 39616 1902984    0  0  360 416 715 495   4  1 96
>  0 2 1  2808 11432 39616 1902984    0  0  376 328 689 441   2  1 97
>  0 2 0  2808 11428 39616 1902976    0  0  464 360 705 479   2  1 97
>  0 2 1  2808 11428 39616 1902976    0  0  432 380 718 547   3  1 97
>  0 2 1  2808 11428 39616 1902972    0  0  440 372 742 512   1  3 96
>  0 2 1  2808 11428 39616 1902972    0  0  416 364 711 504   3  1 96
>  0 2 1  2808 11424 39616 1902972    0  0  456 492 743 592   2  1 97
>  0 2 1  2808 11424 39616 1902972    0  0  440 352 707 494   2  1 97
>  0 2 1  2808 11424 39616 1902972    0  0  456 360 709 494   2  2 97
>  0 2 1  2808 11436 39616 1902968    0  0  536 516 807 708   3  2 94

Hi Jack,

As shown by vmstat, your operating system is spending 96% of its time
idle. On Red Hat 8.0 IA32, "idle" means idle plus wait-I/O. In your
case, I think it is mostly wait-I/O, as you are working on a 2.8 GB
database with only 2 GB RAM, but it should be around 30%. Your
performance will increase only if user CPU increases; otherwise, for
example if your system swaps, only system CPU will increase and your
application will stay slow.

You can check your I/O more closely with: iostat 3 1000, and verify
that the max tps are on the database filesystem. So all the Postgres
tuning you have tried does not change much, because the bottleneck is
your I/O throughput.

One thing you can check is which parts of Postgres need a lot of I/O.
To do that, after shutting down PG, move your database to another disk
(the OS disk?), for example /mypg/data, and create a symbolic link
pointing $PGDATA/base at /mypg/data. Restart PG, and while you execute
your application, check with iostat which disk has the most tps. I bet
it is the disk where the WAL is logged.

One more thing about I/O: for an IDE disk, the maximum number of write
blocks plus read blocks per second is about 10,000, based on an I/O
block size of 1 KB. That means 10 MB/s. If you need more, you can try
striped SCSI disks or a RAID0 disk subsystem.

Thierry Missimilly

> --
> Jack Coates, Lyris Technologies Applications Engineer
> 510-549-4350 x148, [EMAIL PROTECTED]
> "Interoperability is the keyword, uniformity is a dead end."
> --Olivier Fourdan
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Friday 05 December 2003 02:07, Ivar Zarans wrote:
> I have played around with explain and explain analyze and noticed one
> interesting oddity:
[snip]
> Why does the first example, where recid is given as a numeric constant,
> use a sequential scan, while the second example, where recid is given
> as a string constant, uses an index scan as expected? The third example
> shows that the numeric constant must be typecast in order to work
> properly.
>
> Is this normal behaviour for fields of bigint type?

As Christopher says, normal (albeit irritating). Not sure it applies
here - all the examples you've shown me are using the index.

Well - I must admit I'm stumped. Unless you have a *lot* of indexes and
foreign keys to check, I can't see why it would take so long to update a
single row. Can you post the schema for the table?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Fri, Dec 05, 2003 at 10:08:20AM +0000, Richard Huxton wrote:
> > numeric constant must be typecast in order to work properly.
> >
> > Is this normal behaviour for fields of bigint type?
>
> As Christopher says, normal (albeit irritating). Not sure it applies
> here - all the examples you've shown me are using the index.

I think I have solved this mystery. The problem appears to be exactly
this - the representation of the numeric constant in the query. I am
using PyPgSQL for PostgreSQL access and making update queries like this:

    qry = "UPDATE table1 SET status = %s WHERE recid = %s"
    cursor.execute(qry, status, recid)

The execute method of the cursor object is supposed to merge the
"status" and "recid" values into "qry", using proper quoting. When I
started to play around with debug information I noticed that this query
used a sequential scan for "recid". Then I also noticed that the query
sent to the server looked like this:

    UPDATE table1 SET status = 'SKIP' WHERE recid = 199901

Sure enough, when I used psql and EXPLAIN on this query, I got a query
plan with a sequential scan. Using the recid value as a string, or as a
typecast integer, gave correct results with an index scan. I wrote
about this in my previous message.

It seems that PyPgSQL query quoting is not aware of this performance
problem (to which Christopher referred), and the final query sent to
the server is correct SQL, but not correct, considering PostgreSQL bugs.

One more explanation - previously I posted some logs showing a correct
query, using an index scan, but still taking 0.29 seconds. The reason
for this delay is the logging itself - it generates enough I/O traffic
to have an impact on query speed. With logging disabled, this query
takes around 0.0022 seconds, which is perfectly normal.

Finally - what would be the correct solution to this problem? Upgrading
to 7.5 CVS is not an option :) One possibility is not to use PyPgSQL
variable substitution and create every query "by hand" - not a very
nice solution, since variable substitution and quoting are quite
convenient. A second (and better) possibility is to ask the PyPgSQL
developers to take care of PostgreSQL's oddities. Any other suggestions?

--
Ivar
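The three literal forms Ivar describes can be compared directly in psql; a sketch using the names from the thread (recid is assumed to be bigint, and the plans shown in comments are what 7.3-era servers typically produce):

    EXPLAIN UPDATE table1 SET status = 'SKIP' WHERE recid = 199901;
        -- int4 literal: seq scan
    EXPLAIN UPDATE table1 SET status = 'SKIP' WHERE recid = '199901';
        -- quoted (unknown-typed) literal: index scan
    EXPLAIN UPDATE table1 SET status = 'SKIP' WHERE recid = 199901::int8;
        -- explicit cast: index scan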
Re: [PERFORM] Slow UPDATE, compared to INSERT
Ivar Zarans wrote:
> It seems that PyPgSQL query quoting is not aware of this performance
> problem (to which Christopher referred), and the final query sent to
> the server is correct SQL, but not correct, considering PostgreSQL bugs.

Personally I don't consider it a bug, but anyway... you are the one
facing the problem, so I understand.

> Finally - what would be the correct solution to this problem? Upgrading
> to 7.5 CVS is not an option :) One possibility is not to use PyPgSQL
> variable substitution and create every query "by hand" - not a very
> nice solution, since variable substitution and quoting are quite
> convenient. A second (and better) possibility is to ask the PyPgSQL
> developers to take care of PostgreSQL's oddities. Any other suggestions?

I know zero about Python, but just guessing - will the following help?

    qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
    cursor.execute(qry, status, recid)

Just a thought.

Shridhar
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Fri, Dec 05, 2003 at 06:19:46PM +0530, Shridhar Daithankar wrote:
> > is correct SQL, but not correct, considering PostgreSQL bugs.
>
> Personally I don't consider it a bug, but anyway... you are the one
> facing the problem, so I understand.

Well, if this is not a bug, then what is the reasoning behind this
behaviour? BTW, according to Christopher it is fixed in 7.5 CVS. Why fix
it if it is not a bug? :))

One more question - is this "feature" related only to "bigint" fields,
or are other datatypes affected as well?

> Will the following help?
>
>     qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
>     cursor.execute(qry, status, recid)

Yes, this helps. But then it sort of defeats PyPgSQL's own quoting
logic. I would prefer to take care of this all by myself, or trust some
underlying code to do it for me. And PyPgSQL is quite nice - it checks
the datatype and acts accordingly.

--
Ivar
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Friday 05 December 2003 12:49, Shridhar Daithankar wrote:
> Ivar Zarans wrote:
> > It seems that PyPgSQL query quoting is not aware of this performance
> > problem (to which Christopher referred), and the final query sent to
> > the server is correct SQL, but not correct, considering PostgreSQL bugs.
>
> Will the following help?
>
>     qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
>     cursor.execute(qry, status, recid)

Better IMHO would be:

    qry = "UPDATE table1 SET status = %s WHERE recid = %s::int8"

PG is very strict regarding types - normally a good thing, but it can
hit you unexpectedly in this scenario. The reason is that the literal
number is treated as int4, whereas quoted it is marked as type unknown.
Unknown gets cast to int8, whereas int4 gets left as-is. If you want to
know why int4 doesn't get promoted to int8 automatically, browse the
hackers list for the last couple of years.

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Slow UPDATE, compared to INSERT
Ivar Zarans wrote:
> On Fri, Dec 05, 2003 at 06:19:46PM +0530, Shridhar Daithankar wrote:
> > > is correct SQL, but not correct, considering PostgreSQL bugs.
> >
> > Personally I don't consider it a bug, but anyway... you are the one
> > facing the problem, so I understand.
>
> Well, if this is not a bug, then what is the reasoning behind this
> behaviour? BTW, according to Christopher it is fixed in 7.5 CVS. Why
> fix it if it is not a bug? :))

This is not a bug. It is just that people find it confusing when the
PostgreSQL planner considers seemingly identical types as different,
e.g. treating int8 as different from int4. The obvious expectation is
that they should be the same. But given PostgreSQL's flexibility with
CREATE TYPE, it is difficult to promote types automatically. AFAIK, the
fix in CVS is to make indexes usable with seemingly compatible types,
which does not change the fact that PostgreSQL cannot upgrade data
types on its own. Write good queries which adhere to strict data
typing. It is better to understand this anyway.

> One more question - is this "feature" related only to "bigint" fields,
> or are other datatypes affected as well?

Every data type is affected. int2 will not use an int4 index, and so on.

> > Will the following help?
> >
> >     qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
> >     cursor.execute(qry, status, recid)
>
> Yes, this helps. But then it sort of defeats PyPgSQL's own quoting
> logic. I would prefer to take care of this all by myself, or trust some
> underlying code to do it for me. And PyPgSQL is quite nice - it checks
> the datatype and acts accordingly.

Well, then PyPgSQL should be upgraded to query the pg catalogs to find
the exact type of the column. But that would be too cumbersome, I guess.

Shridhar
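For what it's worth, the catalog lookup Shridhar mentions is straightforward on 7.3 and later; a sketch, with 'table1' standing in for the real table name:

    SELECT a.attname, format_type(a.atttypid, a.atttypmod) AS coltype
    FROM pg_attribute a
    WHERE a.attrelid = 'table1'::regclass
      AND a.attnum > 0
      AND NOT a.attisdropped;
    -- a driver could use the reported types (e.g. bigint) to decide
    -- when to attach an explicit cast to a parameter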
Re: [PERFORM] [ADMIN] Index not used. WHY?
On Fri, 5 Dec 2003, Andrei Bintintan wrote:

> There are around 700 rows in this table. If I set enable_seqscan=off
> then the index is used, and I have also run VACUUM ANALYZE recently.
>
> I find it strange because the values of id_user and id_modull have
> roughly the same distribution, yet when I search the table the id_user
> index is used but the id_modull index is not.

It was guessing that one query would return 11 rows and the other 42,
which is why one used the index and the other didn't. If those numbers
aren't realistic, you may want to raise the statistics target for the
columns (see ALTER TABLE) and re-run ANALYZE.

> Does Postgres somehow know that a seq scan runs faster in this case
> than an index scan? Should I drop this index?

It's making an educated guess. When you're doing an index scan, it needs
to read through the index and then get matching rows from the table.
However, because those reads from the table are in a potentially random
order, there's usually a higher cost associated with them than if the
table were read in order (barring cases where you know your database
should always stay cached in disk cache, etc.).

If there are, say, 50 pages in the entire table, a sequential scan does
50 sequential page reads and checks all those tuples. If you're getting,
say, 42 rows through an index, you're first reading through the index,
and then getting pages in a random order from the table, which depends
on the distribution of values throughout the table. There's a variable
in the configuration, random_page_cost, which controls the ratio of cost
between a sequential read and a random one (defaulting to 4).
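As a concrete sketch of both suggestions (the table and column names are from the thread; the target value 100 is just an illustration):

    -- raise the per-column statistics target, then refresh the stats:
    ALTER TABLE rights ALTER COLUMN id_modull SET STATISTICS 100;
    ANALYZE rights;

    -- the seq-vs-random read cost ratio can also be tuned per session:
    SET random_page_cost = 2;   -- default is 4; lower favours index scans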
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Fri, Dec 05, 2003 at 01:23:43PM +0000, Richard Huxton wrote:
> Better IMHO would be: "UPDATE table1 SET status = %s WHERE recid = %s::int8"

Thanks for the hint!

> unexpectedly in this scenario. The reason is that the literal number is
> treated as int4, whereas quoted it is marked as type unknown. Unknown
> gets cast to int8, whereas int4 gets left as-is.

This explains a lot. Thanks! BTW, is this mentioned somewhere in the
PostgreSQL documentation? I can't remember anything on this subject.
Maybe I just somehow skipped it...

--
Ivar
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Fri, Dec 05, 2003 at 07:21:38PM +0530, Shridhar Daithankar wrote:
> the PostgreSQL planner considers seemingly identical types as
> different, e.g. treating int8 as different from int4. The obvious
> expectation is that they should be the same. But given PostgreSQL's
> flexibility with CREATE TYPE, it is difficult to promote types
> automatically.

OK, this makes sense and explains a lot. Thanks!

> Well, then PyPgSQL should be upgraded to query the pg catalogs to find
> the exact type of the column. But that would be too cumbersome, I guess.

Yes, so it seems. Time to rewrite my queries :)

Thanks again for the help and explanations!

--
Ivar
Re: [PERFORM] Slow UPDATE, compared to INSERT
I just spent two days tracking this error down in my own code, actually.

What I wound up doing is having the two places where I generate the
queries (everything in my system goes through those two points, as I'm
using a middleware layer) check values used as identifying fields for
the presence of a bigint, and if one exists, replace it with a wrapper
that produces the coerced-string representation:

    class Wrap:
        def __init__(self, value):
            self.value = value
        def __str__(self):
            return "'%s'::bigint" % (self.value,)
        __repr__ = __str__

    value = Wrap(value)

I'm just doing that for the indexing/identifying values at the moment.
PyPgSQL will fall back to using simple repr for the object (rather than
raising an error, as it would if you were using a formatted string), but
will otherwise treat it as a regular value for quoting and the like, so
no other modifications to the code are required.

By no means an elegant fix, but since your post (well, the resulting
thread) managed to solve my problem, I figured I should at least tell
everyone thanks and how I worked around the problem.

You wouldn't want this kind of hack down at the PyPgSQL level, I would
think, as it's DB-version specific. I suppose you could alter the
__repr__ of the PgInt8 class/type to always use the string or coerced
form, but it seems wrong to me. I'm actually hesitant to include it in
our own middleware layer, but oh well, it does seem to be necessary for
even somewhat reasonable performance.

BTW, my case was a largish (88,000 record) table with a non-unique
bigint key; EXPLAIN on an UPDATE shows a sequential scan, while with
'int'::bigint it goes to an index scan. Using PyPgSQL as the interface
to 7.3.4 and 7.3.3.

Enjoy,
Mike

Ivar Zarans wrote:
> On Fri, Dec 05, 2003 at 10:08:20AM +0000, Richard Huxton wrote:
...
> I am using PyPgSQL for PostgreSQL access and making update queries
> like this:
...
> It seems that PyPgSQL query quoting is not aware of this performance
> problem (to which Christopher referred), and the final query sent to
> the server is correct SQL, but not correct, considering PostgreSQL bugs.
...
> Finally - what would be the correct solution to this problem? Upgrading
> to 7.5 CVS is not an option :) One possibility is not to use PyPgSQL
> variable substitution and create every query "by hand" - not a very
> nice solution, since variable substitution and quoting are quite
> convenient. A second (and better) possibility is to ask the PyPgSQL
> developers to take care of PostgreSQL's oddities. Any other suggestions?

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/
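To make the wrapper's effect concrete: it rewrites the identifying value so the query that reaches the server carries an explicit bigint cast, e.g. (names from the thread):

    -- without the wrapper (int4 literal; the bigint index is not used):
    UPDATE table1 SET status = 'SKIP' WHERE recid = 199901;

    -- with the wrapper ('...'::bigint; index scan as expected):
    UPDATE table1 SET status = 'SKIP' WHERE recid = '199901'::bigint;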
Re: [PERFORM] Slow UPDATE, compared to INSERT
Ivar Zarans wrote:
> On Fri, Dec 05, 2003 at 01:23:43PM +0000, Richard Huxton wrote:
> > Better IMHO would be: "UPDATE table1 SET status = %s WHERE recid = %s::int8"
>
> Thanks for the hint!

Which makes the wrapper class need:

    def __str__(self):
        return "%s::int8" % (self.value,)

Enjoy,
Mike

_______________________________________
  Mike C. Fletcher
  Designer, VR Plumber, Coder
  http://members.rogers.com/mcfletch/
Re: [PERFORM] tuning questions
Jack,

> The frustrating thing is, we also have a UP P3-500 with 512M RAM and two
> IDE drives with the same PG install which is doing okay with this load
> -- still half the speed of MS-SQL2K, but usable. I'm at a loss.

Overall, I'm really getting the feeling that this procedure was
optimized for Oracle and/or MSSQL and is hitting some things that aren't
such a good idea for PostgreSQL. I highly suggest that you try using
log_duration and log_statement (and in 7.4 log_min_duration_statement)
to try to locate which particular statements are taking the longest.

--
Josh Berkus
Aglio Database Solutions
San Francisco
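A sketch of what that logging setup looks like on a 7.3-era server (the settings can go in postgresql.conf, or be set per session by a superuser):

    SET log_statement = on;   -- log every statement as it arrives
    SET log_duration = on;    -- log each statement's run time
    -- on 7.4+ the two can be combined and filtered by a threshold:
    -- SET log_min_duration_statement = 500;  -- log statements over 500 ms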
Re: [PERFORM] Slow UPDATE, compared to INSERT
Ivar Zarans <[EMAIL PROTECTED]> writes:
> > qry = "UPDATE table1 SET status = %s WHERE recid = '%s'"
> > cursor.execute(qry, status, recid)
>
> Yes, this helps. But then it sort of defeats PyPgSQL's own quoting
> logic. I would prefer to take care of this all by myself, or trust some
> underlying code to do it for me. And PyPgSQL is quite nice - it checks
> the datatype and acts accordingly.

You should tell the PyPgSQL folk to use the new binary protocol for
parameters, so that there are no quoting issues at all.

But if it's going to interpolate strings into the query, then PyPgSQL
really ought to be doing '%s' as above even for numbers. This lets
Postgres decide what the optimal datatype is based on what you're
comparing it to. Skipping the quotes will only cause headaches.

--
greg
[PERFORM] Slightly OT -- Xeon versus Opteron Comparison
Ace's Hardware has put together a fairly comprehensive comparison
between Xeon and Opteron platforms running server apps. Unfortunately,
there are only MySQL "data mining" benchmarks, as the review crew
doesn't have that much experience with OLTP-type systems, but I'm going
to try to convince them to add the OSDL DB benchmarks, assuming they
work fairly well with Postgres.

Read up on the goodies here: http://www.aceshardware.com/read.jsp?id=6275
[PERFORM] query using cpu nearly 100%, why?
Hello,

I use PHP as a front-end to query our database. When I use System
Monitor to check the usage of CPU and memory, I notice that the CPU very
easily gets up to 100%. Is that normal? If not, could someone point out
possible reasons? I am using Linux 7.3, PostgreSQL 7.3.4, 1 GB of memory
and a 2 GHz CPU.

Regards,
William
[PERFORM] Help tracking down problem with inserts slowing down...
I need some help tracking down a sudden, massive slowdown in inserts in
one of our databases.

PG: 7.2.3 (Red Hat 8.0)

Background: we currently run nearly identical systems at two sites.
Site A is a 'lab' site used for development; Site B is a production site.

The databases in question have identical structure:

    A simple table with 4 columns, with a trigger function on inserts
    (which checks to see if the entry already exists, and if so, changes
    the insert into an update...)

    A simple view with 4 columns into the above table.

All access is through JDBC (JDK 1.3.1, JDBC 7.1-1.3), and the
postgresql.conf files are identical.

The two sites were performing at comparable speeds until a few days ago,
when we deleted several million records from each database and then did
a VACUUM FULL; ANALYZE on both. Now inserts at Site B are several orders
of magnitude slower than at Site A. The odd thing is that Site B's DB
now has only 60,000 records while Site A's is up around 3 million.
Inserts at A average 63 ms; inserts at B are now up at 4.5 seconds!

EXPLAIN doesn't show any difference between the two.

Can someone suggest ways to track this down? I don't know much about
PostgreSQL internals/configuration. Thanks!

Steve
--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.
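For readers wondering what such an insert-to-update trigger looks like, here is a minimal sketch in PL/pgSQL for a 7.2-era server (the table and column names are invented, since the actual schema was not posted, and PL/pgSQL is assumed to be installed via createlang):

    CREATE FUNCTION merge_insert() RETURNS OPAQUE AS '
    BEGIN
        -- try to update an existing row first
        UPDATE attributes SET value = NEW.value WHERE id = NEW.id;
        IF FOUND THEN
            RETURN NULL;   -- row existed: suppress the INSERT
        END IF;
        RETURN NEW;        -- no match: let the INSERT proceed
    END;
    ' LANGUAGE 'plpgsql';

    CREATE TRIGGER merge_insert_trig BEFORE INSERT ON attributes
        FOR EACH ROW EXECUTE PROCEDURE merge_insert();

Note that the UPDATE inside the trigger is where Neil's point below applies: if its WHERE clause cannot use an index, every insert pays for a scan of the table.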
Re: [PERFORM] Help tracking down problem with inserts slowing down...
Steve Wampler <[EMAIL PROTECTED]> writes:
> PG: 7.2.3 (Red Hat 8.0)

You're using PG 7.2.3 with the PG 7.1 JDBC driver; FWIW, upgrading to
newer software is highly recommended.

> The two sites were performing at comparable speeds until a few days
> ago, when we deleted several million records from each database and
> then did a VACUUM FULL; ANALYZE on both. Now inserts at Site B are
> several orders of magnitude slower than at Site A.

Two thoughts:

(1) Can you confirm that the VACUUM FULL on site B actually removed all
the tuples you intended it to remove? Concurrent transactions can limit
the amount of data that VACUUM FULL is able to reclaim. If you run
contrib/pgstattuple (or compare the database's disk consumption with the
number of live rows in it), you should be able to tell.

(2) Look at the EXPLAIN for the SELECTs generated by the ON INSERT
trigger -- is there any difference between site A and B?

-Neil
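A sketch of both checks (this assumes the pgstattuple contrib module has been installed; its output format varies between versions, and 'mytable' is a stand-in name):

    -- dead-vs-live tuple statistics for the table:
    SELECT pgstattuple('mytable');

    -- rough cross-check: pages on disk vs. live rows as of the last ANALYZE
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname = 'mytable';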
Re: [PERFORM] tuning questions
On Fri, 2003-12-05 at 09:26, Josh Berkus wrote:
> Jack,
>
> > The frustrating thing is, we also have a UP P3-500 with 512M RAM and two
> > IDE drives with the same PG install which is doing okay with this load
> > -- still half the speed of MS-SQL2K, but usable. I'm at a loss.
>
> Overall, I'm really getting the feeling that this procedure was
> optimized for Oracle and/or MSSQL and is hitting some things that aren't
> such a good idea for PostgreSQL. I highly suggest that you try using
> log_duration and log_statement (and in 7.4 log_min_duration_statement)
> to try to locate which particular statements are taking the longest.

I'll definitely buy that as round two of optimization, but round one is
still "it's faster on the slower server."

hdparm -I is identical between the boxes, and the filesystem structure
layout is identical. The disk organization isn't identical, but far
worse: the UP low-RAM box has PG on /dev/hdb, ew. Predictably, vmstat
shows low numbers... but steady numbers.

dev is the box which goes fast (and I was wrong, it's actually a 2GHz
P4); rufus is the box which goes slow. During the big fetch:

dev: bi sits around 2000 blocks for twenty seconds while bo is around 50
blocks, then bo jumps to 800 or so while the data is returned, then
we're done.

rufus: bi starts at 16000 blocks, then drops steadily while bo climbs.
After a minute or so, bi stabilizes at 4096 blocks, then bo bursts to
return the data. Then the next fetch starts, and it's bi of 500, bo of
300 for several minutes.

These observations certainly all point to Eric and Thierry's
recommendations to better organize the filesystem and get faster
disks... except that the dev box gets acceptable performance.

So, I've dug into postgresql.conf on dev and rufus, and here's what I
found:

RUFUS:
    how much RAM do you have?
    75% of that, converted to 8K pages, for effective_cache_size
    15% of that or 512M, whichever is larger, converted to 8K pages,
        for shared_buffers
    15% of that, converted to 8K pages, for vacuum_mem
    how many messages will you send between vacuums? divide that by 2
        and divide by 6 for max_fsm_pages

DEV:
    how much RAM do you have?
    48% of that, converted to 8K pages, for effective_cache_size
    6.5% of that or 512M, whichever is larger, converted to 8K pages,
        for shared_buffers
    52% of that, converted to 8K pages, for vacuum_mem
    max_fsm_pages untouched on this box.

I adjusted rufus's configuration to match those percentages, but left
max_fsm_pages dialed up to 50. Now rufus's vmstat shows much better
behavior: bi of 12000 blocks gradually sloping down to 3000 during the
big select, bo steady until it's ready to return. As more jobs come in,
we see overlap areas where bi is 600-ish and bo is 200-ish, but they
only last a few tens of seconds.

The big selects are still a lot slower than they are on the smaller
database, and overall performance is still unacceptable. Next I dialed
max_fsm_pages back down to 1 -- no change. Hm, maybe it's been too long
since the last vacuumdb --analyze; let's give it another.

hdparm -Tt shows that disk performance is crappo on rufus, half what it
is on dev -- and freaking dev is using 16-bit IO! This is a motherboard
IDE controller issue:

    South Bridge: VIA vt8233
    Revision: ISA 0x0 IDE 0x6

That's it, I'm throwing out this whole test series and starting over
with different hardware. The database server is now a dual 2GHz Xeon
with 2GB RAM and a 2940UW SCSI adapter, with the OS and PG's logs on a
36G drive and the PG data on a 9GB drive. Data is importing now and I'll
restart the tests tonight.
--
Jack Coates, Lyris Technologies Applications Engineer
510-549-4350 x148, [EMAIL PROTECTED]
"Interoperability is the keyword, uniformity is a dead end."
--Olivier Fourdan
Re: [PERFORM] Help tracking down problem with inserts slowing down...
On Friday 05 December 2003 16:51, Steve Wampler wrote:
> I need some help tracking down a sudden, massive slowdown
> in inserts in one of our databases.
>
> PG: 7.2.3 (Red Hat 8.0)
[snip]
> The two sites were performing at comparable speeds until a few days
> ago, when we deleted several million records from each database and
> then did a VACUUM FULL; ANALYZE on both. Now inserts at Site B are
> several orders of magnitude slower than at Site A. The odd thing is
> that Site B's DB now has only 60,000 records while Site A's is up
> around 3 million. Inserts at A average 63 ms; inserts at B are now up
> at 4.5 seconds!
>
> EXPLAIN doesn't show any difference between the two.
>
> Can someone suggest ways to track this down? I don't know much about
> PostgreSQL internals/configuration.

What does EXPLAIN ANALYZE show for the insert query? Are there foreign
keys and/or indexes involved here? Did you reindex? A VACUUM VERBOSE
could give you a good indication of whether you need to reindex: compare
the number of pages in the index with the number in the table.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
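A sketch of that comparison ('mytable' is a stand-in name; on 7.2, VACUUM VERBOSE reports per-table and per-index page counts):

    VACUUM VERBOSE mytable;
    -- the output lists "Pages" for the table and for each index; an
    -- index with far more pages than the table it covers is a
    -- candidate for rebuilding:
    REINDEX TABLE mytable;   -- rebuilds all indexes on the table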