[PERFORM] timestamp indexing
We have a production database with transaction-style data. In most of the tables we have a timestamp attribute "created" telling the creation time of the table row; naturally, this attribute is always increasing. By now we are hitting the limit where the table data does not fit in caches anymore. We have a report section that constantly issues requests like "sum up all transactions for the last two weeks", and those requests seem to do a full table scan, even though only the last part of the table is needed - so those reports have started to cause lots of iowait. Is there any way to avoid this, apart from adding memory linearly with database growth, making adjunct tables for historical rows, or building a separate data warehousing system? There must be some simpler solution, right? -- Tobias Brox, Beijing ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Postgresql and xeon.
What about Xeon and PostgreSQL? I have been told that PostgreSQL wouldn't perform as well when running on Xeon processors, due to some cache trick that PostgreSQL uses. Why? Any fix? Rumors? AMD advocates? Problems with HT? Would those problems only be true for 7.4.x? I didn't find any comprehensive analysis/explanation for these matters besides people saying "stop using Xeon with PostgreSQL". Enlightenment please... Eric Lauzon [Recherche & Développement] Above Sécurité / Above Security Tél : (450) 430-8166 Fax : (450) 430-1858
Re: [PERFORM] timestamp indexing
On Mon, May 30, 2005 at 05:19:51PM +0800, Tobias Brox wrote:
> We have a production database with transaction-style data, in most of the
> tables we have a timestamp attribute "created" telling the creation time of
> the table row. Naturally, this attribute is always increasing.

The message subject is "timestamp indexing" but you don't mention whether you have an index on the timestamp column. Do you?

> By now we are hitting the limit where the table data does not fit in caches
> anymore. [...]

Could you post an example query and its EXPLAIN ANALYZE output? If the query uses a sequential scan then it might also be useful to see the EXPLAIN ANALYZE output with enable_seqscan turned off. Since caching can cause a query to be significantly faster after being run several times, it might be a good idea to run EXPLAIN ANALYZE three times and post the output of the last run -- that should put the queries under comparison on a somewhat equal footing (i.e., we don't want to be misled about how much faster one query is than another simply because one query happened to use more cached data on a particular run).

How many records are in the tables you're querying? Are you regularly vacuuming and analyzing the database or the individual tables? Are any of the tables clustered? If so, on what indexes and how often are you re-clustering them? What version of PostgreSQL are you using?

-- Michael Fuhr http://www.fuhr.org/~mfuhr/
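For what it's worth, the diagnostics Michael asks for could be gathered along these lines (a sketch only; the table and column names here are made up, since the original schema wasn't posted):

```sql
-- Run each EXPLAIN ANALYZE a few times and keep the last output,
-- so both plans are compared against a warmed cache.
EXPLAIN ANALYZE
SELECT sum(amount) FROM transactions WHERE created >= '2005-05-16';

-- Discourage sequential scans for this session only, to see the index plan:
SET enable_seqscan TO off;
EXPLAIN ANALYZE
SELECT sum(amount) FROM transactions WHERE created >= '2005-05-16';
SET enable_seqscan TO on;
```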
[PERFORM] Index not used on join with inherited tables
Hi all, I'm having another problem with a query that takes too long, because the appropriate index is not used. I found some solutions to this problem, but I think Postgres should do an index scan in all cases. To show the problem I've attached a small script with a testcase. Thanks in advance Sebastian

/* tables */
CREATE TABLE test (
    id SERIAL PRIMARY KEY,
    test TEXT
);
CREATE TABLE test1 ( id INTEGER PRIMARY KEY ) INHERITS (test);
CREATE TABLE test2 ( id INTEGER PRIMARY KEY ) INHERITS (test);
CREATE TABLE test3 ( id INTEGER PRIMARY KEY ) INHERITS (test);
CREATE TABLE bug ( id INTEGER PRIMARY KEY );

/* views */
CREATE OR REPLACE VIEW working_fast AS
    SELECT * FROM test WHERE id = 1;

CREATE OR REPLACE VIEW working_fast_2 AS
    SELECT test1.* FROM test1 JOIN bug AS bug1 ON TRUE WHERE test1.id = bug1.id
    UNION ALL
    SELECT test2.* FROM test2 JOIN bug AS bug2 ON TRUE WHERE test2.id = bug2.id
    UNION ALL
    SELECT test3.* FROM test3 JOIN bug AS bug3 ON TRUE WHERE test3.id = bug3.id;

CREATE OR REPLACE VIEW working_slow AS
    SELECT test.* FROM test JOIN bug ON TRUE WHERE test.id = bug.id;

/* data */
CREATE OR REPLACE FUNCTION data () RETURNS BOOLEAN AS $$
BEGIN
    FOR i IN 1..1 LOOP
        INSERT INTO test1 (id,test) VALUES (DEFAULT,i);
        INSERT INTO test2 (id,test) VALUES (DEFAULT,i);
        INSERT INTO test3 (id,test) VALUES (DEFAULT,i);
    END LOOP;
    RETURN TRUE;
END;
$$ LANGUAGE plpgsql;

SELECT data();
INSERT INTO bug VALUES ('1');
ANALYZE;

EXPLAIN ANALYZE SELECT * from working_fast;
EXPLAIN ANALYZE SELECT * from working_fast_2;
EXPLAIN ANALYZE SELECT * from working_slow;
Re: [PERFORM] Postgresql and xeon.
Eric,

> What about xeon and postgresql, i have been told that
> postgresql wouldn't perform as well when running
> under xeon processors due to some cache trick that postgresql
> uses?

Search the archives of this list. This has been discussed ad nauseam. www.pgsql.ru

-- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Index not used on join with inherited tables
Sebastian,

> I'm having another problem with a query that takes too long, because
> the appropriate index is not used.

PostgreSQL is not currently able to push down join criteria into UNIONed subselects. It's a TODO. Also, if you're using inherited tables, it's unnecessary to use UNION; just select from the parent.

-- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] sequential scan performance
Thanks everyone for all the suggestions. I'll check into those contrib modules. Michael

On May 29, 2005, at 2:44 PM, Oleg Bartunov wrote:

Michael, I'd recommend our contrib/pg_trgm module, which provides trigram based fuzzy search and returns results ordered by similarity to your query. Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm for more details. Oleg

On Sun, 29 May 2005, Michael Engelhart wrote:

Hi - I have a table of about 3 million rows of city "aliases" that I need to query using LIKE - for example:

select * from city_alias where city_name like '%FRANCISCO'

When I do an EXPLAIN ANALYZE on the above query, the result is:

 Seq Scan on city_alias (cost=0.00..59282.31 rows=2 width=42) (actual time=73.369..3330.281 rows=407 loops=1)
   Filter: ((name)::text ~~ '%FRANCISCO'::text)
 Total runtime: 3330.524 ms
(3 rows)

this is a query that our system needs to do a LOT. Is there any way to improve the performance on this either with changes to our query or by configuring the database deployment? We have an index on city_name but when using the % operator on the front of the query string postgresql can't use the index. Thanks for any help. Mike

Regards, Oleg _ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Re: [PERFORM] Index not used on join with inherited tables
Josh Berkus wrote:
> PostgreSQL is not currently able to push down join criteria into UNIONed
> subselects. It's a TODO.

And the appends in a "SELECT * from parent" are UNIONs, aren't they?

> Also, if you're using inherited tables, it's unnecessary to use UNION;
> just select from the parent.

Yes, but then no index is used...

Sebastian
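To make the workaround explicit: since the planner can't push the join condition down into the appended child scans, the UNION ALL form from the testcase (the working_fast_2 view) writes the join into each arm by hand, so each child's primary key index can be used. Roughly:

```sql
-- Sketch of the per-child rewrite from the testcase; each arm is
-- planned independently, so test1_pkey/test2_pkey/test3_pkey are usable.
SELECT t.* FROM test1 t JOIN bug b ON t.id = b.id
UNION ALL
SELECT t.* FROM test2 t JOIN bug b ON t.id = b.id
UNION ALL
SELECT t.* FROM test3 t JOIN bug b ON t.id = b.id;
```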
Re: [PERFORM] sequential scan performance
On Sun, May 29, 2005 at 08:27:26AM -0500, Michael Engelhart wrote:
> I have a table of about 3 million rows of city "aliases" that I need
> to query using LIKE - for example:
>
> select * from city_alias where city_name like '%FRANCISCO'
>
> [EXPLAIN ANALYZE output snipped: Seq Scan on city_alias, Total runtime:
> 3330.524 ms]
>
> this is a query that our system needs to do a LOT. Is there any way
> to improve the performance on this either with changes to our query
> or by configuring the database deployment? We have an index on
> city_name but when using the % operator on the front of the query
> string postgresql can't use the index.

If that's really what you're doing (the wildcard is always at the beginning) then something like this

    create index city_name_idx on foo (reverse(city_name));

    select * from city_alias where reverse(city_name) like reverse('%FRANCISCO');

should do just what you need. I use this, with a plpgsql implementation of reverse, and it works nicely.

CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
    original alias for $1;
    reverse_str text;
    i int4;
BEGIN
    reverse_str = '''';
    FOR i IN REVERSE LENGTH(original)..1 LOOP
        reverse_str = reverse_str || substr(original,i,1);
    END LOOP;
    RETURN reverse_str;
END;' LANGUAGE 'plpgsql' IMMUTABLE;

Someone will no doubt suggest using tsearch2, and you might want to take a look at it if you actually need full-text search, but my experience has been that it's too slow to be useful in production, and it's not needed for the simple "leading wildcard" case.

Cheers, Steve
Re: [PERFORM] Postgresql and xeon.
On Mon, May 30, 2005 at 09:19:40AM -0700, Josh Berkus wrote:
> Search the archives of this list. This has been discussed ad nauseum.
> www.pgsql.ru

I must admit I still haven't really understood it -- I know that it appears on multiple operating systems, on multiple architectures, but most with Xeon CPUs, and that it's probably related to the poor memory bandwidth between the CPUs, but that's about it. I've read the threads I could find on the list archives, but I've yet to see somebody pinpoint exactly what in PostgreSQL is causing this.

Last time someone claimed this was basically understood and "just a lot of work to fix", I asked for pointers to a more detailed analysis, but nobody answered. Care to explain? :-)

/* Steinar */ -- Homepage: http://www.sesse.net/
Re: [PERFORM] timestamp indexing
[Michael Fuhr - Mon at 07:54:29AM -0600]
> The message subject is "timestamp indexing" but you don't mention
> whether you have an index on the timestamp column. Do you?

Yes. Sorry for not being explicit on that.

> Could you post an example query and its EXPLAIN ANALYZE output? [...]

The actual statement was with 6 or 7 joins and very lengthy. I reduced it to a simple single-join query which still did a sequential scan rather than an index scan (as expected), and I believe I already sent a follow-up mail including "explain analyze". All "explain analyze" runs in my previous mail were repeated until the resulting execution time had stabilized, relatively speaking. I will try with "set enable_seqscan off" when I get back to the office.

> How many records are in the tables you're querying?

Also answered in my follow-up.

> Are you regularly vacuuming and analyzing the database or the
> individual tables?

Vacuum is run nightly, and I also did a manual "vacuum analyze table" on the table in question.

> Are any of the tables clustered? If so, on what indexes and how often
> are you re-clustering them?

Huh? :)

> What version of PostgreSQL are you using?

Also answered in my follow-up - "not yet pg8" :)
Re: [PERFORM] Postgresql and xeon.
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steinar H. Gunderson
> Sent: 30 mai 2005 12:55
> To: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Postgresql and xeon.
>
> I must admit I still haven't really understood it -- I know
> that it appears on multiple operating systems, on multiple
> architectures, but most with Xeon CPUs, and that it's
> probably related to the poor memory bandwidth between the
> CPUs, but that's about it. I've read the threads I could find
> on the list archives, but I've yet to see somebody pinpoint
> exactly what in PostgreSQL is causing this.
>
> Last time someone claimed this was basically understood and
> "just a lot of work to fix", I asked for pointers to a more
> detailed analysis, but nobody answered. Care to explain? :-)

Same here: the archive references are just overviews, with no real data on where and why. For the record, this is pg 7.4.8 on a 2.6 kernel with the preemptive scheduler, dual Xeon 3.2 GHz, 6 GB of RAM.

Eric Lauzon [Recherche & Développement] Above Sécurité / Above Security Tél : (450) 430-8166 Fax : (450) 430-1858
Re: [PERFORM] Postgresql and xeon.
"Steinar H. Gunderson" <[EMAIL PROTECTED]> writes: > I must admit I still haven't really understood it -- I know that it appears > on multiple operating systems, on multiple architectures, but most with Xeon > CPUs, and that it's probably related to the poor memory bandwidth between the > CPUs, but that's about it. I've read the threads I could find on the list > archives, but I've yet to see somebody pinpoint exactly what in PostgreSQL is > causing this. The problem appears to be that heavy contention for a spinlock is extremely expensive on multiprocessor Xeons --- apparently, the CPUs waste tremendous amounts of time passing around exclusive ownership of the memory cache line containing the spinlock. While any SMP system is likely to have some issues here, the Xeons seem to be particularly bad at it. In the case that was discussed extensively last spring, the lock that was causing the problem was the BufMgrLock. Since 8.0 we've rewritten the buffer manager in hopes of reducing contention, but I don't know if the problem is really gone or not. The buffer manager is hardly the only place with the potential for heavy contention... regards, tom lane ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] timestamp indexing
Tobias Brox <[EMAIL PROTECTED]> writes:
>> What version of PostgreSQL are you using?
> Also answered in my follow-up - "not yet pg8" :)

Your followup hasn't shown up here yet, but if the query is written like

    WHERE timestampcol >= now() - interval 'something'

then the pre-8.0 planner is not capable of making a good estimate of the selectivity of the WHERE clause. One solution is to fold the timestamp computation to a constant on the client side.

regards, tom lane
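A sketch of Tom's suggestion, using the ticket table from elsewhere in this thread; the interval arithmetic is done by the client (or application code) and shipped as a literal the planner can estimate:

```sql
-- Pre-8.0 planner can't estimate the selectivity of this form:
--   SELECT * FROM ticket WHERE created >= now() - interval '2 weeks';

-- So fold the computation into a constant before sending the query:
SELECT * FROM ticket WHERE created >= '2005-05-16 00:00:00';
```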
[PERFORM] poor performance involving a small table
Hi: I have a table called sensors:

                                 Table "public.sensor"
     Column      |           Type           |                    Modifiers
-----------------+--------------------------+-------------------------------------------------
 sensor_id       | integer                  | not null default nextval('sensor_id_seq'::text)
 sensor_model_id | integer                  | not null
 serial_number   | character varying(50)    | not null
 purchase_date   | timestamp with time zone | not null
 variable_id     | integer                  | not null
 datalink_id     | integer                  | not null
 commentary      | text                     |
Indexes:
    "sensor_pkey" PRIMARY KEY, btree (sensor_id)
Foreign-key constraints:
    "datalink_id_exists" FOREIGN KEY (datalink_id) REFERENCES datalink(datalink_id) ON DELETE RESTRICT
    "sensor_model_id_exists" FOREIGN KEY (sensor_model_id) REFERENCES sensor_model(sensor_model_id) ON DELETE RESTRICT
    "variable_id_exists" FOREIGN KEY (variable_id) REFERENCES variable(variable_id) ON DELETE RESTRICT

Currently, it has only 19 rows. But when I try to delete a row, it takes forever. I tried restarting the server. I tried a full vacuum to no avail. I tried the following:

explain analyze delete from sensor where sensor_id = 12;

 Seq Scan on sensor (cost=0.00..1.25 rows=1 width=6) (actual time=0.055..0.068 rows=1 loops=1)
   Filter: (sensor_id = 12)
 Total runtime: 801641.333 ms
(3 rows)

Can anybody help me out? Thanks so much!
[PERFORM] tuning
hi - i would like to see if someone could recommend something to make my query run faster.

System specs:
PostgreSQL 7.4.2 on RedHat 9
dual AMD Athlon 2GHz processors
1 gig memory
mirrored 7200 RPM IDE disks

Values in postgresql.conf:
shared_buffers = 1000
sort_mem is commented out
effective_cache_size is commented out
random_page_cost is commented out

Relevant tables:

product
-------
id serial
productlistid integer
vendorid integer
item varchar(32)
descrip varchar(256)
price double

vendor
------
id serial
vendorname varchar(64)

A view i made in order to easily retrieve the vendor name:

create view productvendorview as
    select p.id, p.productlistid, v.vendorname, p.item, p.descrip, p.price
    from product p, vendor v
    where p.vendorid = v.id;

Here are some indices i have created:

create index product_plid on product (productlistid);
create index product_plidloweritem on product (productlistid, lower(item) varchar_pattern_ops);
create index product_plidlowerdescrip on product (productlistid, lower(descrip) varchar_pattern_ops);

Here is the query in question:

select * from productvendorview
where (productlistid=3 or productlistid=5 or productlistid=4)
  and (lower(item) like '9229%' or lower(descrip) like 'toner%')
order by vendorname,item
limit 100;

This query scans 412,457 records.
Here is the EXPLAIN ANALYZE for the query:

 Limit (cost=45718.83..45719.08 rows=100 width=108) (actual time=39093.636..39093.708 rows=100 loops=1)
   -> Sort (cost=45718.83..45727.48 rows=3458 width=108) (actual time=39093.629..39093.655 rows=100 loops=1)
         Sort Key: v.vendorname, p.item
         -> Hash Join (cost=22.50..45515.57 rows=3458 width=108) (actual time=95.490..39062.927 rows=2440 loops=1)
               Hash Cond: ("outer".vendorid = "inner".id)
               -> Seq Scan on test p (cost=0.00..45432.57 rows=3457 width=62) (actual time=89.066..39041.654 rows=2444 loops=1)
                     Filter: (((productlistid = 3) OR (productlistid = 5) OR (productlistid = 4)) AND ((lower((item)::text) ~~ '9229%'::text) OR (lower((descrip)::text) ~~ 'toner%'::text)))
               -> Hash (cost=20.00..20.00 rows=1000 width=54) (actual time=6.289..6.289 rows=0 loops=1)
                     -> Seq Scan on vendor v (cost=0.00..20.00 rows=1000 width=54) (actual time=0.060..3.653 rows=2797 loops=1)
 Total runtime: 39094.713 ms
(10 rows)

Thanks! -Clark
Re: [PERFORM] Need help to decide Mysql vs Postgres
In article <[EMAIL PROTECTED]>, Josh Berkus wrote: >So transaction integrity is not a real concern? I know of all too many people that consider that to be true. They simply don't understand the problem. -- http://www.spinics.net/linux/
[PERFORM] Specific query performance problem help requested - postgresql 7.4
I am seeing vastly different performance characteristics for almost the exact same query. Can someone help me break this down and figure out why the one query takes so much longer than the other? Looking at the explain analyze output, I see that the loops value on the innermost index scan when bucket = 3 is way out of whack with the others. Here's the query; the only thing that changes from run to run is the bucket number. For some strange reason the id and bucket types are bigint although they do not need to be. shared_buffers is 48000, sort_mem is 32767. This is on 7.4.2; I'm seeing the same thing on 7.4.7 as well.

explain analyze
select t0.filename, t2.filename as parentname, t0.st_atime, t0.size,
       t0.ownernameid, t0.filetypeid, t0.groupnameid, t0.groupnameid,
       t0.id, t0.filename
from Nodes_215335885080_1114059806 as t0
inner join fileftypebkt_215335885080_1114059806 as t1 on t0.id=t1.fileid
inner join dirs_215335885080_1114059806 as t2 on t0.parentnameid=t2.filenameid
where t1.bucket=3
order by t0.filename asc offset 0 limit 25

Here's the bucket distribution; i have clustered the index on the bucket value.
 bucket |  count
--------+---------
      9 |   13420
      8 |  274053
      7 | 2187261
      6 |    1395
      5 |   45570
      4 | 2218830
      3 |   16940
      2 |  818405
      1 |    4092
(9 rows)

And the EXPLAIN ANALYZE output for bucket values of 3, 7 and 8:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit (cost=0.00..18730.19 rows=25 width=112) (actual time=89995.190..400863.350 rows=25 loops=1)
   -> Nested Loop (cost=0.00..48333634.41 rows=64513 width=112) (actual time=89995.172..400863.043 rows=25 loops=1)
         -> Nested Loop (cost=0.00..47944899.32 rows=64513 width=69) (actual time=89971.894..400484.701 rows=25 loops=1)
               -> Index Scan using xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806 t0 (cost=0.00..19090075.03 rows=4790475 width=69) (actual time=0.074..319084.540 rows=713193 loops=1)
               -> Index Scan using xfileftypebkt_215335885080_1114059806_fileid on fileftypebkt_215335885080_1114059806 t1 (cost=0.00..6.01 rows=1 width=8) (actual time=0.101..0.101 rows=0 loops=713193)
                     Index Cond: ("outer".id = t1.fileid)
                     Filter: (bucket = 3)
         -> Index Scan using xdirs_215335885080_1114059806_filenameid on dirs_215335885080_1114059806 t2 (cost=0.00..6.01 rows=1 width=59) (actual time=15.096..15.103 rows=1 loops=25)
               Index Cond: ("outer".parentnameid = t2.filenameid)
 Total runtime: 400863.747 ms
(10 rows)

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit (cost=0.00..785.15 rows=25 width=112) (actual time=173.935..552.075 rows=25 loops=1)
   -> Nested Loop (cost=0.00..59327691.44 rows=1889045 width=112) (actual time=173.917..551.763 rows=25 loops=1)
         -> Nested Loop (cost=0.00..47944899.32 rows=1889045 width=69) (actual time=151.198..303.463 rows=25 loops=1)
               -> Index Scan using xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806 t0 (cost=0.00..19090075.03 rows=4790475 width=69) (actual time=0.225..82.328 rows=6930 loops=1)
               -> Index Scan using xfileftypebkt_215335885080_1114059806_fileid on fileftypebkt_215335885080_1114059806 t1 (cost=0.00..6.01 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=6930)
                     Index Cond: ("outer".id = t1.fileid)
                     Filter: (bucket = 7)
         -> Index Scan using xdirs_215335885080_1114059806_filenameid on dirs_215335885080_1114059806 t2 (cost=0.00..6.01 rows=1 width=59) (actual time=9.894..9.901 rows=1 loops=25)
               Index Cond: ("outer".parentnameid = t2.filenameid)
 Total runtime: 552.519 ms
(10 rows)

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit (cost=0.00..18730.19 rows=25 width=112) (actual time=81.271..330.404 rows=25 loops=1)
   -> Nested Loop (cost=0.00..48333634.41 rows=64513 width=112) (actual time=81.254..330.107 rows=25 loops=1)
         -> Nested Loop (cost=0.00..47944899.32 rows=64513 width=69) (actual time=4.863..8.164 rows=25 loops=1)
               -> Index Scan using xnodes_215335885080_1114059806_filename on nodes_215335885080_1114059806 t0 (cost=0.00..19090075.03 rows=4790475 width=69) (actual time=0.204..2.576 rows=75 loops=1)
               -> Index Scan using xfileftypebkt_215335885080_1114059806_fileid on fileftypebkt_215335885080_11140
Re: [PERFORM] poor performance involving a small table
Colton A Smith wrote:
> Currently, it has only 19 rows. But when I try to delete a row, it
> takes forever. I tried restarting the server. I tried a full vacuum
> to no avail. [schema and EXPLAIN ANALYZE output snipped; Seq Scan on
> sensor, Total runtime: 801641.333 ms]
> Can anybody help me out? Thanks so much!

I'd say the obvious issue would be your foreign keys slowing things down. Have you analyzed the tables involved, and indexed the referencing columns, i.e. the foreign key columns in any tables that reference sensor? Each DELETE on sensor has to check those tables for referencing rows.
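A sketch of what that means in practice; the reading table here is invented, standing in for whatever tables carry a foreign key that points at sensor:

```sql
-- Hypothetical child table: REFERENCES sensor means every
-- DELETE FROM sensor must look for matching rows here.
-- CREATE TABLE reading (..., sensor_id integer REFERENCES sensor(sensor_id));

-- Without an index on the referencing column, each FK check is a
-- sequential scan of the child table:
CREATE INDEX reading_sensor_id_idx ON reading (sensor_id);
ANALYZE reading;
```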
[PERFORM] Drop / create indexes and vacuumdb
Hi, Does it make a difference in performance and/or disc space if I

1) drop index / vacuumdb -zf / create index, or
2) drop index / create index / vacuumdb -zf

I guess it makes a difference for the --analyze step, no?

Met vriendelijke groeten, Bien à vous, Kind regards, Yves Vindevogel Implements <> Mail: [EMAIL PROTECTED] - Mobile: +32 (478) 80 82 91 Kempische Steenweg 206 - 3500 Hasselt - Tel-Fax: +32 (11) 43 55 76 Web: http://www.implements.be First they ignore you. Then they laugh at you. Then they fight you. Then you win. Mahatma Gandhi.
Re: [PERFORM] tuning
list wrote:
> hi- i would like to see if someone could recommend something to make
> my query run faster.
> Values in postgresql.conf:
> shared_buffers = 1000
> sort_mem is commented out
> effective_cache_size is commented out
> random_page_cost is commented out

I would increase shared_buffers (say 5000 - 1), and also effective_cache_size (say around 2 - 5 - but work out how much memory this box has free or cached and adjust accordingly). From your explain output, it looks like sorting is not too much of a problem - so you can leave it unchanged (for this query anyway).

> Here is the query in question:
> select * from productvendorview where (productlistid=3 or
> productlistid=5 or productlistid=4) and (lower(item) like '9229%' or
> lower(descrip) like 'toner%') order by vendorname,item limit 100;

You might want to break this into 2 queries and union them, so you can (potentially) use the indexes on productlistid,lower(item) and productlistid,lower(descrip) separately.

> This query scans 412,457 records. Here is the EXPLAIN ANALYZE for the
> query: [EXPLAIN ANALYZE output snipped; Total runtime: 39094.713 ms]

I guess the relation 'test' is a copy of product (?)
Cheers, Mark
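A sketch of the UNION rewrite Mark suggests (untested); each arm matches one of the two functional indexes, and UNION removes any row matched by both conditions:

```sql
SELECT * FROM (
    SELECT * FROM productvendorview
    WHERE productlistid IN (3, 4, 5)
      AND lower(item) LIKE '9229%'
    UNION
    SELECT * FROM productvendorview
    WHERE productlistid IN (3, 4, 5)
      AND lower(descrip) LIKE 'toner%'
) AS u
ORDER BY vendorname, item
LIMIT 100;
```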
Re: [PERFORM] poor performance involving a small table
> Seq Scan on sensor (cost=0.00..1.25 rows=1 width=6) (actual
> time=0.055..0.068 rows=1 loops=1)
>   Filter: (sensor_id = 12)
> Total runtime: 801641.333 ms
> (3 rows)
>
> Can anybody help me out? Thanks so much!

Does your table have millions of dead rows? Do you vacuum once an hour? Run:

VACUUM FULL ANALYZE sensor;

Chris
Re: [PERFORM] timestamp indexing
[Tom Lane - Mon at 01:57:54PM -0400]
> Your followup hasn't shown up here yet,

I'll check up on that and resend it.

> but if the query is written like
> WHERE timestampcol >= now() - interval 'something'
> then the pre-8.0 planner is not capable of making a good estimate of the
> selectivity of the WHERE clause. One solution is to fold the timestamp
> computation to a constant on the client side.

I don't think there are any of those in production; we always make the timestamps on the client side.

As to my original problem, I looked up table clustering on google. Right - for report performance, we store some aggregates in the table which are updated several times. If I've understood it correctly, the row will physically be moved to the tail of the table every time the attribute is updated. I understand that it may make sense to do a full table scan if a random 10% of the rows should be selected. Forcing the usage of the index caused a tiny improvement in performance, but only after running it a few times to be sure the index got buffered :-)

-- Tobias Brox, Beijing
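For reference, the clustering Michael asked about earlier would look something like this (the index name is invented; the 7.4 syntax is CLUSTER indexname ON tablename). Note it is a one-shot physical reordering: updated rows drift to the end of the table until the next CLUSTER run.

```sql
CREATE INDEX ticket_created_idx ON ticket (created);
CLUSTER ticket_created_idx ON ticket;
ANALYZE ticket;
```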
Re: [PERFORM] timestamp indexing
[Tobias Brox - Tue at 10:06:25AM +0800]
> [Tom Lane - Mon at 01:57:54PM -0400]
> > Your followup hasn't shown up here yet,
>
> I'll check up on that and resend it.

Hrm ... messed-up mail configuration, I suppose. Here we go:

Paul McGarry unintentionally sent a request for more details off the list; since it was intended for the list, I'll send my reply here. While writing up the reply, and doing research, I discovered that this is not a problem with indexing timestamps per se, but more with a query of the kind "give me 5% of the table"; it seems the planner will often prefer a full table scan instead of going via the index.

I think that when I had my university courses on databases, we learned about flat indexes, where the whole index has to be rebuilt whenever a field is updated or inserted in the middle, and I also think we learned that the table usually would be sorted physically by the primary key on disk. As long as we have strictly incrementing primary keys and timestamps, such a setup would probably be more efficient for queries of the kind "give me all activity for the last two weeks"?

Here follows my reply to Paul, including some gory details:

[Paul McGarry - Mon at 07:59:35PM +1000]
> What version of postgresql are you using and what are the exact
> datatypes and queries?

We are still using 7.4.6, but I suppose that if our issues are completely or partially solved in pg 8, that would make a good case for upgrading :-) The datatypes I'm indexing are timestamp without time zone.
Actually, I may be on the wrong hunting ground now - the production system froze completely some days ago, basically due to heavy iowait and load on the database server, rendering postgresql completely unresponsive. Back then we had too poor logging to find out which queries were causing it to grind to a halt, and since we'd never had such a bad problem before, we didn't know how to handle the situation (we simply restarted postgresql; if we had instead killed the processes running the rogue queries, we would have had very good tracks of them in the logs). I digress.

The last days I've looked through the profiling logs, and I'm checking whether the accumulatively worst queries can be tuned somehow. Most of them are big joins, but I'm a bit concerned about the amount of "Seq Scan" returned by "explain" despite the fact that only a small fraction of the tables is queried. I reduced the problem to a simple "select * from table where created>xxx" and discovered that it still won't use the index, and is still costly (though of course not much compared to the big joined query).
The "ticket" table has fewer than a million rows, around 50k of them created in the last ten days:

NBET=> explain analyze select * from ticket where created>'2005-05-20';
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on ticket  (cost=0.00..19819.91 rows=89553 width=60) (actual time=535.884..1018.268 rows=53060 loops=1)
   Filter: (created > '2005-05-20 00:00:00'::timestamp without time zone)
 Total runtime: 1069.514 ms
(3 rows)

Anyway, it seems to me that "indexing on timestamp" is not the real issue here, because when restricting by the primary key (a numeric, sequential ID) the execution time is the same or worse, still doing a sequential scan:

NBET=> explain analyze select * from ticket where id>711167;
                              QUERY PLAN
----------------------------------------------------------------------
 Seq Scan on ticket  (cost=0.00..19819.91 rows=92273 width=60) (actual time=550.855..1059.843 rows=53205 loops=1)
   Filter: (id > 711167)
 Total runtime: 1110.469 ms
(3 rows)

I've tried running equivalent queries on a table with twice as many rows and width=180; there it will pull from the index both when querying by ID and by timestamp, and it will usually spend less time. Running "select * from ticket" without the restriction seems to execute ~2x slower.

> I have a 7.3 database with a "timestamp with time zone" field and we
> have to be very careful to explicitly cast values as that in queries
> if it is to use the index correctly. I believe it's an issue that is
> cleared up in newer versions though.

I suppose so - as said, restricting by the primary key didn't improve performance significantly, so I was clearly wrong to indicate that this is a special issue with indexing a timestamp.

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
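For completeness, a sketch of the index such a query would need before the planner can even consider an index scan (the index name is an assumption; whether the index is actually chosen still depends on the estimated fraction of rows matching the range - around 10% of a table may legitimately favor a sequential scan):

```sql
-- Illustrative only: a plain btree index on the timestamp column.
CREATE INDEX ticket_created_idx ON ticket (created);

-- With an explicitly typed constant, 7.4 can consider this index for
-- sufficiently selective ranges:
EXPLAIN ANALYZE
SELECT * FROM ticket
WHERE created > '2005-05-20 00:00:00'::timestamp without time zone;
```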
Re: [PERFORM] poor performance involving a small table
Colton A Smith <[EMAIL PROTECTED]> wrote ..

>  Seq Scan on sensor  (cost=0.00..1.25 rows=1 width=6) (actual
> time=0.055..0.068 rows=1 loops=1)
>    Filter: (sensor_id = 12)
>  Total runtime: 801641.333 ms
> (3 rows)

Do you have some foreign keys pointing in the other direction? In other words, is there another table such that a delete on sensor causes a delete (or a check of some key) in that other table? EXPLAIN doesn't show these, and that other table might be a big one missing an index.

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
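A sketch of the kind of fix being hinted at, with hypothetical names: if some table references sensor(sensor_id), every delete on sensor must look up the referencing rows, and without an index on the referencing column that lookup is a sequential scan of the child table - which never shows up in the EXPLAIN output of the delete itself.

```sql
-- Hypothetical child table "readings"; names are illustrative, not
-- from the original post. Indexing the referencing column turns the
-- per-delete foreign-key check from a sequential scan of the child
-- table into an index lookup.
CREATE INDEX readings_sensor_id_idx ON readings (sensor_id);
```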
[PERFORM] Index on a NULL-value
I read in the manual today:

  Indexes are not used for IS NULL clauses by default. The best way to use
  indexes in such cases is to create a partial index using an IS NULL
  predicate.

This is from the documentation for PostgreSQL 8. I did not find anything equivalent in the 7.4.8 documentation. I wasn't aware of this until it became an issue :-) Well, so I follow the tip, but in vain. Reduced and reproduced like this in PostgreSQL 7.4.7:

test=# create table mock(a int, b int);
CREATE TABLE
test=# create index b_is_null on mock((b IS NULL));
CREATE INDEX
test=# insert into mock values (10,20);
INSERT 70385040 1
test=# insert into mock values (20,30);
INSERT 70385041 1
test=# insert into mock values (30, NULL);
INSERT 70385042 1
test=# set enable_seqscan=off;
SET
test=# explain select * from mock where b is NULL;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on mock  (cost=1.00..10020.00 rows=6 width=8)
   Filter: (b IS NULL)
(2 rows)

"vacuum analyze" also didn't help it recognize the index ;-)

Any tips? Rewrite the application to not use NULL values? Hide under the bedclothes and hope the problem goes away? Install more memory in the server? :-)

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:02:07 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> I read in the manual today:
>
>   Indexes are not used for IS NULL clauses by default. The best way to use
>   indexes in such cases is to create a partial index using an IS NULL
>   predicate.
>
> This is from the documentation for PostgreSQL 8. I did not find anything
> equivalent in the 7.4.8-documentation.
>
> I wasn't aware of this until it became an issue :-) Well, so I follow the
> tip but in vain. Reduced and reproduced like this in PostgreSQL 7.4.7:
>
> test=# create table mock(a int, b int);
> CREATE TABLE
> test=# create index b_is_null on mock((b IS NULL));
> CREATE INDEX
> test=# insert into mock values (10,20);
> INSERT 70385040 1
> test=# insert into mock values (20,30);
> INSERT 70385041 1
> test=# insert into mock values (30, NULL);
> INSERT 70385042 1
> test=# set enable_seqscan=off;
> SET
> test=# explain select * from mock where b is NULL;
>                        QUERY PLAN
> ---------------------------------------------------------
>  Seq Scan on mock  (cost=1.00..10020.00 rows=6 width=8)
>    Filter: (b IS NULL)
> (2 rows)
>
> vacuum analyze also didn't help to recognize the index ;-)

It isn't surprising that an index wasn't used, since a sequential scan is going to be faster in your test case. If you want to test this out, you want to use realistically sized tables.

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Index on a NULL-value
[Tobias Brox - Tue at 11:02:07AM +0800]
> test=# explain select * from mock where b is NULL;
>                        QUERY PLAN
> ---------------------------------------------------------
>  Seq Scan on mock  (cost=1.00..10020.00 rows=6 width=8)
>    Filter: (b IS NULL)
> (2 rows)

(...)

> ---(end of broadcast)---
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match

That tip helped me :-)

test=# explain select * from mock where (b IS NULL)=true;
                              QUERY PLAN
----------------------------------------------------------------------
 Index Scan using b_is_null on mock  (cost=0.00..4.68 rows=1 width=8)
   Index Cond: ((b IS NULL) = true)
(2 rows)

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index on a NULL-value
[Tobias Brox]
> test=# set enable_seqscan=off;

[Bruno Wolff III - Mon at 10:16:53PM -0500]
> It isn't surprising that an index wasn't used since a sequential scan is
> going to be faster in your test case.
>
> If you want to test this out, you want to use realistically sized tables.

Wrong. In this case I was not wondering about the planner's choice not to use the index, but about the fact that the planner could not find the index at all. Reproducing it on a simple table in a test environment was a valid strategy for solving this specific problem.

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index on a NULL-value
[Bruno Wolff III - Mon at 10:36:33PM -0500]
> You want something like this:
> CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

Oh, cool. I wasn't aware that this was possible. It would probably help us in a lot of places. :-)

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:21:20 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Tobias Brox - Tue at 11:02:07AM +0800]
> > test=# explain select * from mock where b is NULL;
> >                        QUERY PLAN
> > ---------------------------------------------------------
> >  Seq Scan on mock  (cost=1.00..10020.00 rows=6 width=8)
> >    Filter: (b IS NULL)
> > (2 rows)
>
> That tip helped me :-)
>
> test=# explain select * from mock where (b IS NULL)=true;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Index Scan using b_is_null on mock  (cost=0.00..4.68 rows=1 width=8)
>    Index Cond: ((b IS NULL) = true)
> (2 rows)

Looked back at your first example and saw that you didn't use a partial index, which is why you had to contort things to make an indexed search possible. (Though for the test case you used, the planner was actually right to prefer a sequential scan: all of the rows fit in one disk block, so an index scan would require more disk reads than a sequential scan.)

You want something like this:

CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

The advantage is that the index can be a lot smaller than an index over all of the rows in the case where only a small fraction of the rows have a null value for b. (If this isn't the case, you probably don't want the index.)

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index on a NULL-value
On Tue, May 31, 2005 at 11:31:58 +0800, Tobias Brox <[EMAIL PROTECTED]> wrote:
> [Tobias Brox]
> > test=# set enable_seqscan=off;
>
> [Bruno Wolff III - Mon at 10:16:53PM -0500]
> > It isn't surprising that an index wasn't used since a sequential scan is
> > going to be faster in your test case.
> >
> > If you want to test this out, you want to use realistically sized tables.
>
> Wrong. In this case I was not wondering about the planner's choice not to
> use the index, but about the fact that the planner could not find the
> index at all. Reproducing it on a simple table in a test environment was
> a valid strategy for solving this specific problem.

I missed that you had turned sequential scans off for your test.

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Index on a NULL-value
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> Looked back at your first example and saw that you didn't use a partial
> index which is why you had to contort things to make it possible to
> use an indexed search.

FWIW, there is code in CVS tip that recognizes the connection between an index on a boolean expression and a WHERE clause testing that expression. It's not quite perfect --- using Tobias' example I see

regression=# explain select * from mock where b is NULL;
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using b_is_null on mock  (cost=0.00..51.67 rows=10 width=8)
   Index Cond: ((b IS NULL) = true)
   Filter: (b IS NULL)
(3 rows)

so there's a useless filter condition still being generated. But it gets the job done as far as using the index, anyway.

> You want something like this:
> CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;

I think best practice for something like this is to make the partial index's columns be something different from what the partial condition tests. Done as above, every actual index entry will be a null, so the entry contents are just dead weight. Instead do, say,

CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL;

where a is chosen as a column that you frequently also test in conjunction with "b IS NULL". That is, the above index can efficiently handle queries like

... WHERE a = 42 AND b IS NULL ...

regards, tom lane

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Index on a NULL-value
Tobias Brox <[EMAIL PROTECTED]> writes:
> [Bruno Wolff III - Mon at 10:36:33PM -0500]
> > You want something like this:
> > CREATE INDEX b_is_null ON mock(b) WHERE b IS NULL;
>
> Oh, cool. I wasn't aware that this is possible. This would probably help
> us a lot of places. :-)

Yeah, it's a cool feature. I'm not 100% sure, but I think it still won't consider this index unless the column being indexed is used in some indexable operation. So for example if you had

CREATE INDEX b_null ON mock(other) WHERE b IS NULL;

and something like

SELECT * FROM mock WHERE b IS NULL ORDER BY other

or

SELECT * FROM mock WHERE other > 0 AND b IS NULL

then it would be a candidate because the ORDER BY or the "other > 0" makes the index look relevant. But I don't think (again, I'm not 100% sure) that the partial index's WHERE clause is considered in picking which indexes to consider. It *is* considered in evaluating which index is the best one to use and whether it's better than a sequential scan - just not in the initial choice of which indexes to look at at all.

-- greg

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Index on a NULL-value
Greg Stark <[EMAIL PROTECTED]> writes:
> then it would be a candidate because the ORDER BY or the other > 0 make the
> index look relevant. But I don't think (again I'm not 100% sure) that the
> partial index WHERE clause is considered in picking which indexes to consider.

Nope, the partial index will be considered simply on the strength of its predicate matching the WHERE clause. Of course, if you can get some additional mileage by having the index contents be useful, that's great --- but it's not necessary.

regards, tom lane

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Index on a NULL-value
> CREATE INDEX b_is_null ON mock(a) WHERE b IS NULL;
>
> where a is chosen as a column that you frequently also test in
> conjunction with "b IS NULL". That is, the above index can efficiently
> handle queries like
>
> ... WHERE a = 42 AND b IS NULL ...

This is wonderful - it seems like most of our problems (probably also those from the "index on timestamp" thread I started separately) can be solved with partial indexing on expressions. No need to hide under the bedclothes anymore ;-)

-- Tobias Brox, Beijing

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
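A sketch of how the two threads might combine, with hypothetical names ("closed IS NULL" is an assumed predicate for open tickets, not from the actual schema): a partial index whose predicate matches the reports' WHERE clause, keyed on the timestamp per Tom's advice.

```sql
-- Hypothetical: index only the rows the reports care about, keyed on
-- "created" so the index entries carry useful range-search content.
CREATE INDEX ticket_open_created_idx
    ON ticket (created)
    WHERE closed IS NULL;

-- Can then serve queries like (boundary computed client-side):
SELECT * FROM ticket
WHERE closed IS NULL
  AND created >= '2005-05-16 00:00:00'::timestamp without time zone;
```

The index stays small because it covers only the open-ticket fraction of the table, and the planner matches it on the strength of the predicate alone, as Tom noted upthread.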