Re: [PERFORM] Performance problems with large telemetric datasets on 7.4.2
On mán, 2007-08-06 at 00:10 -0700, Sven Clement wrote:
> 2007/8/5, Heikki Linnakangas [EMAIL PROTECTED]:
> > I don't remember a bug like that. Where did you read that from?
> > -- Heikki Linnakangas, EnterpriseDB http://www.enterprisedb.com
>
> Partially I found that one in the PostgreSQL documentation for the 7.x
> versions under the command REINDEX, where they claim that you should run
> a REINDEX under certain circumstances. To my understanding this says that
> with some access patterns (such as ours: mostly writes, plus one big
> delete per day) the index may be corrupted or otherwise not really useful.

You are probably talking about index bloat, not corruption. When that
happens, the index consumes more space than needed and its effectiveness
is reduced, but it is not corrupted and does not cause wrong results.
I believe this is a lot less common now than in the 7.x days.

gnari

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: RES: [PERFORM] select on 1milion register = 6s
On lau, 2007-07-28 at 17:12 -0300, Bruno Rodrigues Siqueira wrote:
> where to_char(data_encerramento, 'yyyy-mm') between '2006-12' and '2007-01'

Assuming data_encerramento is a date column, try:

WHERE data_encerramento BETWEEN '2006-12-01' AND '2007-01-31'

gnari
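To illustrate the point of the rewrite above (table and index names here are hypothetical, not from the thread): wrapping the column in a function hides it from an ordinary btree index, while comparing the bare column against constant bounds lets the planner use a plain index range scan.

```sql
-- hypothetical table and index
CREATE INDEX registros_encerramento_idx ON registros (data_encerramento);

-- the function call on the column defeats the index:
SELECT count(*) FROM registros
WHERE to_char(data_encerramento, 'yyyy-mm') BETWEEN '2006-12' AND '2007-01';

-- the rewritten predicate is a plain range scan on the same index:
SELECT count(*) FROM registros
WHERE data_encerramento BETWEEN '2006-12-01' AND '2007-01-31';
```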
Re: [PERFORM] determining maxsize for character varying
On lau, 2007-06-16 at 13:35 +0200, [EMAIL PROTECTED] wrote:
> Thanks. If I understand well, that means that if I choose character
> varying(3) or character varying(8) or character varying(32) or character
> varying with no max length, the fields will take the same place on disk
> (8kb), except for fields too long to fit in the 8kb, which are stored
> somewhere else? Is that correct?

Not at all. A varchar will occupy the byte length of your actual string,
plus a small fixed overhead and padding, except when the total row size
causes TOASTing. In single-byte encodings, the string 'okparanoid' will
occupy the same amount of disk space in a varchar, varchar(10) or
varchar(1000) column, namely around 16 bytes.

Hope this helps,
gnari
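One quick way to check this for yourself, assuming a server new enough to have the pg_column_size() function (added in PostgreSQL 8.1): the declared maximum length does not change the reported on-disk size of the same value.

```sql
-- all three casts report the same stored size for the same string
SELECT pg_column_size('okparanoid'::varchar(10))   AS v10,
       pg_column_size('okparanoid'::varchar(1000)) AS v1000,
       pg_column_size('okparanoid'::varchar)       AS vnolimit;
```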
Re: [PERFORM] Nested Loop
On þri, 2007-03-27 at 16:13 +0530, Gauri Kanekar wrote:
> SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS campname,
>        rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS advid,
>        ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS channame,
>        rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel) AS click,
>        rd.sqldate AS date
> FROM rm, rn CROSS JOIN rd, ra, rs, rc, rpt_chn, rpt_cre
> WHERE rm.date_key = rd.key
>   AND rm.net_key = rn.key
>   AND rm.adv_key = ra.key
>   AND rm.camp_key = rc.key
>   AND rm.s_key = rs.key
>   AND rm.chn_key = rpt_chn.key
>   AND rm.cre_key = rpt_cre.key
>   AND ra.name != 'SYSTEM'
>   AND rd.sqldate BETWEEN '12/1/2006' AND '12/30/2006'
>   AND rn.id IN (607)
> GROUP BY rd.sqldate, rs.id, rs.name, ra.id, ra.name, rc.id, rc.name,
>          rc.rev_type, rc.act_type, rpt_chn.id, rpt_chn.name, rpt_cre.dn;

You did not answer the other questions, so do this:
1) VACUUM ANALYZE your database
2) set these in your postgresql.conf:
     enable_seqscan = true
     join_collapse_limit = 8
3) restart postgresql
4) do the EXPLAIN ANALYZE again, and send us its output

gnari
Re: [PERFORM] Nested Loop
On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote:

You did not show your query, nor did you answer whether you had vacuumed
and analyzed.

> enable_seqscan = off

Why this? It is unlikely to help.

> QUERY PLAN ...
> -> Nested Loop (cost=0.00..1104714.83 rows=6801 width=44)
>    (actual time=1820.153..229779.814 rows=10945938 loops=1)

The estimates are way off here. Are you sure you have analyzed?

gnari
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On þri, 2007-01-02 at 09:04 -0500, Geoffrey wrote:
> Alvaro Herrera wrote:
> > Actually it has been suggested that a combination of ext2 (for WAL) and
> > ext3 (for data, with data journalling disabled) is a good performer.
> > AFAIK you don't want the overhead of journalling for the WAL partition.
>
> I'm curious as to why ext3 for data with journalling disabled? Would
> that not be the same as ext2?

I believe Alvaro was referring to ext3 with journalling enabled for
metadata, but not for data. I also believe this is the standard ext3
configuration, but I could be wrong on that.

gnari
Re: [PERFORM] max_fsm_pages and check_points
On mið, 2006-12-20 at 05:31 +0000, ALVARO ARCILA wrote:
> Hi, I've been looking around the log files of my server and lately they
> indicate that I should consider increasing the checkpoint segments
> because they're being recycled too often, and they also recommend
> increasing max_fsm_pages over 169728...

If this has been happening for some time, some tables might possibly have
become bloated with dead rows, so a one-time VACUUM FULL or CLUSTER on
these might be indicated to speed up reaching the steady state. I think
the max_fsm_pages figure is a minimum recommendation, so you might want
to look at VACUUM VERBOSE output after setting it, to see if an even
higher value is indicated.

> these are the config values present in the postgresql.conf:
> shared_buffers = 1000
> work_mem = 8192

If you have got a lot of memory, you might want to experiment with these
a little.

gnari
Re: [PERFORM] Slow update with simple query
On mið, 2006-12-13 at 11:51 +0100, Arnaud Lesauvage wrote:
> Hi list!
> I am running a query to update the boolean field of a table based on
> another table's fields. The query is (names changed for readability):
>
> UPDATE t1
> SET booleanfield = (t2.field1 IN ('some', 'other') AND t2.field2 = 'Y')
> FROM t2
> WHERE t1.uid = t2.uid
>
> t2.uid is the PRIMARY KEY. t2 only has ~1000 rows, so I think it fits
> fully in memory. t1 has ~2,000,000 rows. There is an index on t1.uid
> also. The explain (sorry, no explain analyze available yet) is:
>
> Hash Join (cost=112.75..307410.10 rows=2019448 width=357)
>   Hash Cond: (outer.uid = inner.uid)
>   -> Seq Scan on t1 (cost=0.00..261792.01 rows=2033001 width=340)
>   -> Hash (cost=110.20..110.20 rows=1020 width=53)
>        -> Seq Scan on t2 (cost=0.00..110.20 rows=1020 width=53)
>
> My query has been running for more than 1.5 hours now, and it is still
> running. Nothing else is running on the server. There are two
> multicolumn indexes on this column (both are 3-column indexes). One of
> them has a functional column (date_trunc('month', datefield)). Do you
> think the problem is with the indexes? I guess so.

Are you sure about the index on t1.uid? What are the column definitions
for t1.uid and t2.uid? Are they the same? You should be able to get a
plan similar to:

Merge Join (cost=0.00..43.56 rows=1000 width=11)
  Merge Cond: (outer.uid = inner.uid)
  -> Index Scan using t1i on t1 (cost=0.00..38298.39 rows=235 width=10)
  -> Index Scan using t2i on t2 (cost=0.00..26.73 rows=1000 width=5)

What postgres version are you using?

gnari

> The hardware is not great, but the database is on a RAID1 array, so it's
> not bad either. I am surprised that it takes more than 3 seconds per row
> to be updated. Thanks for your opinion on this!
> -- Arnaud
Re: [PERFORM] Slow update with simple query
On mið, 2006-12-13 at 14:38 +0100, Arnaud Lesauvage wrote:
> Jens Schipkowski wrote:
> > On Wed, 13 Dec 2006 13:23:41 +0100, Arnaud Lesauvage [EMAIL PROTECTED] wrote:
> > > Why is this query better than the other one? Because it runs the
> > > (field IN ('some','other') AND field2 = 'Y') once and then executes
> > > the join with the resulting set?
> > True. The subselect in the FROM clause will be executed once and will
> > be joined using the condition in the WHERE clause. So your condition
> > on t2 is not executed for each row in t1 (2M records) but for each row
> > in t2 (1k records). And the boolean value is already set during the
> > update.
>
> OK Jens, thanks for clarifying this. I thought the planner could guess
> what to do in such cases.

Don't worry, it will. This is not your problem.

gnari
Re: [PERFORM] strange query behavior
On mið, 2006-12-13 at 13:42 -0500, Tim Jones wrote:
> I have a query that uses an IN clause and it seems to perform great when
> there are more than two values in it, but if there is only one it is
> really slow. Also if I change the query to use an = instead of IN in the
> case of only one value it is still slow. Possibly I need to reindex this
> particular index?

Can you provide us with an EXPLAIN ANALYZE for these 2 cases? What pg
version is this? Does this happen only for a particular single value, or
for any value? I assume you have ANALYZEd the table in question.

gnari
Re: [PERFORM] SQL_CALC_FOUND_ROWS in POSTGRESQL / Some one can
On mán, 2006-12-11 at 17:01 +1100, Chris wrote:
> Mark Kirkwood wrote:
> > Chris wrote:
> > > It's the same as doing a select count(*) type query using the same
> > > clauses, but all in one query instead of two. It doesn't return any
> > > extra rows on top of the limit query, so it's better than using
> > > pg_numrows, which runs the whole query and returns it to php (in
> > > this example). Their docs explain it:
> > > http://dev.mysql.com/doc/refman/4.1/en/information-functions.html
> > > See FOUND_ROWS()
> > Note that from the same page: "If you are using SELECT
> > SQL_CALC_FOUND_ROWS, MySQL must calculate how many rows are in the
> > full result set. However, this is faster than running the query again
> > without LIMIT, because the result set need not be sent to the client."
> yes but not any faster than a select count(*) from (full query without
> LIMIT)

So the only advantage to the SQL_CALC_FOUND_ROWS thingie is that instead
of doing

  select count(*) from full-query
  select * from query-with-LIMIT

which will do the query twice, but possibly with different optimisations,
you would do a non-standard

  select SQL_CALC_FOUND_ROWS query-with-LIMIT
  select FOUND_ROWS()

which will do one full query, without any LIMIT optimisation, but with
the same number of round-trips and the same amount of data over the line.

The only case where the second way may be more effective is if no LIMIT
optimisation can be made, and the dataset is larger than the file buffer
space, so that there is no effect from caching.

gnari
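For concreteness, the portable two-query paging pattern described above looks like this on PostgreSQL (the table and column names are hypothetical, purely for illustration):

```sql
-- total row count for the pager
SELECT count(*) FROM items WHERE category_id = 42;

-- first page of results; LIMIT lets the planner stop early
SELECT * FROM items
WHERE category_id = 42
ORDER BY created_at DESC
LIMIT 20 OFFSET 0;
```

Both queries see the same WHERE clause, but the second can use a LIMIT optimisation that SQL_CALC_FOUND_ROWS forgoes.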
Re: [PERFORM] Split select completes, single select doesn't and
On þri, 2006-05-30 at 10:26 +1000, Anthony Ransley wrote:
> Can any one explain why the following query
>
> select f(q) from (
>   select * from times
>   where '2006-03-01 00:00:00' <= q and q < '2006-03-08 00:00:00'
>   order by q
> ) v;
>
> never completes, but splitting up the time span into single days does
> work:
>
> select f(q) from (
>   select * from times
>   where '2006-03-01 00:00:00' <= q and q < '2006-03-02 00:00:00'
>   order by q
> ) v;

First question: is f() relevant to your problem? I mean, do you see the
same effect with:

select q from (
  select * from times
  where '2006-03-01 00:00:00' <= q and q < '2006-03-08 00:00:00'
  order by q
) v;

or even:

select q from times
where '2006-03-01 00:00:00' <= q and q < '2006-03-08 00:00:00'
order by q;

If f() is needed to make this happen, show us f(). If f() is not
relevant, show us the simplest cases where you see this. Show us EXPLAIN
on the query that does not finish, and EXPLAIN ANALYZE on the queries
that do.

Second question: what indexes exist on the table times?
Another question: how many rows are in the table?
Next question: is the table newly ANALYZEd?
Finally: what version of postgresql are you using?

Without more info it is difficult to guess what your problem is, but
possibly you need to increase the statistics target of column q.

gnari
Re: [PERFORM] column totals
On fös, 2006-05-26 at 11:56 +0200, James Neethling wrote:
> SELECT branch_id, prod_cat_id, sum(prod_profit) as prod_cat_profit
> FROM ( ... ) as b1
> WHERE x = y
> GROUP BY branch_id, prod_cat_id
>
> Now, I also need the branch total, effectively:
>
> SELECT branch_id, sum(prod_profit) as branch_total
> FROM ( ... ) as b1
> WHERE x = y
> GROUP BY branch_id
>
> Since the actual queries for generating prod_profit are non-trivial,
> how do I combine them to get the following select list?

One simple way, using a temp table and 2 steps:

CREATE TEMP TABLE foo AS
  SELECT branch_id, prod_cat_id, sum(prod_profit) as prod_cat_profit
  FROM ( ... ) as b1
  WHERE x = y
  GROUP BY branch_id, prod_cat_id;

SELECT branch_id, prod_cat_id, prod_cat_profit, branch_total
FROM foo as foo1
JOIN (SELECT branch_id, sum(prod_cat_profit) as branch_total
      FROM foo
      GROUP BY branch_id
     ) as foo2
USING (branch_id);

(untested)

gnari
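A sketch of an alternative that post-dates this thread: on PostgreSQL 8.4 and later, a window function over the grouped result computes the branch total in a single pass, with no temp table (the `( ... )` subquery is the poster's elided prod_profit query, kept as-is):

```sql
SELECT branch_id, prod_cat_id,
       sum(prod_profit) AS prod_cat_profit,
       sum(sum(prod_profit)) OVER (PARTITION BY branch_id) AS branch_total
FROM ( ... ) AS b1
WHERE x = y
GROUP BY branch_id, prod_cat_id;
```

The inner `sum()` is the per-group aggregate; the outer `sum() OVER` then totals those group results within each branch.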
Re: [PERFORM] Performs WAY better with enable_seqscan = off
On sun, 2006-05-21 at 02:21 -0600, Brendan Duddridge wrote:
> Hi, I have a query that performs WAY better when I have
> enable_seqscan = off:
>
> explain analyze
> select ac.attribute_id, la.name, ac.sort_order
> from attribute_category ac, localized_attribute la
> where ac.category_id = 1001402
>   and la.locale_id = 101
>   and ac.is_browsable = 'true'
>   and la.attribute_id = ac.attribute_id
>   and exists (
>     select 'x'
>     from product_attribute_value pav, category_product cp
>     where (pav.product_id || '.' || pav.attribute_id)
>         = (cp.product_id || '.' || ac.attribute_id)
>       and pav.status_code is null
>       and (cp.category_id || '.' || cp.is_visible) = '1001402.true')
> order by (ac.sort_order is null), ac.sort_order, la.name asc;

Is there some reason for the complicated form of the join conditions in
the subselect? Would this not be clearer:

explain analyze
select ac.attribute_id, la.name, ac.sort_order
from attribute_category ac, localized_attribute la
where ac.category_id = 1001402
  and la.locale_id = 101
  and ac.is_browsable = 'true'
  and la.attribute_id = ac.attribute_id
  and exists (
    select 'x'
    from product_attribute_value pav, category_product cp
    where pav.product_id = cp.product_id
      and pav.attribute_id = ac.attribute_id
      and pav.status_code is null
      and cp.category_id = '1001402'
      and cp.is_visible = 'true')
order by (ac.sort_order is null), ac.sort_order, la.name asc;

Possibly the planner would have a better time figuring out whether any
indexes are usable, or estimating the subselect rowcount.

gnari
Re: [PERFORM]
On mán, 2006-04-10 at 10:30 +0200, Doron Baranes wrote:
> I attached here a file with details about the tables, the queries and
> the EXPLAIN ANALYZE plans. Hope this can be helpful to analyze my
> problem.

First query:

explain analyze
SELECT date_trunc('hour'::text, i.entry_time) AS datetime,
       COUNT(fr.grp_fate_id), SUM(i.size)
FROM log.msg_info as i, log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
  AND i.msgid = f.msgid
  AND i.entry_time > '2006-01-25'
  AND f.grp_fate_id = fr.grp_fate_id
GROUP BY datetime
ORDER BY datetime;

If i.origin has high selectivity (if very few rows in msg_info have
origin=1 in this case), an index on msg_info(origin) can help.
Unfortunately, as you are using 7.4 and this is a smallint column, you
would have to change the query slightly to make use of it:

  WHERE i.origin = 1::smallint

If more than a few % of the rows have this value, then this will not
help.

The index on msg_info(entry_time) is unlikely to be used, because a
simple '>' comparison has little selectivity. Try to add an upper limit
to the query to make it easier for the planner to see that few rows
would be returned (if that is the case), for example:

  AND i.entry_time BETWEEN '2006-01-25' AND '2006-05-01'

This might also improve the estimated number of groups on datetime
(notice: estimated rows=1485233, real=623), although I am not sure if
that will help you.

I do not know how good the planner is at dealing with
date_trunc('hour'::text, i.entry_time), so possibly you could get some
improvement with an indexed entry_hour column, populated with a trigger
or by your application, and change your query to:

explain analyze
SELECT i.entry_hour, COUNT(fr.grp_fate_id), SUM(i.size)
FROM log.msg_info as i, log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
  AND i.msgid = f.msgid
  AND i.entry_hour BETWEEN '2006-01-25 00:00' AND '2006-05-01 00:00'
  AND f.grp_fate_id = fr.grp_fate_id
GROUP BY entry_hour
ORDER BY entry_hour;

(adjust the upper limit to your reality)

Do these suggestions help at all?

gnari
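A sketch of the trigger approach for the entry_hour column suggested above. All object names besides log.msg_info and entry_time are hypothetical; the single-quote escaping (rather than dollar quoting) is used because dollar quoting is not available on 7.4, and plpgsql must already be installed in the database.

```sql
ALTER TABLE log.msg_info ADD COLUMN entry_hour timestamp;

-- keep entry_hour in sync with entry_time on every insert/update
CREATE OR REPLACE FUNCTION log.set_entry_hour() RETURNS trigger AS '
BEGIN
  NEW.entry_hour := date_trunc(''hour'', NEW.entry_time);
  RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER msg_info_entry_hour
  BEFORE INSERT OR UPDATE ON log.msg_info
  FOR EACH ROW EXECUTE PROCEDURE log.set_entry_hour();

-- the index that the rewritten BETWEEN query could then use
CREATE INDEX msg_info_entry_hour_idx ON log.msg_info (entry_hour);
```

Existing rows would need a one-time `UPDATE log.msg_info SET entry_hour = date_trunc('hour', entry_time);` followed by a VACUUM ANALYZE.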
Re: [PERFORM]
On sun, 2006-04-09 at 12:47 +0200, Doron Baranes wrote:
> Hi, I am running postgres 7.4.6 on a pineapp with 512MB RAM. I did a
> database vacuum analyze and rebuilt my indexes.

If you have previously done a lot of deletes or updates without regular
vacuums, you may have to do a VACUUM FULL ANALYZE once to get the table
into a normal state. After this, regular plain VACUUM ANALYZE should be
enough.

> When I perform queries on tables of 2M-10M rows it takes several
> minutes.

We would need to see the output of EXPLAIN ANALYZE for your query, along
with some information about the schema of the tables involved, such as
what indexes have been created. Also, let us know about any non-default
configuration.

gnari
Re: [PERFORM] Query planner is using wrong index.
On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:
> I have a problem with the choice of index made by the query planner.
> My table looks like this:
>
> CREATE TABLE t (
>   p1 varchar not null,
>   p2 varchar not null,
>   p3 varchar not null,
>   i1 integer, i2 integer, i3 integer, i4 integer, i5 integer,
>   d1 date, d2 date, d3 date,
>   PRIMARY KEY (p1, p2, p3)
> );
>
> I have also created an index on (p2, p3), as some of my lookups are on
> these only. All the integers and dates are data values. The table has
> around 9 million rows. I am using postgresql 7.4.7. I have set
> statistics to 1000 on the p1, p2 and p3 columns, and run vacuum full
> analyse. However, I still see query plans like this:
> ...
> db=# explain select * from t where p2 = 'fairly_common' and p3 = 'fairly_common';
>
>  Index Scan using p2p3 on t (cost=0.00..6.01 rows=1 width=102)
>    Index Cond: (((p2)::text = 'fairly_common'::text) AND ((p3)::text = 'fairly_common'::text))
> (3 rows)

Please show us an actual EXPLAIN ANALYZE; this will show us more.

> I would like the query planner to use the primary key for all of these
> lookups. How can I enforce this?

How would that help? Have you tested to see if it would actually be
better?

gnari
Re: [PERFORM] Query planner is using wrong index.
On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
> --- Ragnar [EMAIL PROTECTED] wrote:
> > On fim, 2006-04-06 at 12:35 +1000, Brian Herlihy wrote:
> > > ... PRIMARY KEY (p1, p2, p3) ...
> > > I have also created an index on (p2, p3), as some of my lookups are
> > > on these only.
> > > ...
> > > db=# explain select * from t where p2 = 'fairly_common' and p3 = 'fairly_common';
> > please show us an actual EXPLAIN ANALYZE
> > > I would like the query planner to use the primary key for all of
> > > these lookups.
> > have you tested to see if it would actually be better?
>
> Yes, the primary key is far better. I gave it the ultimate test - I
> dropped the (p2, p3) index. It's blindingly fast when using the PK.

I have problems understanding exactly how an index on (p1,p2,p3) can be
faster than an index on (p2,p3) for a query not involving p1. Can you
demonstrate this with actual EXPLAIN ANALYZEs? Something like:

EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
BEGIN;
DROP INDEX p2p3;
EXPLAIN ANALYZE select * from t where p2 = ? and p3 = ?;
ROLLBACK;

Maybe your p2p3 index needs a REINDEX?

> My options seem to be:
> - Fudge the analysis results so that the selectivity estimate changes.
>   I have tested reducing n_distinct, but this doesn't seem to help.
> - Combine the columns into one column, allowing postgres to calculate
>   the combined selectivity.
> - Drop the (p2, p3) index. But I need this for other queries.
> None of these are good solutions. So I am hoping that there is a
> better way to go about this!

I think we must determine exactly what the problem is before devising
complex solutions.

gnari
Re: [PERFORM] Query planner is using wrong index.
On fös, 2006-04-07 at 00:01 +1000, Brian Herlihy wrote:
> --- Ragnar [EMAIL PROTECTED] wrote:
> > On fim, 2006-04-06 at 19:27 +1000, Brian Herlihy wrote:
> > > Yes, the primary key is far better. I gave it the ultimate test -
> > > I dropped the (p2, p3) index. It's blindingly fast when using the
> > > PK.
> > I have problems understanding exactly how an index on (p1,p2,p3) can
> > be faster than an index on (p2,p3) for a query not involving p1.
>
> db# explain analyze select * from t
>     WHERE p1 = 'a' and p2 = 'uk.altavista.com'
>       AND p3 = 'web/results?itag=q=kgs=kls=';

This is different from what you said earlier. In your original post you
showed a problem query without any reference to p1 in the WHERE clause.
This confused me.

> Index Scan using p2_p3_idx on t (cost=0.00..6.02 rows=1 width=102)
>   (actual time=2793.247..2793.247 rows=0 loops=1)
>   Index Cond: (((p2)::text = 'uk.altavista.com'::text) AND ((p3)::text = 'web/results?itag=q=kgs=kls='::text))
>   Filter: ((p1)::text = 'a'::text)
> Total runtime: 2793.303 ms
> (4 rows)

Try to add an ORDER BY clause:

explain analyze select * from t
WHERE p1 = 'a' and p2 = 'uk.altavista.com'
  AND p3 = 'web/results?itag=q=kgs=kls='
ORDER BY p1, p2, p3;

This might push the planner into using the primary key.

gnari
Re: [PERFORM] Process Time X200
On fös, 2006-03-10 at 10:11 +0100, NbForYou wrote:
> Hey Michael, you sure know your stuff! Versions: PostgreSQL 7.3.9-RH
> running on the webhost. PostgreSQL 8.0.3 running on my homeserver. So
> the only solution is to ask my webhost to upgrade its postgresql? The
> question is, will he do that? After all, a license fee is required for
> commercial use. And running a webhosting service is a commercial use.

A licence fee for what? Certainly not for postgresql.

gnari
Re: [PERFORM] Sequencial scan instead of using index
On mán, 2006-03-06 at 13:46 -0500, Harry Hehl wrote:
> Query:
> select * from ommemberrelation
> where srcobj = 'somevalue'
>   and dstobj in (select objectid from omfilesysentry
>                  where name = 'dir15_file80');
>
> Columns srcobj, dstobj and name are all indexed.
>
> Nested Loop IN Join (cost=486.19..101533.99 rows=33989 width=177)
>   (actual time=5.493..90.682 rows=1 loops=1)
>   Join Filter: (outer.dstobj = inner.objectid)
>   -> Seq Scan on ommemberrelation (cost=0.00..2394.72 rows=33989 width=177)
>      (actual time=0.078..70.887 rows=100 loops=1)
>      Filter: (srcobj = '3197a4e6-abf1-11da-a0f9-000fb05ab829'::text)
>   -> Materialize (cost=486.19..487.48 rows=129 width=16)
>      (actual time=0.004..0.101 rows=26 loops=100)

Looks like the planner is expecting 33989 rows, making an index scan
look like a poor choice, but in fact only 100 rows actually match your
srcobj value.

Could we see the EXPLAIN ANALYZE with enable_seqscan = false, please?

Possibly you might want to try to increase the statistics target for
this column, as in:

ALTER TABLE ommemberrelation ALTER COLUMN srcobj SET STATISTICS 1000;
ANALYZE ommemberrelation;

and try again (with enable_seqscan = true). A target of 1000 is most
probably overkill, but start with this value, and if it improves
matters, you can experiment with lower settings.

gnari
Re: [PERFORM] Created Index is not used
On fim, 2006-02-23 at 13:35 +0100, Kjeld Peters wrote:
> Select and update statements are quite slow on a large table with more
> than 600,000 rows. The table consists of 11 columns (nothing special).
> The column id (int8) is the primary key and has a btree index on it.
> The following select statement takes nearly 500ms:
>
> SELECT * FROM table WHERE id = 60;
>
> Prepending EXPLAIN to the statement reveals a seq scan:
>
> EXPLAIN SELECT * FROM table WHERE id = 60;
>  Seq Scan on table (cost=0.00..15946.48 rows=2 width=74)
>    Filter: (id = 60)
>
> I tried a full vacuum and a reindex, but it had no effect. Why is
> PostgreSQL not using the created index?

Try one of:

SELECT * FROM table WHERE id = '60';
SELECT * FROM table WHERE id = 60::int8;

or upgrade to PostgreSQL 8.0+, where this workaround is no longer
needed.

gnari
Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
On fös, 2006-02-17 at 08:01 -0500, Ron wrote:
> At 04:24 AM 2/17/2006, Ragnar wrote:
> > On fös, 2006-02-17 at 01:20 -0500, Ron wrote:
> > > OK, so here's _a_ way (there are others) to obtain a mapping such
> > > that if a > b then f(a) > f(b), and if a == b then f(a) == f(b).
> > > By scanning the table once, we can map say 001h (hex used to ease
> > > typing) to the row with the minimum value and 111h to the row with
> > > the maximum value, as well as mapping everything in between to
> > > their appropriate keys. That same scan can be used to assign a
> > > pointer to each record's location.
> > This step is just as expensive as the original sort you want to
> > replace/improve.
>
> Why do you think that? External sorts involve the equivalent of
> multiple scans of the table to be sorted, sometimes more than lgN
> (where N is the number of items in the table to be sorted). Since this
> is physical IO we are talking about, each scan is very expensive, and
> therefore 1 scan is going to take considerably less time than >= lgN
> scans will.

Call me dim, but please explain exactly how you are going to build this
mapping in one scan. Are you assuming the map will fit in memory?

> > If you want to keep this mapping saved as a sort of an index, or as
> > part of each row's data, this will make the cost of inserts and
> > updates enormous.
>
> Not sure you've got this right either. Looks to me like we are adding
> a >= 32b quantity to each row. Once we know the mapping, incrementally
> updating it upon insert or update would seem to be a simple matter of
> a fast search for the correct ranking [interpolation search, which we
> have all the needed data for, is O(lglgN); hash based search is O(1)],
> plus an increment/decrement of the key values greater/less than the
> key value of the row being inserted/updated. Given that we are
> updating all the keys in a specific range within a tree structure,
> that update can be done in O(lgm) (where m is the number of records
> affected).

Say again?

Let us say you have 1 billion rows, where the column in question
contains strings like

  baaaaaa
  baaaaab
  baaaaac
  ...

not necessarily in this order on disc, of course. The minimum value
would be keyed as 0001h, the next one as 0002h, and so on.

Now insert a new value 'a'. Not only will you have to update 1 billion
records, but also all the values in your map.

Please explain.

gnari
Re: [PERFORM] Large Database Design Help
On Fri, 2006-02-10 at 11:24 +0100, Markus Schaber wrote:
> For lots of non-read-only database workloads, RAID5 is a performance
> killer. RAID 1/0 might be better, or having two mirrors of two disks
> each, the first mirror holding system, swap, and the PostgreSQL WAL
> files, the second one holding the data.

I was under the impression that it is preferable to keep the WAL on its
own spindles with no other activity there, to take full advantage of the
sequential nature of the WAL writes. That would mean one mirror for the
WAL, and one for the rest. This, of course, may sometimes be too much
wasted disk space, as the WAL typically will not use a whole disk, so
you might partition this mirror into a small ext2 filesystem for WAL,
and use the rest for files seldom accessed, such as backups.

gnari
Re: [PERFORM] Slow Query
On Thu, 2005-07-14 at 10:06 +1000, Marc McIntyre wrote:
> I'm having a problem with a query that performs a sequential scan on a
> table when it should be performing an index scan. The interesting
> thing is, when we dumped the database on another server, it performed
> an index scan on that server.
> ...
> The EXPLAIN ANALYZE from the system performing a sequential scan:
>
> Sort (cost=30079.79..30079.89 rows=42 width=113)
>   (actual time=39889.989..39890.346 rows=260 loops=1)
> ...
> The EXPLAIN ANALYZE from the system performing an index scan:
>
> Sort (cost=16873.64..16873.74 rows=40 width=113)
>   (actual time=2169.905..2169.912 rows=13 loops=1)

It looks like the first query is returning 260 rows, but the second one
only 13. This may not be your problem, but are you sure you are using
the same query on the same data here?

gnari
Re: [PERFORM] Planner issue
On Tue, 2005-03-22 at 14:36 -0500, Alex Turner wrote:
> It will use an index for 220-300, but not for 200-300.
> ...
> Seq Scan on propmain (cost=0.00..15517.56 rows=6842 width=4)
>   (actual time=0.039..239.760 rows=6847 loops=1)
> ...
> Index Scan using propmain_listprice_i on propmain
>   (cost=0.00..22395.95 rows=6842 width=4)
>   (actual time=0.084..25.751 rows=6847 loops=1)

The row estimates are accurate, so it is not a question of statistics
anymore. First make sure effective_cache_size is correctly set, and
then, if that is not enough, you might try lowering random_page_cost a
bit.

gnari
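For illustration, both knobs can be tried per-session before committing anything to postgresql.conf. The values and the query below are placeholders, not recommendations; the listprice column is inferred from the propmain_listprice_i index name in the plan above.

```sql
-- affects the current session only
SET effective_cache_size = 50000;  -- in 8kB pages; roughly the OS cache size
SET random_page_cost = 2.5;        -- default 4.0; lower values favour index scans

-- re-run the problem query and compare the chosen plan
EXPLAIN ANALYZE
SELECT * FROM propmain WHERE listprice BETWEEN 200 AND 300;
```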
Re: [PERFORM] How to read query plan
On Sun, 2005-03-13 at 16:32 +0100, Miroslav Šulc wrote:
> Hi all, I am new to PostgreSQL and query optimizations. We have
> recently moved our project from MySQL to PostgreSQL and we are having
> performance problems with one of our most often used queries. On MySQL
> the speed was sufficient, but PostgreSQL chooses a time-expensive query
> plan. I would like to optimize it somehow, but the query plan from
> EXPLAIN ANALYZE is a little bit cryptic to me.
> [snip output of EXPLAIN ANALYZE]

For those of us who have not yet reached the level where one can infer
it from the query plan, how about showing us the actual query too?

But as an example of what to look for, consider the first few lines
(reformatted):

Merge Right Join (cost=9868.84..9997.74 rows=6364 width=815)
  (actual time=9982.022..10801.216 rows=6364 loops=1)
  Merge Cond: (outer.idpk = inner.cadastralunitidfk)
  -> Index Scan using cadastralunits_pkey on cadastralunits
     (cost=0.00..314.72 rows=13027 width=31)
     (actual time=0.457..0.552 rows=63 loops=1)
  -> Sort (cost=9868.84..9884.75 rows=6364 width=788)
     (actual time=9981.405..10013.708 rows=6364 loops=1)

Notice that the index scan is expected to return 13027 rows, but
actually returns 63. This might influence the choice of plan.

gnari
Re: [PERFORM] Help with tuning this query
On Wed, 2005-03-02 at 01:51 -0500, Ken Egervari wrote: select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date >= current_date - 31 order by cs.date desc ... shipment contains 40,000 rows shipment_status contains 80,000 rows I may be missing something, but it looks like the second join on shipment_status (the left join) is not adding anything to your results, except more work. ss is not used for output, nor in the where clause, so what is its purpose ? if cs.date has an upper limit, it might be helpful to change the condition to a BETWEEN in any case, i would think you might need indexes on shipment(carrier_code_id), shipment(current_status_id) and shipment_status(id) gnari ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help with tuning this query
On Wed, 2005-03-02 at 13:28 -0500, Ken Egervari wrote: select s.* from shipment s inner join carrier_code cc on s.carrier_code_id = cc.id inner join carrier c on cc.carrier_id = c.id inner join carrier_to_person ctp on ctp.carrier_id = c.id inner join person p on p.id = ctp.person_id inner join shipment_status cs on s.current_status_id = cs.id inner join release_code rc on cs.release_code_id = rc.id left join shipment_status ss on ss.shipment_id = s.id where p.id = :personId and s.is_purged = false and rc.number = '9' and cs is not null and cs.date >= current_date - 31 order by cs.date desc I may be missing something, but it looks like the second join on shipment_status (the left join) is not adding anything to your results, except more work. ss is not used for output, nor in the where clause, so what is its purpose ? ... The second left join is for eager loading so that I don't have to run a separate query to fetch the children for each shipment. This really does improve performance because otherwise you'll have to make N+1 queries to the database, and that's just too much overhead. are you saying that you are actually doing a select s.*,ss.* ... ? if cs.date has an upper limit, it might be helpful to change the condition to a BETWEEN Well, I could create an upper limit. It would be the current date. Would adding in this redundant condition improve performance? it might help the planner estimate better the number of cs rows affected. whether this improves performance depends on whether the best plans are sensitive to this. an EXPLAIN ANALYSE might reduce the guessing. gnari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
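The N+1-versus-eager-loading trade-off discussed above can be sketched outside PostgreSQL; the following is a hypothetical illustration using Python's sqlite3 with invented shipment/shipment_status data, not the poster's actual schema:

```python
# Hypothetical sketch of N+1 queries vs. one eager LEFT JOIN, using
# Python's sqlite3 and invented shipment/shipment_status data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipment (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shipment_status (id INTEGER PRIMARY KEY,
                                  shipment_id INTEGER, date TEXT);
    INSERT INTO shipment VALUES (1, 'a'), (2, 'b');
    INSERT INTO shipment_status VALUES (10, 1, '2005-01-01'),
                                       (11, 1, '2005-01-02'),
                                       (12, 2, '2005-01-03');
""")

# N+1 pattern: one query for the parents, then one more query per parent.
shipments = conn.execute("SELECT id FROM shipment ORDER BY id").fetchall()
n_plus_1 = {sid: conn.execute(
    "SELECT id, date FROM shipment_status"
    " WHERE shipment_id = ? ORDER BY id", (sid,)).fetchall()
    for (sid,) in shipments}

# Eager loading: one LEFT JOIN returns parents and children together.
joined = {}
for sid, ss_id, ss_date in conn.execute("""
        SELECT s.id, ss.id, ss.date
        FROM shipment s
        LEFT JOIN shipment_status ss ON ss.shipment_id = s.id
        ORDER BY s.id, ss.id"""):
    joined.setdefault(sid, [])
    if ss_id is not None:
        joined[sid].append((ss_id, ss_date))

assert joined == n_plus_1  # same result, one round-trip instead of three
```

The join does repeat the parent columns once per child row, which is the extra width the reply is pointing at; whether that beats N+1 round-trips depends on row sizes and latency.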
Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL
On Mon, 2005-01-24 at 15:45 -0800, Josh Berkus wrote: [about keeping open DB connections between web-client connections] [I wrote:] no. you can only count on web-server-process==connection, but not web-user==connection, unless you can guarantee that the same user client always connects to the same web-server process. Are there ones that you use which might use several different connections to send a series of queries from a single web-user, less than 5 seconds apart? actually, it had never occurred to me to test all browsers in this regard, but i can think of LWP::UserAgent. gnari ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] PgPool changes WAS: PostgreSQL clustering VS MySQL
On Mon, 2005-01-24 at 09:52 -0800, Josh Berkus wrote: [about keeping connections open in web context] Ah, clarity problem here. I'm talking about connection pooling tools from the client (webserver) side, such as Apache::DBI, PHP's pg_pconnect, Jakarta's connection pools, etc. Not pooling on the database server side, which is what pgPool provides. note that these sometimes do not provide connection pooling as such, just persistent connections (Apache::DBI) Most of these tools allocate a database connection to an HTTP/middleware client, and only release it after a specific period of inactivity. This means that you *could* count on web-user==connection for purposes of switching back and forth to the master -- as long as the connection-recycling timeout were set higher than the pgPool switch-off period. no. you can only count on web-server-process==connection, but not web-user==connection, unless you can guarantee that the same user client always connects to the same web-server process. am i missing something ? gnari ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] index scan of whole table, can't see why
On Wed, 2005-01-19 at 21:00 -0800, [EMAIL PROTECTED] wrote: Let's see if I have been paying enough attention to the SQL gurus. The planner is making a different estimate of how many 'deprecated' versus how many 'broken'. I would try SET STATISTICS to a larger number on the ports table, and re-analyze. that should not help, as the estimate is accurate, according to the explain analyze. gnari ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] index scan of whole table, can't see why
On Wed, 2005-01-19 at 20:37 -0500, Dan Langille wrote: Hi folks, Running on 7.4.2, recently vacuum analysed the three tables in question. The query plan in question changes dramatically when a WHERE clause changes from ports.broken to ports.deprecated. I don't see why. Well, I do see why: a sequential scan of 130,000 rows. The query goes from 13ms to 1100ms because of this. The full plans are at http://rafb.net/paste/results/v8ccvQ54.html I have tried some tuning by: set effective_cache_size to 4000, was 1000 set random_page_cost to 1, was 4 The resulting plan changes, but no speed improvement, are at http://rafb.net/paste/results/rV8khJ18.html this just confirms that an indexscan is not always better than a tablescan. by setting random_page_cost to 1, you are deceiving the planner into thinking that the indexscan is almost as effective as a tablescan. Any suggestions please? did you try to increase sort_mem ? gnari ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] [SQL] OFFSET impact on Performance???
On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote: The best way to do paging is not to use offset or cursors but to use an index. This only works if you can enumerate all the sort orders the application might be using and can have an index on each of them. To do this the query would look something like: SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50 Then you take note of the last value used on a given page and if the user selects next you pass that as the starting point for the next page. this will only work unchanged if the index is unique. imagine, for example, if you have more than 50 rows with the same value of col. one way to fix this is to use ORDER BY col,oid gnari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
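The index-based paging idea can be sketched concretely; the following is a hypothetical illustration using Python's sqlite3, with invented table and column names standing in for the PostgreSQL query above:

```python
# Hypothetical sketch of keyset ("next page") paging with Python's
# sqlite3; the table, column, and page size are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, col INTEGER)")
conn.executemany("INSERT INTO tab (col) VALUES (?)",
                 [(v,) for v in range(100)])

def next_page(last_col, limit=10):
    # "WHERE col > ?" is the starting-point condition from the post:
    # resume strictly after the last value seen, instead of using OFFSET.
    return conn.execute(
        "SELECT col FROM tab WHERE col > ? ORDER BY col LIMIT ?",
        (last_col, limit)).fetchall()

page1 = next_page(-1)            # first page: start below the smallest value
page2 = next_page(page1[-1][0])  # resume after the last value of page 1
```

With an index on col, each page is a short index scan regardless of how deep into the result the user has paged, which is the point of the post.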
Re: [PERFORM] [SQL] OFFSET impact on Performance???
On Thu, 2005-01-20 at 19:12 +, Ragnar Hafstað wrote: On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote: The best way to do paging is not to use offset or cursors but to use an index. This only works if you can enumerate all the sort orders the application might be using and can have an index on each of them. To do this the query would look something like: SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50 Then you take note of the last value used on a given page and if the user selects next you pass that as the starting point for the next page. this will only work unchanged if the index is unique. imagine, for example, if you have more than 50 rows with the same value of col. one way to fix this is to use ORDER BY col,oid and a slightly more complex WHERE clause as well, of course gnari ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
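The "slightly more complex WHERE clause" can be spelled out; the following is a hypothetical Python sqlite3 sketch that uses an id column in place of oid (names invented) and deliberately loads more duplicate col values than one page holds, to show that the (col, id) tie-breaker neither skips nor repeats rows:

```python
# Hypothetical sketch of the (col, oid) tie-breaker, with an id column
# standing in for oid; 60 rows share one col value, more duplicates
# than a 50-row page can hold, so paging on col alone would break.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, col INTEGER)")
conn.executemany("INSERT INTO tab (col) VALUES (?)", [(7,)] * 60)

def next_page(last_col, last_id, limit=50):
    # the slightly more complex WHERE clause: strictly after (col, id)
    return conn.execute(
        """SELECT col, id FROM tab
           WHERE col > ? OR (col = ? AND id > ?)
           ORDER BY col, id LIMIT ?""",
        (last_col, last_col, last_id, limit)).fetchall()

page1 = next_page(-1, -1)        # first 50 rows
page2 = next_page(*page1[-1])    # the remaining 10 rows
seen = [r[1] for r in page1 + page2]
assert seen == sorted(set(seen))  # no row skipped, none repeated
```

On a database that supports row-value comparison, the WHERE clause can be written more directly as (col, id) > (?, ?).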
Re: [PERFORM] Postgres Optimizer is not smart enough?
On Thu, 2005-01-13 at 12:14 +1300, Mark Kirkwood wrote: [snip some explains] I have random_page_cost = 0.8 in my postgresql.conf. Setting it back to the default (4) results in a plan using test_id1. it is not rational to have random_page_cost < 1. if you see improvement with such a setting, it is as likely that something else is wrong, such as higher statistics targets needed, or a much too low effective_cache_size setting. gnari ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Howto Increased performace ?
On Mon, 2004-12-27 at 22:31 +0700, Amrit Angsusingh wrote: [ [EMAIL PROTECTED] ] These are some settings that I am planning to start with for a 4GB RAM dual opteron system with a maximum of 100 connections: shared_buffers 8192 (=67MB RAM) sort_mem 4096 (=400MB RAM for 100 connections) effective_cache_size 38(@8KB =3.04GB RAM) vacuum_mem 32768 KB wal_buffers 64 checkpoint_segments 8 In theory, effective cache size is the amount of memory left over for the OS to cache the filesystem after running all programs and having 100 users connected, plus a little slack. I reduced the connections to 160 and configured as below and there is some improvement in speed. shared_buffers = 27853 [Should I reduce it to nearly as you do and what will happen?] at some point, more shared buffers will do less good than leaving the memory to the OS to use as disk buffers. you might want to experiment a bit with different values to find what suits your real-life conditions sort_mem = 8192 vacuum_mem = 16384 effective_cache_size = 81920 [Should I increase it to more than 20 ?] as Iain wrote, this value is an indication of how much memory will be available to the OS for disk cache. when all other settings have been made, try to see how much memory your OS has left under normal conditions, and adjust your setting accordingly, if it differs significantly. I have seen cases where an incorrect value (too low) influenced the planner to use sequential scans instead of better indexscans, presumably because of a higher ratio of estimated cache hits. Thanks for any comment again. NB. There was a huge disaster in my country (the tsunamis) and all the people of the country, including me, fell into deep sorrow. my condolences. Amrit Angsusingh Thailand gnari ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
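Since effective_cache_size in these versions is expressed in 8 KB pages, the conversion between a page-count setting and RAM can be checked with a few lines. The figures below are illustrative arithmetic only, not tuning recommendations:

```python
# Illustrative arithmetic only (not tuning advice): effective_cache_size
# is counted in 8 KB pages here, so pages and RAM convert via 8192 bytes.
PAGE = 8 * 1024  # one PostgreSQL page, in bytes

def pages_for(gigabytes):
    """How many 8 KB pages fit in the given amount of RAM."""
    return int(gigabytes * 1024**3 // PAGE)

def gigabytes_for(pages):
    """How much RAM a page-count setting corresponds to."""
    return pages * PAGE / 1024**3

assert gigabytes_for(81920) == 0.625   # the 81920 quoted above is 640 MB
assert pages_for(3) == 393216          # ~3 GB of free RAM is ~393k pages
```

This kind of check makes it easy to see whether a setting like 81920 actually matches the OS disk cache observed under normal load, as the reply suggests measuring.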
Re: [PERFORM] Howto Increased performace ?
On Tue, 2004-12-21 at 16:31 +0700, Amrit Angsusingh wrote: I used postgresql 7.3.2-1 with RH 9 on a machine with 2 Xeon 3.0 Ghz and 4 Gb of ram. For 1 1/2 yr. since I started to use the database server, after optimizing the postgresql.conf everything went fine, until a couple of weeks ago, when my database grew up to 3.5 Gb and there were more than 140 concurrent connections. ... shared_buffers = 250000 this is much higher than usually advised on this list. try to reduce this to 25000 effective_cache_size = 5000 and increase this instead, to say, 500000 gnari ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Speed in V8.0
On Wed, 2004-12-22 at 00:03 +0100, Thomas Wegner wrote: Hello, I have a problem between V7.4.3 Cygwin and V8.0RC2 W2K. I have 2 systems: 1. Production Machine - Dual P4 3000MHz - 2 GB RAM - W2K - PostgreSQL 7.4.3 under Cygwin - i connect to it over a DSL Line 2. Develop Machine - P4 1800MHz - 760 MB RAM - PostgreSQL Native Windows - local connection 100MB/FD Both systems use the default postgresql.conf. Now the problem. I have an (unoptimized, dynamic) query which executes on the production machine over DSL in 2 seconds and on my develop machine, connected over local LAN, in 119 seconds! has the development database been ANALYZED ? gnari ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Tips for a system with _extremely_ slow IO?
On Fri, 2004-12-17 at 23:51 -0800, Ron Mayer wrote: Any advice for settings for extremely IO constrained systems? A demo I've set up for sales seems to be spending much of its time in disk wait states. The particular system I'm working with is: Ext3 on Debian inside Microsoft VirtualPC on NTFS on WindowsXP on laptops of our sales team. As this is only for demo purposes, you might consider turning fsync off, although I have no idea if it would have any effect on your setup. gnari ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] [NOVICE] \d output to a file
On Wed, 2004-12-15 at 11:50 -0500, Tom Lane wrote: Geoffrey [EMAIL PROTECTED] writes: sarlav kumar wrote: I would like to write the output of the \d command on all tables in a database to an output file. What is the OS? On any UNIX variant you can do: echo '\d' | psql > outputfile Or use \o: regression=# \o zzz1 regression=# \d or: =# \d * to get all tables as the OP wanted regression=# \o gnari ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]