Re: [PERFORM] Question regarding the file system
Gabriele,

> By any chance, do you have some reference or some tests that talk about
> the fact that RAID5 with less than 5 disks is not performing?

Just this list. But it's easy to test yourself: run bonnie++ and compare
the performance of seeks and random writes (which PG does a lot of)
against a plain single disk.

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Slow performance with Group By
Erik Norvelle <[EMAIL PROTECTED]> writes:
>>> it=> explain select codelemm, sectref, count(codelemm) from indethom
>>> group by codelemm, sectref;
>>>                                QUERY PLAN
>>> --------------------------------------------------------------------
>>>  GroupAggregate  (cost=2339900.60..2444149.44 rows=1790528 width=13)
>>>    ->  Sort  (cost=2339900.60..2364843.73 rows=9977252 width=13)
>>>          Sort Key: codelemm, sectref
>>>          ->  Seq Scan on indethom  (cost=0.00..455264.52 rows=9977252 width=13)

Actually, the painful part of that is the sort. If you bump up sort_mem
enough, it will eventually switch over to a HashAggregate with no sort,
which may be a better plan if there are not too many groups (is the
estimate of 1.79 million groups on the mark at all?).

			regards, tom lane
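For instance, Tom's suggestion amounts to something like this (a sketch;
the value is illustrative -- sort_mem is measured in kilobytes and can be
set per session):

    -- give this session enough sort/hash memory that the planner can
    -- consider a HashAggregate (64 MB here, expressed in kB)
    SET sort_mem = 65536;

    EXPLAIN SELECT codelemm, sectref, count(codelemm)
      FROM indethom
     GROUP BY codelemm, sectref;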
Re: [PERFORM] vacuum analyze slows sql query
patrick ~ wrote:
> Hi John,
>
> Thanks for your reply and analysis.
>
> --- John Meinel <[EMAIL PROTECTED]> wrote:
>> patrick ~ wrote:
>> [...]
>>> pkk=# explain analyze execute pkk_01(241 );
>>>                              QUERY PLAN
>>> [...]

One other thing that I just thought of. I think it is actually possible
to add an index on a function of a column. So if you have the
"is_really_pending" function, you might be able to do:

    CREATE INDEX pkk_is_really_pending
        ON pkk_purchase (is_really_pending(purchase_id));

But you need a better guru than I to make sure of that.

This *might* do what you need.

John
=:->
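For what it's worth, a sketch of the shape such an index could take
(names and function body hypothetical; one catch is that PostgreSQL only
accepts IMMUTABLE functions in an index definition, so a test that
depends on now() cannot be indexed this way):

    -- hypothetical sketch: an index on a function of columns; the
    -- function must be IMMUTABLE, i.e. depend only on its inputs, so
    -- only the time-independent part of the "pending" test is captured
    CREATE FUNCTION is_really_pending(p_pending boolean, p_cancel_date timestamp)
    RETURNS boolean AS '
        SELECT $1 = true OR $2 IS NULL;
    ' LANGUAGE 'sql' IMMUTABLE;

    CREATE INDEX pkk_is_really_pending
        ON pkk_purchase (is_really_pending(pending, cancel_date));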
[PERFORM] Slow performance with Group By
Greetings all,

This question has probably been asked many times, but I was unable to use
the list archives to search, since the term "Group" matches thousands of
messages with the names of user groups in them... so sorry if I'm
repeating!

Here's the problem: I have a table of 10,000,000 records called
"indethom", each record representing a word in the works of a particular
author. Each record contains, among other columns, a CHAR(5) column
representing the "lemma" code (i.e. which word it is) called "codelemm",
and an integer representing a textual unit, i.e. chapter or other
division of a work (these are numbered consecutively from 0 to around
50,000), called "sectref".

What I want to do is find out how many times every word occurs in each
textual unit (with no row returned for textual units where a particular
word doesn't appear). I used a group-by clause to group by "sectref", and
then used the COUNT(codelemm) function to sum up the occurrences. The
codelemm column had to be grouped on, in order to satisfy Postgres's
requirements. Here's the query as I have it:

> create table matrix2.tuo as select codelemm, sectref, count(codelemm)
> from indethom group by codelemm, sectref;

And the explain results are as follows:

> it=> explain select codelemm, sectref, count(codelemm) from indethom
> group by codelemm, sectref;
>                                QUERY PLAN
> --------------------------------------------------------------------
>  GroupAggregate  (cost=2339900.60..2444149.44 rows=1790528 width=13)
>    ->  Sort  (cost=2339900.60..2364843.73 rows=9977252 width=13)
>          Sort Key: codelemm, sectref
>          ->  Seq Scan on indethom  (cost=0.00..455264.52 rows=9977252 width=13)

I have an index defined as follows:

> create index indethom_clemm_sect_ndx on indethom
> using btree(codelemm, sectref);

I also performed an ANALYZE after creating the index.

I have the gut feeling that there's got to be a better way than a
sequence scan on 10,000,000 records, but I'll be darned if I can find any
way to improve things here.

Thanks for any help you all can offer!!

Erik Norvelle
Re: [PERFORM] vacuum analyze slows sql query
>> Lets say for a second that you manage to trick it into using index
>> scan, and then you actually call the function with one of the values
>> that returns 1,000s of rows. Probably it will take 10-100 times longer
>> than if it used a seq scan.
>
> I don't know if it matters (I suspect that it does) but I am using
> LIMIT 1 in the sub-query/stored function. All I need is one single row
> meeting any of the criteria laid out in the stored procedure to
> establish an offer_id is "pending".

So, in your case, if you LIMIT, the index scan will always be fast and
the seq scan will be catastrophic, because you don't need to retrieve all
the rows, but just one. (IMHO the planner gets these LIMIT clauses wrong
because it expects the data to be randomly distributed in the first page,
while in real life it's not.)

You could use EXISTS to test the existence of a subquery (after all,
that's its purpose), or you could do this: when

    SELECT ... FROM table WHERE stuff=value LIMIT 1

obstinately uses a seq scan, spray in a little ORDER BY:

    SELECT ... FROM table WHERE stuff=value ORDER BY stuff LIMIT 1

The ORDER BY will make the planner think "I could use the index to
order"...
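Against the tables from this thread, those two ideas might look like the
following (a sketch with the pending conditions trimmed away, not a
tested statement):

    -- ORDER BY on the filtered column gives the planner a reason to
    -- walk the index, and the LIMIT stops after the first matching row
    SELECT p0.purchase_id
      FROM pkk_purchase p0
     WHERE p0.offer_id = 241
     ORDER BY p0.offer_id
     LIMIT 1;

    -- or state the existence test directly
    SELECT EXISTS (SELECT 1 FROM pkk_purchase WHERE offer_id = 241);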
Re: [PERFORM] vacuum analyze slows sql query
patrick ~ wrote:
> Hi John,
>
> Thanks for your reply and analysis.

No problem. It just happens that this is a problem we ran into recently.

> --- John Meinel <[EMAIL PROTECTED]> wrote:
>> patrick ~ wrote:
>> [...]
>
> Hmm... The fact is I am selecting (in this example anyway) over all
> values in pkk_offer table and calling the stored function with each
> pkk_offer.offer_id, which in turn does a select on pkk_purchase table.
> Note that offer_id is a foreign key in pkk_purchase referencing
> pkk_offer table.
>
> I don't know if it matters (I suspect that it does) but I am using
> LIMIT 1 in the sub-query/stored function. All I need is one single row
> meeting any of the criteria laid out in the stored procedure to
> establish an offer_id is "pending".

If you are trying to establish existence, we also had a whole thread on
this. Basically what we found was that adding an ORDER BY clause helped
tremendously in getting the planner to switch to an index scan. You might
try something like:

    SELECT column FROM mytable WHERE column='myval' ORDER BY column LIMIT 1;

There seems to be a big difference between the above statement and:

    SELECT column FROM mytable WHERE column='myval' LIMIT 1;

So what is the solution? The only one I'm aware of is to turn your static
function into a dynamic one.

So somewhere within the function you build up a SQL query string and call
EXECUTE str. This forces the query planner to be run every time you call
the function. This means that if you call it with a "nice" value, you
will get the fast index scan, and if you call it with a "bad" value, it
will switch back to seq scan.

The downside is you don't get much of a benefit from using a stored
procedure, as it has to run the query planner all the time (as though you
issued the query manually each time). But it still might be better for
you in the long run.
Well, running the query without the stored function, basically typing out
the stored function as a sub-query, shows me:

pkk=# explain analyze select o0.offer_id,
        ( select case when ( select p0.purchase_id from pkk_purchase p0
                              where p0.offer_id = o0.offer_id
                                and ( p0.pending = true
                                      or ( ( p0.expire_time > now()
                                             or p0.expire_time isnull )
                                           and p0.cancel_date isnull ) )
                              limit 1 ) isnull
               then false else true end )
        from pkk_offer o0 ;

                              QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pkk_offer o0  (cost=0.00..1834.11 rows=618 width=4) (actual time=2413.398..1341885.084 rows=618 loops=1)
   SubPlan
     ->  Result  (cost=2.94..2.95 rows=1 width=0) (actual time=2171.287..2171.289 rows=1 loops=618)
           InitPlan
             ->  Limit  (cost=0.00..2.94 rows=1 width=4) (actual time=2171.264..2171.266 rows=1 loops=618)
                   ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=12670 width=4) (actual time=2171.245..2171.245 rows=1 loops=618)
                         Filter: ((offer_id = $0) AND (((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 1341887.523 ms
(8 rows)

while after deleting all statistics on the pkk_% tables I get:

pkk=# delete from pg_statistic where pg_statistic.starelid = pg_class.oid
      and pg_class.relname like 'pkk_%';
DELETE 11

pkk=# explain analyze select o0.offer_id,
        ( select case when ( select p0.purchase_id from pkk_purchase p0
                              where p0.offer_id = o0.offer_id
                                and ( p0.pending = true
                                      or ( ( p0.expire_time > now()
                                             or p0.expire_time isnull )
                                           and p0.cancel_date isnull ) )
                              limit 1 ) isnull
               then false else true end )
        from pkk_offer o0 ;

                              QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pkk_offer o0  (cost=0.00..6646.94 rows=618 width=4) (actual time=0.190..799.930 rows=618 loops=1)
   SubPlan
     ->  Result  (cost=10.73..10.74 rows=1 width=0) (actual time=1.277..1.278 rows=1 loops=618)
           InitPlan
             ->  Limit  (cost=0.00..10.73 rows=1 width=4) (actual time=1.266..1.267 rows=1 loops=618)
                   ->  Index Scan using pur_offer_id_idx on pkk_purchase p0  (cost=0.00..20690.18 rows=1929 width=4) (actual time=1.258..1.258 rows=1 loops=618)
                         Index Cond: (offer_id = $0)
                         Filter: ((((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 801.234 ms
(9 rows)

As you can see, this query (over all values of pkk_offer) without any
pg_statistics on the pkk_purchase table is extremely fast. Is this a bug
in the PostgreSQL planner that misjudges the best choice with
pg_statistics at hand?

--patrick

In order to understand your query I broke it up and restructured it
Re: [ADMIN] [PERFORM] poor performance in migrated database
Carlos Lopez <[EMAIL PROTECTED]> writes:
> This is one of the queries that work, and is the first
> in a 4 level nested query

Do you really need UNION (as opposed to UNION ALL) in this query? The
EXPLAIN shows that almost half the runtime is going into the sort/uniq
to eliminate duplicates ... and according to the row counts, there are
no duplicates, so it's wasted effort.

I looked at your schema and saw an awful lot of SELECT DISTINCTs that
looked like they might not be necessary, too. But I'm not willing to
crawl through 144 views with no information about which ones are causing
you problems. What's a typical query that you are unsatisfied with the
performance of?

			regards, tom lane
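In other words, where a view is built like the first form below, the
second form skips the sort/unique pass when the branches cannot produce
duplicates (sketched with two table names from the posted plan; column
lists abbreviated):

    -- UNION must sort the combined result and discard duplicates
    SELECT no_doc, seq FROM docinvdtrims
    UNION
    SELECT no_doc, seq FROM docinvdrollos;

    -- UNION ALL simply appends one result to the other, skipping the
    -- sort/unique step entirely
    SELECT no_doc, seq FROM docinvdtrims
    UNION ALL
    SELECT no_doc, seq FROM docinvdrollos;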
Re: [PERFORM] vacuum analyze slows sql query
Hi John,

Thanks for your reply and analysis.

--- John Meinel <[EMAIL PROTECTED]> wrote:
> patrick ~ wrote:
> [...]
>> pkk=# explain analyze execute pkk_01(241 );
>>                              QUERY PLAN
>> ------------------------------------------------------------------------
>>  Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213 rows=1 loops=1)
>>    InitPlan
>>      ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual time=2872.189..2872.189 rows=0 loops=1)
>>            ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983 width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
>>                  Filter: ((offer_id = $1) AND (((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
>>  Total runtime: 2872.339 ms
>> (6 rows)
>>
>> Time: 2873.479 ms
>>
>> [...]
>>
>> So, is this the ultimate solution to this issue?
>>
>> --patrick
>
> It's not so much that correlation is < 0.5. It sounds like you're
> running into the same issue that I ran into in the past. You have a
> column with lots of repeated values, and a few exceptional ones. Notice
> this part of the query:
>
>   ->  Seq Scan on pkk_purchase p0  (cost rows=13983) (actual rows=0)
>
> For a general number, it thinks it might return 14,000 rows, hence the
> sequential scan. Before you do ANALYZE, it uses whatever defaults
> exist, which are probably closer to reality.
>
> The problem is that you probably have some values for pkk_purchase
> where it could return 14,000 rows (possibly much much more). And for
> those, seq scan is the best plan. However, for the particular value
> that you are testing, there are very few (no) entries in the table.

You are absolutely correct:

pkk=# select offer_id, count(*) from pkk_purchase group by offer_id
      order by count ;
 offer_id | count
----------+--------
     1019 |      1
     1018 |      1
     1016 |      1     (many of these)
      ... |    ...
     2131 |      6
      844 |      6
     1098 |      6     (a dozen or so of these)
      ... |    ...
     2263 |    682
     2145 |    723
     2258 |    797
     2091 |    863
      ... |    ...
     1153 |  96330     (the few heavy weights)
      244 | 122163
      242 | 255719
      243 | 273427
      184 | 348476

> With a prepared statement (or a function) it has to determine ahead of
> time what the best query is without knowing what value you are going
> to ask for.
>
> Lets say for a second that you manage to trick it into using index
> scan, and then you actually call the function with one of the values
> that returns 1,000s of rows. Probably it will take 10-100 times longer
> than if it used a seq scan.

Hmm... The fact is I am selecting (in this example anyway) over all
values in pkk_offer table and calling the stored function with each
pkk_offer.offer_id, which in turn does a select on pkk_purchase table.
Note that offer_id is a foreign key in pkk_purchase referencing
pkk_offer table.

I don't know if it matters (I suspect that it does) but I am using
LIMIT 1 in the sub-query/stored function. All I need is one single row
meeting any of the criteria laid out in the stored procedure to
establish an offer_id is "pending".

> So what is the solution? The only one I'm aware of is to turn your
> static function into a dynamic one.
>
> So somewhere within the function you build up a SQL query string and
> call EXECUTE str. This forces the query planner to be run every time
> you call the function. This means that if you call it with a "nice"
> value, you will get the fast index scan, and if you call it with a
> "bad" value, it will switch back to seq scan.
>> The downside is you don't get much of a benefit from using a stored
>> procedure, as it has to run the query planner all the time (as though
>> you issued the query manually each time). But it still might be better
>> for you in the long run.

Well, running the query without the stored function, basically typing out
the stored function as a sub-query, shows me:

pkk=# explain analyze select o0.offer_id,
        ( select case when ( select p0.purchase_id from pkk_purchase p0
                              where p0.offer_id = o0.offer_id
                                and ( p0.pending = true
                                      or ( ( p0.expire_time > now()
                                             or p0.expire_time isnull )
                                           and p0.cancel_date isnull ) )
                              limit 1 ) isnull
               then false else true end )
        from pkk_offer o0 ;

                              QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on pkk_offer o0  (cost=0.00..1834.11 rows=618 width=4) (actual time=2413.398..1341885.084 rows=618 loops=1)
   SubPlan
     ->  Result  (cost=2.94..2.95 rows=1 width=0) (actual time=2171.287..2171.289 rows=1 loops=618)
Re: [ADMIN] [PERFORM] poor performance in migrated database
This is one of the queries that work, and is the first in a 4-level
nested query. Where do I find out how to interpret explain output?

Thanks in advance,
Carlos.

mate=# explain analyze select * from vdocinvdpre;

                              QUERY PLAN
--------------------------------------------------------------------------
 Subquery Scan vdocinvdpre  (cost=265045.23..281225.66 rows=231149 width=684) (actual time=29883.231..37652.860 rows=210073 loops=1)
   ->  Unique  (cost=265045.23..278914.17 rows=231149 width=423) (actual time=29883.182..34109.259 rows=210073 loops=1)
         ->  Sort  (cost=265045.23..265623.10 rows=231149 width=423) (actual time=29883.166..31835.849 rows=210073 loops=1)
               Sort Key: no_doc, seq, codigoinv, lote, no_rollo, costo_uni, po, cantidad_total, id_pedido, id_proveedor, udm, doc_ref, corte, id_planta, accion, costo_total, ubicacion, cantidad_detallada, descripcion, observaciones, factura, fecha_factura, correlativo
               ->  Append  (cost=36954.34..60836.63 rows=231149 width=423) (actual time=4989.382..18277.031 rows=210073 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=36954.34..44100.17 rows=79542 width=402) (actual time=4989.371..8786.752 rows=58466 loops=1)
                           ->  Merge Left Join  (cost=36954.34..43304.75 rows=79542 width=402) (actual time=4989.341..7767.335 rows=58466 loops=1)
                                 Merge Cond: (("outer".seq = "inner".seq) AND ("outer"."?column18?" = "inner"."?column6?"))
                                 ->  Sort  (cost=29785.78..29925.97 rows=56076 width=366) (actual time=2829.242..3157.807 rows=56076 loops=1)
                                       Sort Key: docinvdtrims.seq, ltrim(rtrim((docinvdtrims.no_doc)::text))
                                       ->  Seq Scan on docinvdtrims  (cost=0.00..2522.76 rows=56076 width=366) (actual time=17.776..954.557 rows=56076 loops=1)
                                 ->  Sort  (cost=7168.56..7310.40 rows=56738 width=60) (actual time=2159.854..2460.061 rows=56738 loops=1)
                                       Sort Key: docinvdtrimsubica.seq, ltrim(rtrim((docinvdtrimsubica.no_doc)::text))
                                       ->  Seq Scan on docinvdtrimsubica  (cost=0.00..1327.38 rows=56738 width=60) (actual time=14.545..528.530 rows=56738 loops=1)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..16736.46 rows=151607 width=423) (actual time=7.731..7721.147 rows=151607 loops=1)
                           ->  Seq Scan on docinvdrollos  (cost=0.00..15220.39 rows=151607 width=423) (actual time=7.699..5109.468 rows=151607 loops=1)
 Total runtime: 38599.868 ms
(17 rows)

--- Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Sat, 2004-11-06 at 19:52, Carlos Lopez wrote:
>> The problem is that there are many nested views which normally join
>> tables by using two fields, one character and other integer.
>
> PostgreSQL has difficulty with some multi-column situations, even
> though in general it has a particularly good query optimizer.
>
> If the first column is poorly selective, yet the addition of the second
> column makes the combination very highly selective, then PostgreSQL may
> not be able to realise this, ANALYZE or not. ANALYZE doesn't have
> anywhere to store multi-column selectivity statistics.
>
> EXPLAIN ANALYZE will show you whether this is the case. It seems likely
> that the estimated cardinality of certain joins is incorrect.
>
> --
> Best Regards, Simon Riggs

___
Ing. Carlos López Linares
IT Consultant
Want to learn Linux? Visit http://www.aprende-linux.com.sv
Re: [PERFORM] vacuum analyze slows sql query
patrick ~ wrote:
[...]
> pkk=# explain analyze execute pkk_01(241 );
>                              QUERY PLAN
> ------------------------------------------------------------------------
>  Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213 rows=1 loops=1)
>    InitPlan
>      ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual time=2872.189..2872.189 rows=0 loops=1)
>            ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983 width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
>                  Filter: ((offer_id = $1) AND (((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
>  Total runtime: 2872.339 ms
> (6 rows)
>
> Time: 2873.479 ms
>
> [...]
>
> So, is this the ultimate solution to this issue?
>
> --patrick

It's not so much that correlation is < 0.5. It sounds like you're running
into the same issue that I ran into in the past. You have a column with
lots of repeated values, and a few exceptional ones. Notice this part of
the query:

  ->  Seq Scan on pkk_purchase p0  (cost rows=13983) (actual rows=0)

For a general number, it thinks it might return 14,000 rows, hence the
sequential scan. Before you do ANALYZE, it uses whatever defaults exist,
which are probably closer to reality.

The problem is that you probably have some values for pkk_purchase where
it could return 14,000 rows (possibly much much more). And for those, seq
scan is the best plan. However, for the particular value that you are
testing, there are very few (no) entries in the table.

With a prepared statement (or a function) it has to determine ahead of
time what the best query is, without knowing what value you are going to
ask for.

Let's say for a second that you manage to trick it into using index scan,
and then you actually call the function with one of the values that
returns 1,000s of rows. Probably it will take 10-100 times longer than if
it used a seq scan.

So what is the solution? The only one I'm aware of is to turn your static
function into a dynamic one.

So somewhere within the function you build up a SQL query string and call
EXECUTE str. This forces the query planner to be run every time you call
the function. This means that if you call it with a "nice" value, you
will get the fast index scan, and if you call it with a "bad" value, it
will switch back to seq scan.

The downside is you don't get much of a benefit from using a stored
procedure, as it has to run the query planner all the time (as though you
issued the query manually each time). But it still might be better for
you in the long run.

Example: instead of

    create function test(int) returns int as '
    declare
        x alias for $1;
        y int;
    begin
        select into y ... from ... where id = x limit ...;
        return y;
    end;
    ' language 'plpgsql';

use this format:

    create function test(int) returns int as '
    declare
        x alias for $1;
        y int;
    begin
        -- note: plpgsql of this vintage cannot capture a result directly
        -- from EXECUTE; in practice one loops with FOR ... IN EXECUTE
        EXECUTE ''select into y ... from ... where id=''
            || quote_literal(x) || '' limit ...'';
        return y;
    end;
    ' language 'plpgsql';

I think that will point you in the right direction.

John
=:->
Re: [PERFORM] vacuum analyze slows sql query
Sorry for the late reply. Was feeling a bit under the weather this
weekend and didn't get a chance to look at this.

--- Tom Lane <[EMAIL PROTECTED]> wrote:
> patrick ~ <[EMAIL PROTECTED]> writes:
>> PREPARE pkk_00 ( integer ) ...
>
> This is what you want to do, but not quite like that. The PREPARE
> determines the plan, and so VACUUMing and re-EXECUTing is going to show
> the same plan. What we need to look at is
> 	- standing start
> 	PREPARE pkk_00 ...
> 	EXPLAIN ANALYZE EXECUTE pkk_00 ...
> 	VACUUM ANALYZE;
> 	PREPARE pkk_01 ...
> 	EXPLAIN ANALYZE EXECUTE pkk_01 ...

But of course! I feel a bit silly now. This is what I get after following
Tom's directions:

pkk=# prepare pkk_00 ( integer ) as select ...
PREPARE
Time: 1.753 ms

pkk=# execute pkk_00(241 );
 case
------
 f
(1 row)

Time: 0.788 ms

pkk=# explain analyze execute pkk_00(241 );
                              QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=10.73..10.74 rows=1 width=0) (actual time=0.067..0.068 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..10.73 rows=1 width=4) (actual time=0.055..0.055 rows=0 loops=1)
           ->  Index Scan using pur_offer_id_idx on pkk_purchase p0  (cost=0.00..20690.18 rows=1929 width=4) (actual time=0.052..0.052 rows=0 loops=1)
                 Index Cond: (offer_id = $1)
                 Filter: ((((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 0.213 ms
(7 rows)

Time: 24.654 ms

pkk=# vacuum analyze ;
VACUUM
Time: 128826.078 ms

pkk=# prepare pkk_01 ( integer ) as select ...
PREPARE
Time: 104.658 ms

pkk=# execute pkk_01(241 );
 case
------
 f
(1 row)

Time: 7652.708 ms

pkk=# explain analyze execute pkk_01(241 );
                              QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=2.66..2.67 rows=1 width=0) (actual time=2872.211..2872.213 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.66 rows=1 width=4) (actual time=2872.189..2872.189 rows=0 loops=1)
           ->  Seq Scan on pkk_purchase p0  (cost=0.00..37225.83 rows=13983 width=4) (actual time=2872.180..2872.180 rows=0 loops=1)
                 Filter: ((offer_id = $1) AND (((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 2872.339 ms
(6 rows)

Time: 2873.479 ms

So it looks like after the VACUUM the planner resorts to a Seq Scan
rather than an Index Scan. This is because the value of the correlation
field in pg_stats (according to the PostgreSQL docs) is closer to 0 than
to ±1:

pkk=# select tablename, attname, correlation from pg_stats
      where tablename = 'pkk_purchase' and attname = 'offer_id' ;
  tablename   | attname  | correlation
--------------+----------+-------------
 pkk_purchase | offer_id |    0.428598
(1 row)

So I started to experiment with ALTER TABLE SET STATISTICS values to see
which gets the correlation closer to ±1.
The trend seems to indicate that the higher the statistics target is set,
the closer to 0 the correlation value is pushed:

 set statistics | correlation
----------------+-------------
            800 |    0.393108
            500 |    0.408137
            200 |    0.43197
             50 |    0.435211
              1 |    0.45758

And a subsequent PREPARE and EXPLAIN ANALYZE confirms that the planner
reverts back to using the Index Scan after setting stats to 1 (even
though the correlation value is still closer to 0 than 1):

pkk=# explain analyze execute pkk_02(241 );
                              QUERY PLAN
--------------------------------------------------------------------------
 Result  (cost=2.95..2.96 rows=1 width=0) (actual time=0.068..0.069 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..2.95 rows=1 width=4) (actual time=0.056..0.056 rows=0 loops=1)
           ->  Index Scan using pur_offer_id_idx on pkk_purchase p0  (cost=0.00..35810.51 rows=12119 width=4) (actual time=0.053..0.053 rows=0 loops=1)
                 Index Cond: (offer_id = $1)
                 Filter: ((((expire_time)::timestamp with time zone > now()) OR (expire_time IS NULL) OR (pending = true)) AND ((cancel_date IS NULL) OR (pending = true)))
 Total runtime: 0.200 ms
(7 rows)

So, is this the ultimate solution to this issue?

--patrick
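For reference, the experiment above presumably boils down to repeating
these two commands with different targets and re-checking pg_stats (a
sketch, not copied from the original session):

    -- set the per-column statistics target, then refresh the stats
    ALTER TABLE pkk_purchase ALTER COLUMN offer_id SET STATISTICS 1;
    ANALYZE pkk_purchase;

    -- re-check what the planner now believes about the column
    SELECT tablename, attname, correlation
      FROM pg_stats
     WHERE tablename = 'pkk_purchase' AND attname = 'offer_id';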
Re: [PERFORM] Question regarding the file system
Gabriele,

> I have been given a dual PIII with 768MB RAM and I am going to install
> PostgreSQL on it, for data warehousing reasons. I have also been given
> four 160 Ultra SCSI disks (36GB each) with a RAID controller (Adaptec
> 2100). I am going to use a RAID5 architecture (this gives me
> approximately 103 GB of data) and install a Debian Linux on it: this
> machine will be dedicated exclusively to PostgreSQL.

FWIW, RAID5 with < 5 disks is probably the worst-performing disk setup
for PG with most kinds of DB applications. However, with 4 disks you
don't have a lot of other geometries available. If the database will fit
on one disk, I might suggest doing RAID 1 for 2 of the disks, and having
two single disks, one with the OS and swap, and one with the database
log.

If you're doing Debian, make sure to get a current version of PG from
Debian Unstable.

> I was wondering which file system you suggest me: ext3 or reiserfs?

These seem to be equivalent in data=writeback mode for most database
applications. Use whichever you find easier to install & maintain.

> Also, I was thinking of using the 2.6.x kernel which offers a faster
> thread support: will PostgreSQL gain anything from it or should I stick
> with 2.4.x?

PostgreSQL won't gain anything from the thread support (unless you're
using a threaded front-end app with thread-safe ecpg). But it will gain
from several other improvements in 2.6, especially better scheduling and
VM support. Use 2.6.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] ext3 journalling type
Matt,

> It should be fastest because it is the least overhead, and safe because
> postgres does its own write-order guaranteeing through fsync(). You
> should also mount the FS with the 'noatime' option.

This, of course, assumes that PostgreSQL is the only thing on the
partition. Which is a good idea in general, but not to be taken for
granted ...

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] ext3 journalling type
I have some data here, no detailed analyses though:

http://www.osdl.org/projects/dbt2dev/results/fs/

Mark

On Mon, Nov 08, 2004 at 01:26:09PM +0100, Dawid Kuroczko wrote:
> The ext3fs allows selecting the type of journalling to be used with the
> filesystem. Journalling pretty much "mirrors" the work of WAL logging
> by PostgreSQL... I wonder which type of journalling is best for PgSQL
> in terms of performance. Choices include:
>
> journal
>     All data is committed into the journal prior to being written into
>     the main file system.
>
> ordered
>     This is the default mode. All data is forced directly out to the
>     main file system prior to its metadata being committed to the
>     journal.
>
> writeback
>     Data ordering is not preserved - data may be written into the main
>     file system after its metadata has been committed to the journal.
>     This is rumoured to be the highest-throughput option. It guarantees
>     internal file system integrity, however it can allow old data to
>     appear in files after a crash and journal recovery.
>
> Am I right to assume that "writeback" is both fastest and at the same
> time as safe to use as ordered? Maybe any of you did some benchmarks?
>
> Regards,
> Dawid
Re: [PERFORM] ext3 journalling type
> Am I right to assume that "writeback" is both fastest and at
> the same time as safe to use as ordered? Maybe any of you
> did some benchmarks?

It should be fastest because it is the least overhead, and safe because
postgres does its own write-order guaranteeing through fsync(). You
should also mount the FS with the 'noatime' option.

But for some workloads, there are tests showing that 'data=journal' can
be the fastest! This is because although the data is written twice (once
to the journal, and then to its real location on disk), in this mode data
is written _sequentially_ to the journal, and later written out to its
destination, which may be at a quieter time.

There's a discussion (based around 7.2) here:

http://www.kerneltraffic.org/kernel-traffic/kt20020401_160.txt

M
Re: [PERFORM] ext3 journalling type
Dawid Kuroczko wrote:
> The ext3fs allows selecting the type of journalling to be used with the
> filesystem. Journalling pretty much "mirrors" the work of WAL logging
> by PostgreSQL... I wonder which type of journalling is best for PgSQL
> in terms of performance. Choices include:
>
> journal
>     All data is committed into the journal prior to being written into
>     the main file system.
>
> ordered
>     This is the default mode. All data is forced directly out to the
>     main file system prior to its metadata being committed to the
>     journal.
>
> writeback
>     Data ordering is not preserved - data may be written into the main
>     file system after its metadata has been committed to the journal.
>     This is rumoured to be the highest-throughput option. It guarantees
>     internal file system integrity, however it can allow old data to
>     appear in files after a crash and journal recovery.
>
> Am I right to assume that "writeback" is both fastest and at the same
> time as safe to use as ordered? Maybe any of you did some benchmarks?

Yes. I have seen benchmarks that say writeback is fastest, but I don't
have any numbers handy.

--
  Bruce Momjian                     |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                 |  (610) 359-1001
  +  If your life is a hard drive,  |  13 Roberts Road
  +  Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [PERFORM] postgresql amd-64
>> Good, I'll give it a shot and see what I come up with... thx.
>
> Do share your experience with us.

Will do. I have to ship the server on Friday, and the parts are on order.
If they come today, I'll have time to test Gentoo, Redhat 32/64, and
win32 by then. If I can't get it built until tomorrow, unfortunately the
Gentoo test will have to be skipped.

The win32 test is forced because our clients prefer win32 and I have to
justify any platform change with a reasonable performance advantage.

I have to compile and install a lot of software (including subversion,
which I'm using to manage our application binaries), and I'm wary of
64-bit library issues which will hold me up. Any major roadblocks and
I'll be forced to drop the test.

When I'm finished I'll throw a link to this list, probably Friday.

Merlin
[PERFORM] ext3 journalling type
The ext3fs allows selecting the type of journalling to be used with the
filesystem. Journalling pretty much "mirrors" the work of WAL logging by
PostgreSQL... I wonder which type of journalling is best for PgSQL in
terms of performance. Choices include:

journal
    All data is committed into the journal prior to being written into
    the main file system.

ordered
    This is the default mode. All data is forced directly out to the
    main file system prior to its metadata being committed to the
    journal.

writeback
    Data ordering is not preserved - data may be written into the main
    file system after its metadata has been committed to the journal.
    This is rumoured to be the highest-throughput option. It guarantees
    internal file system integrity, however it can allow old data to
    appear in files after a crash and journal recovery.

Am I right to assume that "writeback" is both fastest and at the same
time as safe to use as "ordered"? Maybe any of you did some benchmarks?

Regards,
Dawid
Re: [PERFORM] Postgresql is using seqscan when is should use indexes.
On Mon, 8 Nov 2004 09:40 pm, Andreas Åkre Solberg wrote:
> We have two tables, dst_port_hour and dst_port_day, which should be
> very similar; they both have about 50.000.000 rows. In both tables we
> have an index for period_id.
>
> We run postgresql 7.4.5 on a dedicated Debian server, with dual Intel
> Xeon 3GHz and 4GB memory.
>
> The problem is that on the dst_port_day table, postgresql is using
> seqscan, and not the index, when it should. Forcing the use of the
> index by setting enable_seqscan to false makes the query lightning
> fast. When using seqscan, the query takes several minutes. The planner
> calculates the cost for the index scan to be much more than the
> sequence scan.
>
> Why is our query planner misbehaving?
>
> Here are the explain analyze outputs with and without index-force:
>
> SET enable_seqscan=false;
>
> stager=> explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
> cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC
> LIMIT 5;
>
> dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual
> time=0.038..303.801 rows=48072 loops=1)

The guess of the number of rows returned by the index scan is out by a
factor of 10. 500k rows is greater than 1% of the rows, so I think the
planner is likely to choose a sequence scan at this amount, unless you
have tuned things like random_page_cost.

What is the selectivity like on that column? Have you analyzed recently?
If so, you should probably increase the statistics on that column; see
ALTER TABLE ... SET STATISTICS in the manual.

>                              QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual time=831.772..831.816 rows=5 loops=1)
>    ->  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual time=831.761..831.774 rows=5 loops=1)
>          Sort Key: sum(octets)
>          ->  HashAggregate  (cost=2022661.89..2022662.09 rows=80 width=12) (actual time=587.036..663.991 rows=16396 loops=1)
>                ->  Index Scan using dst_port_day_period_id_key on dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual time=0.038..303.801 rows=48072 loops=1)
>                      Index Cond: (period_id = 2779)
>  Total runtime: 836.362 ms
> (7 rows)
>
> SET enable_seqscan=true;
>
> stager=> explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
> cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC
> LIMIT 5;
>
>                              QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual time=299053.006..299053.053 rows=5 loops=1)
>    ->  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual time=299052.995..299053.008 rows=5 loops=1)
>          Sort Key: sum(octets)
>          ->  HashAggregate  (cost=1209424.15..1209424.35 rows=80 width=12) (actual time=298803.273..298881.020 rows=16396 loops=1)
>                ->  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40 rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072 loops=1)
>                      Filter: (period_id = 2779)
>  Total runtime: 299057.643 ms
> (7 rows)

Regards

Russell Smith
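A sketch of both suggestions, with illustrative values (untested here):

    -- collect finer-grained statistics for the skewed column, then
    -- refresh the stats
    ALTER TABLE dst_port_day ALTER COLUMN period_id SET STATISTICS 100;
    ANALYZE dst_port_day;

    -- if the estimates are sound but index scans are still priced too
    -- high, try lowering random_page_cost (default 4) per session
    SET random_page_cost = 2;
    EXPLAIN ANALYZE
    SELECT cur.portnr FROM dst_port_day cur
     WHERE cur.period_id = '2779'
     GROUP BY cur.portnr
     ORDER BY SUM(cur.octets) DESC
     LIMIT 5;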
Re: [PERFORM] Better Hardware, worst Results
On Thu, 2004-11-04 at 20:58, Rod Taylor wrote:
> All 3 plans have crappy estimates.
>
> Run ANALYZE in production, then send another explain analyze (as an
> attachment please, to avoid linewrap).

First of all, I'd like to apologize for taking so long to post a
follow-up. After this, I apologize again because the problem was in my
query. It used some functions that for some reason gave the Dell machine
a greater cost than our house-made machine. After correcting these
functions, the results were faster on the Dell machine.

The last apology is for the linewrapped explains. In our Brazilian
PostgreSQL mailing list, attachments are not allowed, so I sent them as
inline text.

Thanks to everyone who spent some time to help me solve this problem.

--
Alvaro Nunes Melo        Atua Sistemas de Informacao
[EMAIL PROTECTED]        www.atua.com.br
UIN - 42722678           (54) 327-1044
Re: [PERFORM] postgresql amd-64
Iain wrote:
> I'm hoping I'll have the opportunity to build a similar machine soon
> and am wondering about the choice of 64 bit distributions. Gentoo is
> obviously a possibility but I'm also considering Debian. There is also
> a 64 compile of redhat sources somewhere around, but I can't remember
> what they call it offhand.

RedHat's community OS is now called Fedora: http://fedora.redhat.com/

There have been two AMD64 releases of this OS, Fedora Core 1 and Fedora
Core 2. Core 3 is just around the corner. I've been running FC2 x86_64
with kernel 2.6 as a desktop system for quite some time now, with
PostgreSQL 7.4.2 / 64bit installed. I find Fedora to be a really good
Linux distro, continuing and improving upon the fine tradition of
RedHat's releases.

You can also get RedHat's commercial releases on AMD64; according to
http://www.redhat.com/software/rhel/features/ you can also get an EM64T
release. If anyone has opinions about that, I'd be happy to hear them.

--
Radu-Adrian Popescu
CSA, DBA, Developer
Aldrapay MD Aldratech Ltd.
+40213212243
[PERFORM] Postgresql is using seqscan when is should use indexes.
We have two tables, dst_port_hour and dst_port_day, which should be very
similar; they both have about 50.000.000 rows. In both tables we have an
index for period_id.

We run postgresql 7.4.5 on a dedicated Debian server, with dual Intel
Xeon 3GHz and 4GB memory.

The problem is that on the dst_port_day table, postgresql is using
seqscan, and not the index, when it should. Forcing the use of the index
by setting enable_seqscan to false makes the query lightning fast. When
using seqscan, the query takes several minutes. The planner calculates
the cost for the index scan to be much more than the sequence scan.

Why is our query planner misbehaving?

Here are the explain analyze outputs with and without index-force:

SET enable_seqscan=false;

stager=> explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC
LIMIT 5;

                              QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual time=831.772..831.816 rows=5 loops=1)
   ->  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual time=831.761..831.774 rows=5 loops=1)
         Sort Key: sum(octets)
         ->  HashAggregate  (cost=2022661.89..2022662.09 rows=80 width=12) (actual time=587.036..663.991 rows=16396 loops=1)
               ->  Index Scan using dst_port_day_period_id_key on dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual time=0.038..303.801 rows=48072 loops=1)
                     Index Cond: (period_id = 2779)
 Total runtime: 836.362 ms
(7 rows)

SET enable_seqscan=true;

stager=> explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
cur.period_id='2779' GROUP BY cur.portnr ORDER BY SUM(cur.octets) DESC
LIMIT 5;

                              QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual time=299053.006..299053.053 rows=5 loops=1)
   ->  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual time=299052.995..299053.008 rows=5 loops=1)
         Sort Key: sum(octets)
         ->  HashAggregate  (cost=1209424.15..1209424.35 rows=80 width=12) (actual time=298803.273..298881.020 rows=16396 loops=1)
               ->  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40 rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072 loops=1)
                     Filter: (period_id = 2779)
 Total runtime: 299057.643 ms
(7 rows)

--
Andreas Åkre Solberg, UNINETT AS Testnett
Contact info and Public PGP Key available on:
http://andreas.solweb.no/?account=Work