Re: [PERFORM] Tuning FreeBSD and PostgreSQL
On Monday 14 Jul 2003 3:31 pm, Stephen Howie wrote:
[snip]
> My problem is that I have not totally put my head around the concepts of
> shmmax, shmmaxpgs, etc. as they pertain to my current setup and the shared
> memory values in postgresql.conf. I'm looking for a good rule of thumb when
> approaching this. Any help or direction would be greatly appreciated.

There are two articles recently posted here:

http://www.varlena.com/GeneralBits/

They should provide a good start.

-- 
Richard Huxton

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
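The shmmax question above comes down to simple arithmetic. As a rough, illustrative sketch (the 16 MB overhead figure is an assumption for illustration, not something from the articles):

```python
# Illustrative sketch: the kernel's shared-memory ceiling (kern.ipc.shmmax
# on FreeBSD) must cover PostgreSQL's shared memory segment, which is
# roughly shared_buffers (counted in 8 KB pages) plus some fixed overhead
# for other shared structures. The overhead figure is a guess.

PAGE_BYTES = 8192  # bytes per PostgreSQL buffer page

def shmmax_needed(shared_buffers, overhead_bytes=16 * 1024 * 1024):
    """Approximate bytes of shared memory PostgreSQL will request."""
    return shared_buffers * PAGE_BYTES + overhead_bytes

# e.g. shared_buffers = 4096 pages (32 MB of buffers):
print(shmmax_needed(4096))  # 50331648 bytes, i.e. 48 MB
```

If shmmax comes out below this figure, the postmaster will fail to start, so size the kernel limit with some headroom.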
Re: [PERFORM] factoring problem with view in 7.3.3
On Wednesday 23 July 2003 11:21, Rajesh Kumar Mallah wrote:
> Hi,
> I have a view which is a union of selects of certain fields from identical
> tables. The problem is that when we query a column on which an index exists
> for each of the tables, it does not use the indexes. But when we query the
> individual tables, it uses indexes.
>
> tradein_clients=# create view sent_enquiry_eyp_iid_ip_cat1 as
>   select rfi_id,sender_uid,receiver_uid,subject,generated from eyp_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from iid_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from ip_rfi UNION
>   select rfi_id,sender_uid,receiver_uid,subject,generated from catalog_rfi;
> CREATE VIEW
> tradein_clients=# explain analyze select rfi_id from
>   sent_enquiry_eyp_iid_ip_cat1 where sender_uid = 34866;
[snip query plan showing full selects being done and then filtering on the outputs]

I do remember some talk about issues with pushing WHERE clauses down into unions on a view (sorry - can't remember when - maybe check the archives). Actually, I thought work had been done on that for 7.3.3, but it might have been 7.4.

If you generally run that particular query (checking against sender_uid), then the simplest solution is to build an SQL function to push the comparison down for you:

CREATE FUNCTION my_function(int4) RETURNS SETOF my_type AS '
  SELECT ... FROM eyp_rfi WHERE sender_uid = $1
  UNION
  ...etc...
' LANGUAGE 'SQL';

Note that you may get an error about an operator =$ if you miss the spaces around the =.

HTH
-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] PostgreSQL performance problem - tuning
On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:
> Hi All!
> First, thanks for the answers!
>
> Richard Huxton wrote:
>> On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>>> IIRC there is a limit on filesystem cache on FreeBSD - 300MB by default.
>>> If that is the case, you might have to raise it to make
>>> effective_cache_size really effective.
>
> Trying various sysctls tells me nothing. I want to use *all available RAM*
> (of course, minus what the OS needs) for PostgreSQL.
>
>> PG will be using the OS' disk caching.
>
> I think all applications use OS disk caching. ;) Or do you mean that
> PostgreSQL is tuned for an OS-specific cache implementation? Do you know a
> method for examining the real size of the OS filesystem cache? If I
> understood right, PostgreSQL dynamically uses all available RAM minus
> shared_buffers minus k * sort_mem minus effective_cache_size? I want to
> configure PostgreSQL to use the _maximum_ of available RAM.

PG's memory use can be split into four areas (note - I'm not a developer, so this could be wrong):

1. Shared memory - vital so that different connections can communicate with each other. Shouldn't be too large, otherwise PG spends too long managing its shared memory rather than working on your queries.
2. Sort memory - if you have to sort results during a query, it will use up to the amount you define in sort_mem and then use disk if it needs any more. This is for each sort.
3. Results memory - if you're returning 8000 rows then PG will assemble these and send them to the client, which also needs space to store the 8000 rows.
4. Working memory - to actually run the queries: stack and heap space to keep track of its calculations etc.

Your best bet is to start off with some smallish reasonable values and step them up gradually until you don't see any improvement. What is vital is that the OS can cache enough disk-space to keep all your commonly used tables and indexes in memory - if it can't, then you'll see performance drop rapidly as PG has to keep accessing the disk.
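To make the "start smallish and step up" advice concrete, here is a hypothetical starting-point calculator. The fractions are illustrative assumptions of my own, not values recommended anywhere in this thread:

```python
# Hypothetical starting values for a dedicated server, following the
# "start small, step up gradually" advice above. All fractions here are
# illustrative guesses, not recommendations from the thread.

def starting_values(ram_mb):
    pages = ram_mb * 1024 // 8             # RAM expressed as 8 KB pages
    return {
        "shared_buffers": pages // 16,      # a modest slice of RAM to start
        "sort_mem": 4096,                   # KB, and *per sort* - keep small
        "effective_cache_size": pages // 2, # rough guess at the OS disk cache
    }

print(starting_values(512))
# {'shared_buffers': 4096, 'sort_mem': 4096, 'effective_cache_size': 32768}
```

The point is the shape of the split, not the exact numbers: most RAM is left for the OS cache, with effective_cache_size merely describing that cache to the planner rather than allocating anything.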
> For the moment, I'd leave the settings roughly where they are while we look
> at the query, then once that's out of the way we can fine-tune the settings.
[snip suggestion to break the query down]

> Yes, you're right. I've tested a few statements and obtained interesting
> results. SELECT * FROM v_file02wide WHERE... executes in about 34 seconds.
> SELECT showcalc(...); executes in from 0.7 seconds (without recursion) up
> to 6.3 seconds if recursion is used! :( This means that the approximate
> execution time for the fully qualified SELECT, with about 8K rows, is...
> about 13 hours! :-O

Hmm - not good.

> Hence, the problem is in my function showcalc:

That's certainly the place to start, although we might be able to do something with v_file02wide later.

> CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
>     NUMERIC(16)) RETURNS NUMERIC(16)
> LANGUAGE SQL STABLE AS '
> -- Parameters: code, dd, r020, t071
> SELECT COALESCE(
>   (SELECT sc.koef * $4
>    FROM showing AS s NATURAL JOIN showcomp AS sc
>    WHERE s.kod = $1
>      AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
>      AND SUBSTR(acc_mask, 1, 4) = $3
>      AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
>   (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
>        LENGTH(acc_mask) - 2), $2, $3, $4), 0))
>    FROM showing AS s NATURAL JOIN showcomp AS sc
>    WHERE s.kod = $1
>      AND SUBSTR(acc_mask, 1, 1) = ''[''),
>   0) AS showing;
> ';
>
> BTW, a cross join with a WHERE clause doesn't improve performance relative
> to NATURAL JOIN. Additionally, with the user-defined function beginchar
> (SUBSTR(..., 1, 1)), used for indexing, showcalc executes in about 16
> seconds. With the built-in SUBSTR, the same showcalc executes in 6 seconds.

Fair enough - substr should be fairly efficient.

[snip explanation of table structures and usage]

I'm not going to claim I understood everything in your explanation, but there are a couple of things I can suggest.
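The "about 13 hours" estimate quoted earlier in this message is easy to sanity-check with back-of-envelope arithmetic:

```python
# Back-of-envelope check of the "about 13 hours" figure quoted above:
# showcalc() takes up to ~6 seconds per call when recursion is involved,
# and the fully qualified SELECT covers roughly 8,000 rows.

rows = 8000
seconds_per_call = 6.0             # near the worst case measured above

total_hours = rows * seconds_per_call / 3600
print(round(total_hours, 1))       # ~13.3 hours
```

Which is why attacking the per-call cost of showcalc() - rather than the surrounding query - is the right first move here.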
However, before you go and do any of that, can I ask you to post an EXPLAIN ANALYSE of two calls to your showcalc() function (once for a simple account, once for one with recursion)? You'll need to cut and paste the query as standard SQL, since the explain won't look inside the function body.

OK - bear in mind that these suggestions are made without the benefit of the explain analyse:

1. You could try splitting out the various tags of your mask into different fields - that will instantly eliminate all the substr() calls and might make a difference. If you want to keep the mask for display purposes, we could build a trigger to keep it in sync with the separate flags.

2. Use a calculations table and build your results step by step. So - calculate all
Re: [PERFORM] PostgreSQL performance problem - tuning
On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
>> IIRC there is a limit on filesystem cache on FreeBSD - 300MB by default.
>> If that is the case, you might have to raise it to make
>> effective_cache_size really effective.
>
> Trying various sysctls tells me nothing. I want to use *all available RAM*
> (of course, minus what the OS needs) for PostgreSQL.

PG will be using the OS' disk caching.

> While idle, top says:
>
> Mem: 14M Active, 1944K Inact, 28M Wired, 436K Cache, 48M Buf, 331M Free
> Swap: 368M Total, 17M Used, 352M Free, 4% Inuse
>
> After 1 minute of executing
> EXPLAIN ANALYZE SELECT SUM(showcalc('B00204', dd, r020, t071)) FROM
> v_file02wide WHERE a011 = 3 AND inrepdate(data) AND b030 IN (SELECT b030
> FROM dov_bank WHERE dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;
>
> Mem: 64M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 221M Free
> Swap: 368M Total, 3192K Used, 365M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
> 59063 postgres  49    0 65560K 55492K RUN     1:06 94.93% 94.63% postgres
>
> After 12 minutes of query execution:
>
> Mem: 71M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 214M Free
> Swap: 368M Total, 3192K Used, 365M Free
>
>   PID USERNAME PRI NICE   SIZE    RES STATE   TIME   WCPU    CPU COMMAND
> 59063 postgres  56    0 73752K 62996K RUN    12:01 99.02% 99.02% postgres
>
> I suspect that the swap-file size is too small for my query... but the
> query isn't too large, about 8K rows only. :-|

Looks fine - PG isn't growing too large and your swap usage seems steady. We can try upping the sort memory later, but given the amount of data you're dealing with, I'd guess 64MB should be fine.

I think we're going to have to break the query down a little and see where the issue is.
What's the situation with:

EXPLAIN ANALYZE SELECT some_field FROM v_file02wide WHERE a011 = 3 AND
inrepdate(data) AND b030 IN (SELECT b030 FROM dov_bank WHERE
dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;

and:

EXPLAIN ANALYZE SELECT SUM(showcalc(parameters)) FROM something_simple;

Hopefully one of these will run in a reasonable time, and the other will not. Then we can examine the slow query in more detail. Nothing from your previous EXPLAIN (email of yesterday 13:42) looks unreasonable, but something must be going wild in the heart of the query, otherwise you wouldn't be here.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] PostgreSQL performance problem - tuning
is to get those two configuration settings somewhere sane, then we can tune properly. You might like the document at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Performance problems on a fairly big table with two key columns.
On Thursday 04 September 2003 23:53, Rasmus Aveskogh wrote:
> Hi, I have a table that looks like this:
>
> DATA | ID | TIME
>
> The table holds app. 14M rows now and grows by app. 350k rows a day. The
> ID column holds about 1500 unique values (integer). The TIME column is of
> type timestamp without time zone. I have one index (b-tree) on the ID
> column and one index (b-tree) on the TIME column. My queries most often
> look like this:
>
> SELECT DATA FROM tbl WHERE ID = 1 AND TIME > now() - '1 day'::interval;
[snip]
> I tried applying a multicolumn index on ID and TIME, but that one won't
> even be used (after ANALYZE).

The problem is likely to be that the planner isn't spotting that now() - '1 day' is constant. Try an explicit time and see if the index is used. If so, you can write a wrapper function for your expression (mark it STABLE so the planner knows it won't change during the statement). Alternatively, you can do the calculation in the application and use an explicit time.

HTH
-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [GENERAL] Seq scan of table?
On Friday 05 September 2003 09:47, Bjorn T Johansen wrote:
> I think I have found out why... I have a WHERE clause on an ID field, but
> it seems I need to cast this integer to the same integer type as the field
> is defined with in the table, else it will do a table scan. Is this
> assumption correct? And if it is, do I then need to change all my SQL
> statements to cast the WHERE clause where I just have a number (e.g. WHERE
> field = 1) to force the planner to use an index scan instead of a seq scan?

PG's parser will assume an explicit number is an int4 - if you need an int8 etc. you'll need to cast it, yes. You should find plenty of discussion of why in the archives, but the short reason is that PG's type structure is quite flexible, which means it can't afford to make too many assumptions.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [GENERAL] Seq scan of table?
On Friday 05 September 2003 19:20, Neil Conway wrote:
> On Fri, 2003-09-05 at 06:07, Richard Huxton wrote:
>> PG's parser will assume an explicit number is an int4 - if you need an
>> int8 etc. you'll need to cast it, yes.
>
> Or enclose the integer literal in single quotes.
>
>> You should find plenty of discussion of why in the archives, but the
>> short reason is that PG's type structure is quite flexible, which means
>> it can't afford to make too many assumptions.
>
> Well, it's definitely a bug in PG, its quite flexible type structure
> notwithstanding.

It certainly catches out a lot of people. I'd guess it's in the top three issues in the general/sql lists. I'd guess part of the problem is that it's so silent. In some ways it would be better to issue a NOTICE every time a typecast is forced in a comparison - irritating as that would be.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] pgsql inserts problem
On Wednesday 27 August 2003 13:50, Tarhon-Onu Victor wrote:
> shared_buffers = 520
> max_locks_per_transaction = 128
> wal_buffers = 8
> max_fsm_relations = 3
> max_fsm_pages = 482000
> sort_mem = 131072
> vacuum_mem = 131072
> effective_cache_size = 10000
> random_page_cost = 2

Slightly off-topic, but I think your tuning settings are a bit out. You've got 4MB allocated to shared_buffers but 128MB allocated to sort_mem? And only 80MB to effective_cache_size? Your settings might be right, but you'd need a very strange set of circumstances.

As for PG silently discarding inserts, your best bet might be to write a short Perl script to reproduce the problem. Without that, people are likely to be sceptical - if PG tended to do this sort of thing, none of us would use it.

-- 
Richard Huxton
Archonet Ltd
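The imbalance Richard points out becomes obvious once the settings are converted to common units. Recall that shared_buffers and effective_cache_size are counted in 8 KB pages, while sort_mem is in KB; the 10000-page figure below is inferred from the "80MB" in the reply:

```python
# Convert the posted settings to MB so the imbalance is visible.
# shared_buffers / effective_cache_size are in 8 KB pages; sort_mem is KB.

def pages_to_mb(pages):
    return pages * 8 / 1024

def kb_to_mb(kb):
    return kb / 1024

print(pages_to_mb(520))      # shared_buffers: ~4 MB
print(kb_to_mb(131072))      # sort_mem: 128 MB, allocated *per sort*
print(pages_to_mb(10000))    # effective_cache_size: ~78 MB (the "80MB")
```

A single query with a few concurrent sorts could thus claim far more memory than the buffer cache and planner cache estimate combined, which is the "very strange set of circumstances" being alluded to.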
Re: [PERFORM] Need advice about triggers
On Tuesday 09 September 2003 13:40, Mindaugas Riauba wrote:
> Hello, I have a small table (up to 1 rows) and every row will be updated
> once per minute. The table also has a before-update-on-each-row trigger
> written in plpgsql. But 99.99% of the time the trigger will do nothing to
> the database. It will just compare the old and new values in the row, and
> those values will almost always be identical. I tried a simple test and
> was able to do 1 updates on a 1000-row table in ~30s. That's practically
> enough, but I'd like to have more room to slow down. Also, the best result
> I achieved was by doing commit+vacuum every ~500 updates. How can I
> improve performance, and will version 7.4 bring something valuable for my
> task? Rewriting in some other scripting language is not a problem - the
> trigger is simple enough.

Well, try it without the trigger. If performance improves markedly, it might be worth rewriting it in C. If not, you're probably saturating the disk I/O - using iostat/vmstat will let you see what's happening. If it is your disks, you might see if moving the WAL onto a separate drive would help, or check the archives for plenty of discussion about RAID setups.

> Postgres v7.3.4, shared_buffers=4096, max_fsm settings also bumped up 10
> times.

Well, effective_cache_size is useful for reads, but won't help with writing. You might want to look at wal_buffers and see if increasing that helps, but I couldn't say for sure.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] A Basic Question
On Friday 03 October 2003 07:34, [EMAIL PROTECTED] wrote:
> Dear All,
> This question is regarding the performance of queries in general. The
> performance of a query would vary depending on the number of tuples it
> returns and the sort of algorithm that will be used for the retrieval.
> Now if the relation returns zero tuples (the seq and the index scan being
> the best options), and if there are 1 or more, then the rest of the
> PG-supported scans will be best. Now here is where I am having a bit of
> trouble: my relation works fast when it returns more than one tuple, but
> gets slow when it returns zero tuples. Now how should I go about it?

If PG has to examine a lot of tuples to rule them out, then returning no rows can take longer. If you post EXPLAIN ANALYSE output for both queries, someone will be able to explain why in your case.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] [SQL] sql performance and cache
On Saturday 11 October 2003 10:43, Chris Faulkner wrote:
> Hello all,
> I have two very similar queries which I need to execute. They both have
> exactly the same FROM/WHERE conditions. When I execute the first, it takes
> about 16 seconds. The second is executed almost immediately after; it
> takes 13 seconds. In short, I'd like to know why the query result isn't
> being cached, and any ideas on how to improve the execution.

The short answer is that PG doesn't cache query results. The only way it could do so safely is to lock all the tables you access, to make sure that no other process changes them. That would effectively turn PG into a single-user DB in short order.

> The first query attempts to find the maximum size of an array in the
> result set - the field is called level. It contains anything between 1 and
> 10 integers. I just need to know what the largest size is. I do this to
> find out the maximum size of the level array:
>
> max(replace(split_part(array_dims(level),':',2),']','')::int)
>
> I know this is big and ugly, but is there any better way of doing it?
> The second query just returns the result set - it has exactly the same
> FROM/WHERE clause.

I assume these two queries are linked? If you rely on the max size being unchanged and have more than one process using the database, you should make sure you lock the rows in question.

> OK - so I could execute the query once, and get the maximum size of the
> array and the result set in one. I know what I am doing is less than
> optimal, but I had expected the query results to be cached, so the second
> execution would be very quick. So why aren't they? I have increased my
> cache size - shared_buffers is 2000 and I have doubled the default
> max_fsm... settings (although I am not sure what they do). sort_mem is
> 8192.

PG will cache the underlying data, but not the results. The values you are changing are used to hold table/index rows etc. This means the second query shouldn't need to access the disk if the rows it requires are cached.
There is a discussion of the postgresql.conf file and how to tune it at:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

Given the explain attached, 16 secs seems slow. Could you post an EXPLAIN ANALYSE of either/both queries to the performance list? I'd drop the sql list when we're just talking about performance.

-- 
Richard Huxton
Archonet Ltd
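Since PG doesn't cache query results, one workaround for genuinely read-only data is to memoize results in the application. This is a hypothetical sketch, not something suggested in the thread; run_query() is a stand-in for a real database round-trip:

```python
# Hypothetical application-side result cache for read-only data, since
# PostgreSQL does not cache query results itself. run_query() is a
# placeholder for a real database call.

from functools import lru_cache

def run_query(sql):
    # pretend this is an expensive round-trip to the database
    return ("row1", "row2")

@lru_cache(maxsize=128)
def cached_query(sql):
    return run_query(sql)

cached_query("SELECT max_level_query")   # hits the "database"
cached_query("SELECT max_level_query")   # served from the cache
print(cached_query.cache_info().hits)    # 1
```

This is only safe under exactly the condition discussed above: no other process may change the underlying rows while the cache is live.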
Re: [PERFORM] [SQL] sql performance and cache
On Saturday 11 October 2003 12:12, Chris Faulkner wrote:
> Hello,
> Thanks for the reply.
>
>> The short answer is that PG doesn't cache query results. The only way it
>> could do so safely is to lock all the tables you access, to make sure
>> that no other process changes them. That would effectively turn PG into
>> a single-user DB in short order.
>
> I am not sure I agree with you. I have done similar things with Oracle
> and found that the second query will execute much more quickly than the
> first.

It could be made to work in at least two scenarios. I'm guessing that's because the underlying rows, and perhaps the plan, are cached rather than the results. If you cached the results of the first query, you'd only have the max length - not your other data - anyway.

[snip]

>> I assume these two queries are linked? If you rely on the max size being
>> unchanged and have more than one process using the database, you should
>> make sure you lock the rows in question.
>
> I can rely on the max size remaining the same. As I mentioned above, the
> tables are entirely read-only. The data will not be updated or deleted by
> anyone - I don't need to worry about that. The data will be updated en
> masse once every 3 months.

Hmm - might be worth adding a column for your array length and pre-calculating it if your data is basically static.

>> There is a discussion of the postgresql.conf file and how to tune it at:
>> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
>
> Thanks for that.
>
>> Given the explain attached, 16 secs seems slow. Could you post an
>> EXPLAIN ANALYSE of either/both queries to the performance list? I'd drop
>> the sql list when we're just talking about performance.
>
> To be honest, my main concern was about the cache. If the second one
> could use a cache and execute in 2 seconds, that would be better than
> reducing the execution of each individual query by 30% or so.

I'm puzzled as to why they aren't both below 2 seconds to start with - you're not dealing with that many rows.

> Thanks for the offer of help on this one.
> explain analyze gives me the same as the last message - did you want
> verbose?

Nope, this is what I need. Verbose prints pages of stuff that only the developers would be interested in. This one actually runs the query and gives you a second set of figures showing times.

> Nested Loop (cost=0.00..147.11 rows=1 width=148)
>     (actual time=84.00..12323.00 rows=67 loops=1)
>   Join Filter: ("inner".GEOM_ID_OF_POINT = "outer".POINT_ID)
>   -> Index Scan using gidx_oscar_point on oscar_point p
>       (cost=0.00..61.34 rows=1 width=57)
>       (actual time=0.00..9.00 rows=67 loops=1)
>     Index Cond: (wkb_geometry && 'SRID=-1;BOX3D(529540 179658.88 0,
>         530540 180307.12 0)'::geometry)
>     Filter: (((TILE_REF = 'TQ27NE'::bpchar) OR (TILE_REF =
>         'TQ28SE'::bpchar) OR (TILE_REF = 'TQ37NW'::bpchar) OR (TILE_REF =
>         'TQ38SW'::bpchar)) AND (FEAT_CODE = 3500))

This next bit is the issue. It's joining on TILE_REF and then filtering by your static values. That's taking 67 * 150ms = 10.05 secs:

>   -> Index Scan using idx_on_tile_ref on oscar_node n
>       (cost=0.00..85.74 rows=2 width=91)
>       (actual time=0.06..150.07 rows=4797 loops=67)
>     Index Cond: (n.TILE_REF = "outer".TILE_REF)
>     Filter: ((TILE_REF = 'TQ27NE'::bpchar) OR (TILE_REF =
>         'TQ28SE'::bpchar) OR (TILE_REF = 'TQ37NW'::bpchar) OR (TILE_REF =
>         'TQ38SW'::bpchar))

Now if you look at the first set of figures, it's estimating 2 rows rather than the 4797 you're actually getting. That's probably why it's chosen to join then filter rather than the other way around. I'd suggest the following:

1. VACUUM FULL on the table in question if you haven't done so since the last update/reload. If you aren't doing this after every bulk upload, you probably should be.
2. VACUUM ANALYSE/ANALYSE the table.
3. Check the tuning document I mentioned and make sure your settings are at least reasonable. They don't have to be perfect - that last 10% takes forever - but if they are badly wrong it can cripple you.
4. PG should now have up-to-date stats and a reasonable set of config settings.
If it's still getting its row estimates wrong, we'll have to look at the statistics it's got. If we reach the statistics-tinkering stage, it might be better to wait till Monday if you can - more people on the list then.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Tuning for mid-size server
On Tuesday 21 October 2003 15:28, Anjan Dave wrote:
> Hi,
> Pretty soon, a PowerEdge 6650 with 4 x 2GHz Xeons and 8GB memory, with
> internal drives on RAID5, will be delivered. Postgres will be from RH8.0.

You'll want to upgrade PG to v7.3.4.

> I am planning for these values for the postgres configuration - to begin
> with:
>
> shared_buffers (25% of RAM / 8KB) = 8589934592 * .25 / 8192 = 262144
> sort_mem (4% of RAM / 1KB) = 335544. We'll take about half of that - 167772
> effective_cache_size = 262144 (same as shared_buffers - 25%)

My instincts would be to lower the first two substantially, and increase the effective cache once you know load levels. I'd probably start with something like the values below and work up:

shared_buffers = 8,000 - 10,000 (PG is happier letting the OS do the caching)
sort_mem = 4,000 - 8,000 (don't forget this is for each sort)

You'll find the annotated postgresql.conf and performance tuning articles useful:

http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php

> In a generic sense, these are recommended values I found in some
> documents. The database will be small in size and will gradually grow
> over time from a few thousand to a few million records, or more. The
> activity will be mostly SELECT statements from a few tables, with joins,
> ORDER BY and GROUP BY clauses. The web application is based on
> Apache/Resin and HotSpot JVM 1.4.0.

You'll need to figure out how many concurrent users you'll have and how much memory will be required by Apache/Java. If your database grows radically, you'll probably want to re-tune as it grows.

-- 
Richard Huxton
Archonet Ltd
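The original poster's arithmetic checks out; the problem is the units and the per-sort multiplier, which a quick computation makes plain (8 KB pages for shared_buffers, KB for sort_mem):

```python
# Verify the proposed settings' arithmetic for the 8 GB machine above.
# shared_buffers is counted in 8 KB pages, sort_mem in KB.

RAM = 8 * 1024**3                      # 8589934592 bytes

shared_buffers = int(RAM * 0.25 / 8192)
sort_mem_kb = int(RAM * 0.04 / 1024)

print(shared_buffers)                  # 262144 pages = 2 GB of buffers
print(sort_mem_kb)                     # 335544 KB, i.e. ~328 MB *per sort*
```

Set against the reply's suggested 8,000-10,000 pages (roughly 62-78 MB) for shared_buffers, the 2 GB figure shows how far apart the "25% of RAM" rule and the start-small advice are.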
Re: [PERFORM] Problem with insert into select...
On Thursday 20 November 2003 21:04, Stephen Farrell wrote:
> I'm having a problem with a query like:
>
> INSERT INTO FACT (x,x,x,x,x,x)
>   SELECT a.key, b.key, c.key, d.key, e.key, f.key
>   FROM x, a, b, c, d, e, f
>   WHERE x=a AND x=b ...
>
> -- postgres 7.4 is running out of memory.

When this has happened to me, it's always been because I've got an unconstrained join due to pilot error. Try an EXPLAIN on the SELECT part and see if that turns up anything.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] expression (functional) index use in joins
On Wednesday 26 November 2003 16:38, Roger Ging wrote:
> I just installed v7.4 and restored a database from v7.3.4.
[snip]

Hmm - you seem to be getting different row estimates in the plan. Can you re-analyse both versions and post EXPLAIN ANALYSE rather than just EXPLAIN?

> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>
> planner results on 7.3.4:
>
> -> Index Scan using idx_program_mri_id_no_program on program p
>    (cost=0.00..3209.16 rows=870 width=20)

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Followup - expression (functional) index use in joins
On Wednesday 26 November 2003 18:39, Roger Ging wrote:
> version 7.4 results:
>
> explain analyse
> SELECT L.row_id FROM music.logfile L
> LEFT JOIN music.program P
>   ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
> WHERE L.station = UPPER('kabc')::VARCHAR
>   AND L.air_date = '04/12/2002'::TIMESTAMP
>   AND P.cutoff_date IS NULL
> ORDER BY L.chron_start, L.chron_end;
>
> -> Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20)
>    (actual time=109.045..1955.882 rows=173998 loops=1)

The estimated number of rows here (4335) is *way* off (173998 actually). If you only had 4335 rows, then this might be a more sensible plan.

First step is to run:

VACUUM ANALYSE program;

Then check the definition of your function fn_mri_id_no_program() and make sure it is marked immutable/stable (depending on what it does) and that it's returning a varchar.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] tuning questions
On Thursday 04 December 2003 19:50, Jack Coates wrote:
> I'm trying to set Postgres's shared memory usage in a fashion that allows
> it to return requested results quickly. Unfortunately, none of these
> changes allows PG to use more than a little under 300M RAM. vacuumdb
> --analyze is now taking an inordinate amount of time as well (40 minutes
> and counting), so that change needs to be rolled back.

You don't want PG to use all your RAM - it's designed to let the underlying OS do a lot of caching for it. It's probably worth having a look at vmstat/iostat and seeing if it's saturating on I/O.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Thursday 04 December 2003 19:59, William Yu wrote:
> Ivar Zarans wrote:
>> I am experiencing strange behaviour, where a simple UPDATE of one field
>> is very slow compared to an INSERT into a table with multiple indexes. I
>> have two tables - one with raw data records (about 24000), where one field
>
> In Postgres and any other DB that uses MVCC (multi-version concurrency),
> UPDATEs will always be slower than INSERTs. With MVCC, what the DB does
> is make a copy of the record, update that record and then invalidate the
> previous record.
[snip]

Yes, but he's seeing 0.25 secs to update one row - that's something odd.

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] tuning questions
On Thursday 04 December 2003 23:16, Jack Coates wrote:
>>> effective_cache_size = 1
>>
>> This is way the heck too low. It's supposed to be the size of all
>> available RAM; I'd set it to 2GB*65% as a start.
>
> This makes a little bit of difference. I set it to 65% (15869 pages).

That's still only about 127MB (15869 * 8KB).

> Now we have some real disk IO:
>
> procs                     memory     swap         io     system        cpu
>  r  b  w   swpd   free   buff   cache  si  so    bi  bo   in    cs us sy id
>  0  3  1   2804  10740  40808 1899856   0   0 26624   0  941  4144

According to this, your cache is currently 1,899,856 KB, which in 8KB blocks is 237,482 - be frugal and say effective_cache_size = 200000 (or even 150000 if the trace above isn't typical).

-- 
Richard Huxton
Archonet Ltd
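The conversion Richard does here - from the vmstat "cache" column to an effective_cache_size in pages - is worth spelling out, since the mismatch of units (KB in vmstat, 8 KB pages in postgresql.conf) is exactly what tripped up the earlier settings:

```python
# Turn the vmstat "cache" figure (reported in KB) into the 8 KB pages
# that effective_cache_size is counted in, as done in the reply above.

cache_kb = 1899856          # "cache" column from the vmstat trace
pages = cache_kb // 8       # 8 KB per PostgreSQL page

print(pages)                # 237482
```

Rounding that down to 200000 (or 150000 for a non-typical trace) gives the frugal settings suggested in the reply.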
Re: [PERFORM] Slow UPDATE, compared to INSERT
On Friday 05 December 2003 02:07, Ivar Zarans wrote:
> I have played around with explain and explain analyze and noticed one
> interesting oddity:
[snip]
> Why does the first example, where recid is given as a numeric constant,
> use a sequential scan, while the second example, where recid is given as
> a string constant, works with an index scan, as expected? The third
> example shows that the numeric constant must be typecast in order to
> function properly. Is this normal behaviour for fields of bigint type?

As Christopher says, normal (albeit irritating). Not sure it applies here though - all the examples you've shown me are using the index.

Well - I must admit I'm stumped. Unless you have a *lot* of indexes and foreign keys to check, I can't see why it would take so long to update a single row. Can you post the schema for the table?

-- 
Richard Huxton
Archonet Ltd
Re: [PERFORM] Nested loop question
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
> Hi -
> I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.) I'm
> unsure what is happening next. I notice that an index scan is occurring
> on actor_summary_pk, with an actual time of 9.15, but then it looks like
> a nested loop occurs at the next level to join these tables. Does this
> mean that each probe of the actor_summary index will take 9.15 msec, but
> the nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops. In your case this would seem to be about 33 seconds.

> -> Index Scan using actor_summary_pk on actor_summary
>    (cost=0.00..8.11 rows=1 width=72)
>    (actual time=9.14..9.15 rows=1 loops=3639)
>   Index Cond: ("outer".actor_id = actor_summary.actor_id)
>
> The nested loop appears to be where most of my time is going, so I'm
> focusing on this area, but don't know if there is a better approach to
> this join. Is there a more efficient means than a nested loop to handle
> such a join? Would a different method be chosen if there was exactly one
> row in actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor then look up the corresponding values in actor_summary. Given that you have 3 million rows in both tables, that seems a reasonable approach. You could always try forcing different plans by switching the various ENABLE_HASHJOIN etc. options (see the runtime configuration section of the manuals). I'm not sure that will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that it's going to disk each time. Does that sound plausible, or do you think you have enough RAM to cache your large indexes?
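The multiplication Richard refers to, using the figures from the plan fragment above:

```python
# Total time spent in the inner index scan of a nested loop is the
# per-loop actual time multiplied by the number of loops, using the
# figures from the EXPLAIN ANALYZE fragment above.

loops = 3639
ms_per_loop = 9.15               # upper bound of "actual time" per loop

total_seconds = loops * ms_per_loop / 1000
print(round(total_seconds, 1))   # ~33.3 seconds
```

This is why a plan node that looks cheap in isolation can dominate the runtime: EXPLAIN ANALYZE reports per-loop times, not totals, for the inner side of a nested loop.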
-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Slow query problem
On Friday 09 January 2004 07:29, Dennis Björklund wrote: On Thu, 8 Jan 2004, Bradley Tate wrote: select invheadref, invprodref, sum(units) from invtran group by invheadref, invprodref For the above query, shouldn't you have one index for both columns (invheadref, invprodref). Then it should not need to sort at all to do the grouping and it should all be fast. Not sure if that would make a difference here, since the whole table is being read. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
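Dennis's suggestion amounts to something like the following (the index name is an example); as noted above, it may not help much when the whole table has to be read anyway:

```sql
-- one composite index covering both GROUP BY columns,
-- so rows can be read in pre-sorted order
CREATE INDEX invtran_headref_prodref_idx ON invtran (invheadref, invprodref);
ANALYZE invtran;
```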
Re: [PERFORM] freebsd 5.2 and max_connections
On Tuesday 13 January 2004 16:04, David Hill wrote: Hello - I am using postgresql to hold aliases, users, and relay_domains for postfix and courier to do lookups from. I am not storing mail in sql. I need postgresql to have fast read performance, so i set up indexes on the tables. Also, the queries are basically select blah from table where domain='domain.com';, so i don't need to be able to support large results. I will have a lot of mail servers connecting to this postgresql db, so i need to support a lot of connections... but don't need to support large results. Firstly - if you don't know about the tuning guidelines/annotated config file, you should go here: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php Hmm - small result sets accessed directly via indexed fields, so sort_mem probably isn't important to you. Make sure your effective_cache_size setting is accurate though, so PG can estimate whether it'll need to access the disks. Not sure if clustering one or more tables will help - I'm guessing not. What might help is to increase the statistics gathered on important columns. That should give the planner a more accurate estimate of value distribution and shouldn't cost you too much to keep accurate, since I'm guessing a low rate of updating. You might want to play with the random_page_cost setting, but more RAM for a bigger disk cache is probably the simplest tweak. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
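Increasing the statistics gathered is a per-column setting followed by a re-analyze; a sketch, assuming a table aliases with a heavily-queried domain column:

```sql
-- the default statistics target is 10; raise it for the important column
ALTER TABLE aliases ALTER COLUMN domain SET STATISTICS 100;
ANALYZE aliases;  -- re-gather stats at the new resolution
```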
Re: [PERFORM] Trigger question
On Thursday 15 January 2004 13:13, pginfo wrote: Hi, I am using pg 7.4.1 and have created a trigger over a table with 3M rows. If I start a massive update on this table, pg executes this trigger on every row and dramatically slows the system. Is there any way in pg to define the trigger to execute only if there are changes on some fields? Not at the moment (and I don't know of any plans for it). For example I am able to declare this in oracle. My trigger is written in pgSQL. Hmm - I can only think of two things you can try: 1. check for the change as the first thing you do and exit if it isn't there 2. do the same, but write the trigger function in 'C' -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
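Option 1 looks something like this in plpgsql (table, column and function names are examples; note the simple equality test treats NULL-to-NULL as "changed", so add explicit NULL handling if the field is nullable):

```sql
CREATE OR REPLACE FUNCTION skip_unchanged() RETURNS trigger AS '
BEGIN
    IF NEW.some_field = OLD.some_field THEN
        RETURN NEW;   -- the interesting field did not change: exit cheaply
    END IF;
    -- ... the expensive trigger logic goes here ...
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER big_table_trig BEFORE UPDATE ON big_table
    FOR EACH ROW EXECUTE PROCEDURE skip_unchanged();
```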
Re: [PERFORM] shared_buffer value
On Thursday 15 January 2004 22:49, Anjan Dave wrote: Gurus, I have defined the following values on a db: shared_buffers = 10240 # 10240 = 80MB max_connections = 100 sort_mem = 1024 # 1024KB is 1MB per operation effective_cache_size = 262144 # equals to 2GB for 8k pages Rest of the values are unchanged from default. The poweredge 2650 machine has 4GB RAM, and the size of the database (size of 'data' folder) is about 5GB. PG is 7.4, RH9. OK - settings don't look unreasonable so far. The machine has been getting quite busy (when, say, 50 students log in at the same time, when others have logged in already) and is maxing out at 100 connections (will increase this tonight probably to 200). We have been getting the 'too many clients' message upon trying to connect. Once connected, pgmonitor and 'pg_stat_activity' show connections reaching about 100. There's a series of SELECT and UPDATE statements that get called when a group of users log in simultaneously...and for some reason, many of them stay there for a while... During that time, if i do a 'top', i can see multiple postmaster processes, each about 87MB in size. The memory utilization drops down to about 30MB free, and i can see a little bit of swap utilization in vmstat then. On linux you'll see three values: SIZE, RSS and SHARE. SIZE is what you're looking at, RSS is resident set size (it's in main memory) and SHARE is how much is shared with other processes. So - 3 processes each with RSS=15MB, SHARE=10MB take up 10+5+5+5 = 25MB (the 10MB shared portion is counted once, plus 5MB private per process). Don't worry about a tiny bit of swap - how is your buff/cache doing then? Should i decrease the buffer value to about 50MB and monitor? That shared_buffers memory is shared between all backends. The sort_mem however, is *per sort*, not even per backend. So - if a complicated query uses four sorts you could use 4MB in one backend. 
Interestingly, at one point, we vacuumed the database, and the size reported by 'df -k' on the pgsql slice dropped very significantly...guess, it had been using a lot of temp files? You need to run VACUUM regularly to reclaim unused space. Since you're on 7.4, take a look at the pg_autovacuum utility, or start by running VACUUM ANALYZE from a cron job every evening. Perhaps a VACUUM FULL at weekends? Further steps will be to add more memory, and possibly drop/recreate a couple of indexes that are used in the UPDATE statements. A REINDEX might be worthwhile. Details on this and VACUUM in the manuals. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
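The cron-based variant might look like this (schedule and options are examples; vacuumdb ships with PostgreSQL and would run from the postgres user's crontab):

```
# nightly VACUUM ANALYZE, plus a weekly VACUUM FULL on Sundays
0 2 * * 1-6  vacuumdb --all --analyze --quiet
0 3 * * 0    vacuumdb --all --analyze --full --quiet
```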
Re: [PERFORM] Trigger question
On Tuesday 20 January 2004 00:01, Neil Conway wrote: Harald Fuchs [EMAIL PROTECTED] writes: Does anyone know how to access the affected values for statement-level triggers? I mean what the old and new pseudo-records are for row-level triggers. Yeah, I didn't get around to implementing that. If anyone wants this feature, I'd encourage them to step up to the plate -- I'm not sure when I'll get the opportunity/motivation to implement this myself. I didn't think they'd be meaningful for a statement-level trigger. Surely OLD/NEW are by definition row-level details. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Trigger question
On Tuesday 20 January 2004 16:42, Tom Lane wrote: Harald Fuchs [EMAIL PROTECTED] writes: Why? If the underlying table has a primary key, finding corresponding pairs is trivial; if there isn't, it's impossible. Exactly. Nonetheless, the correspondence exists --- the UPDATE definitely updated some particular row of the OLD set into some particular one of the NEW set. If the trigger API makes it impossible to reconstruct the matchup, the API is broken. Perhaps they should be cursors? The only sensible way I can think of working with them would be: 1. count how many rows affected 2. step through one row at a time, doing something. I suppose there might be cases where you'd want to GROUP BY... which would mean you'd need some oid/row-id added to a real recordset. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Queries with timestamps
On Wednesday 21 January 2004 19:06, Josh Berkus wrote: Arnau, As the number of rows grows the time needed to execute this query takes longer. What should I do to improve the performance of this query? Tip #1) add an index to the timestamp column Tip #2) make sure that you VACUUM and ANALYZE regularly Tip #3) You will get better performance if you pass the current_date - 1 month as a constant from the client instead of in the query. This is a known issue, expected to be fixed in 7.5. (I think Tip 3 is already fixed in 7.3, or I misunderstand what Josh is saying) Note that this is timestamp-related and not timestamp with time zone related. Most of the time you want the latter anyway. If you can use with time zones and drop the cast you might well find the index is being used... EXPLAIN ANALYSE SELECT * FROM log_core WHERE log_ts > CURRENT_DATE - '1 week'::interval; QUERY PLAN: Index Scan using log_core_ts_idx on log_core (cost=0.00..18.73 rows=239 width=117) (actual time=0.79..0.79 rows=0 loops=1) Index Cond: (log_ts > ((('now'::text)::date - '7 days'::interval))::timestamp with time zone) Total runtime: 1.03 msec (3 rows) It seems to help the planner reach an accurate row estimate if you put both an upper and a lower limit in. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Explain plan for 2 column index
On Thursday 29 January 2004 19:29, [EMAIL PROTECTED] wrote: I have a 2-column index. The question is whether the optimizer can use both columns of an index or not. Should do. i.e. the plan should read like this: Index Cond: ((name)::text = 'name1'::text) AND ((date_from)::timestamp with time zone >= ('now'::text)::timestamp(6) with time zone) Whilst I am getting an index scan on the first column and a filter on the other: Index Scan using testtab_name_date_from on testtab (cost=0.00..2.01 rows=1 width=18) Index Cond: ((name)::text = 'name1'::text) Filter: ((date_from)::timestamp with time zone >= ('now'::text)::timestamp(6) with time zone) Could the problem be the timestamp column or timestamp with time zones? What types are the columns here? If date_from isn't timestamp with time zone, that might be the issue. Also, I'm not convinced timestamp is the same thing as timestamp(6) - why the different accuracies? Also, note that 'now' is deprecated - now() or CURRENT_TIMESTAMP/DATE/etc are preferred. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Bulk Record upload (was Re: [PERFORM] Increasing number of PG connections)
On Monday 02 February 2004 19:39, Qing Zhao wrote: I am new here. I have a question related to this in some way. Hmm - no real connection I can see - might have been better to start a new thread rather than replying to this one. Also, it is usually considered best practice not to quote large amounts of the previous message if you're not replying to it. Our web site needs to upload a large volume of data into Postgres at a time. The performance deteriorates as the number of rows becomes larger. When it reaches 2500 rows, it never comes back to the GUI. Since the tests were run through the GUI, my suspicion is that it might be caused by the way the application server talks to the Postgres server, the connections, etc.. What might be the factors involved here? Does anyone know? You don't really give us enough information. What GUI are you talking about? How are you loading this data - as a series of INSERT statements, text-file with separators, from Access/MySQL etc? In general, the fastest way to add a large number of rows is via the COPY sql command. Next best is to batch your inserts together into larger transactions of say 100-1000 inserts. Two other things to be aware of are: use of VACUUM/ANALYZE and configuration tuning (see http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php). PG shouldn't have a problem with inserting a few thousand rows, so I suspect it's something to do with your application/GUI setup. Hope that helps, if not try turning on statement logging for PG and then we can see what commands your GUI is sending. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
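To make those two suggestions concrete (table, columns and file path are placeholders):

```sql
-- fastest: server-side bulk load from a tab-separated file
COPY mytable (id, name) FROM '/tmp/load.txt';

-- next best: batch plain INSERTs into one transaction
-- instead of one transaction per statement
BEGIN;
INSERT INTO mytable (id, name) VALUES (1, 'first');
INSERT INTO mytable (id, name) VALUES (2, 'second');
-- ... several hundred more ...
COMMIT;
```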
Re: [PERFORM] Database conversion woes...
On Tuesday 03 February 2004 16:42, Kevin Carpenter wrote: Thanks in advance, will give more detail - just looking for some open directions and maybe some kicks to fuel my thought in other areas. I've taken to doing a lot of my data manipulation (version conversions etc) in PG even if the final resting place is MySQL. It's generally not too difficult to transfer data but you will have problems with MySQL's more relaxed attitude to data types (things like all-zero timestamps). I tend to write a script to tidy the data before export, and repeatedly restore from backup until the script corrects all problems. Not sure how convenient that'll be with dozens of gigs of data. Might be practical to start with the smaller databases, let your script grow in capabilities before importing the larger ones. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Database conversion woes...
On Tuesday 03 February 2004 22:29, Kevin wrote: The mammoth replicator has been working well. I had tried the pgsql-r and had limited success with it, and dbmirror was just taking to long having to do 4 db transactions just to mirror one command. I have eserv but was never really a java kind of guy. When this is over and you've got the time, I don't suppose you could put together a few hundred words describing your experiences with the Mammoth replicator - there are a couple of places they could be posted. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Index Performance Help
On Thursday 05 February 2004 12:13, Damien Dougan wrote: Hi All, I've been seeing very slow read performance on a database of 1 million indexed subscribers, which I believe is nothing to do with the data itself, but delays on processing the index. If I make a random jump into the index (say X), it can take about 50ms to read the subscriber. If I then make a close-by lookup (say X+10), it takes only about 0.5ms to read the subscriber. Making another lookup far away (say X+1000), it again takes about 50ms to read. The first time, it has to fetch a block from disk. The second time that disk block is already in RAM so it's much faster. The third time it needs a different disk block. Am I correct in my analysis? Is there anything I can do to improve the performance of the index lookups? Make sure you have enough RAM to buffer your disks. Buy faster disks. I've tried increasing the index memory and making a number of queries around the index range, but straying several hundred entries from a cached one always results in a major lookup delay. Yep, that'll be your disks. I've also increased the shared memory available to Postgres to 80MB in case this is a paging of the index, but it hasn't seemed to have any effect. Probably the wrong thing to do (although you don't mention what hardware you've got). Read the tuning document at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php Sample analyze output for an initial query: hydradb=# explain analyze select * from pvsubscriber where actorid = 'b3432-asdas-232-Subscriber793500'; ... - Index Scan using mc_actor_key on mc_actor (cost=0.00..4.08 rows=1 width=69) (actual time=39.497..39.499 rows=1 loops=1) ... Total runtime: 49.845 ms And the analyze output for a nearby subscriber (10 indexes away): hydradb=# explain analyze select * from pvsubscriber where actorid = 'b3432-asdas-232-Subscriber793510'; ... 
- Index Scan using mc_actor_key on mc_actor (cost=0.00..4.08 rows=1 width=69) (actual time=0.220..0.221 rows=1 loops=1) Total runtime: 0.428 ms (15 rows) That certainly seems to be the big change - the only way to consistently get 1ms timings is going to be to make sure all your data is cached. Try the tuning guide above and see what difference that makes. If that's no good, post again with details of your config settings, hardware, number of clients etc... -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] cacheable stored functions?
On Friday 20 February 2004 15:35, Bill Moran wrote: I'm converting a SQL application to PostgreSQL. The majority of the logic in this application is in the stored functions in the database. Somewhere, I saw a reference to WITH (iscachable) for stored functions, looking again, I'm unable to find any reference to this directive. I have a single function that is _obviously_ safe to cache using this, and it generates no errors or problems that I can see. Now I'm looking at a lot of other functions that, if cached, would speed up performance considerably. Yet I'm reluctant to use this directive since I can't find documentation on it anywhere. From memory, iscachable was replaced in version 7.3 by the three finer-grained settings IMMUTABLE, STABLE, VOLATILE. I'm guessing the old behaviour is still there for backwards compatibility, but it's probably best to use the new versions. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
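Side by side, the two spellings look like this (the function is a trivial example):

```sql
-- 7.2 and earlier (still accepted for backwards compatibility):
-- CREATE FUNCTION add_one(int4) RETURNS int4
--     AS 'SELECT $1 + 1;' LANGUAGE 'sql' WITH (iscachable);

-- 7.3 onwards; IMMUTABLE is the closest equivalent of iscachable
CREATE FUNCTION add_one(int4) RETURNS int4
    AS 'SELECT $1 + 1;' LANGUAGE 'sql' IMMUTABLE;
```

STABLE is the right choice for functions that don't modify the database but whose results can change between statements (e.g. anything reading table data), and VOLATILE for everything else.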
Re: [PERFORM] A cache for the results of queries ?
On Thursday 26 February 2004 13:30, David Pradier wrote: Hi everybody, i'd like to know if there exists a system of caching for the results of queries. What i'd like to do: select whatever_things from (selection_in_cache) where special_conditions; The interesting thing would be to have a precalculated selection_in_cache, especially when selection_in_cache is a very long list of joins... You might want to search the archives for the -sql list for a message Materialized View Summary - some time this week. That's almost exactly what you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Select-Insert-Query
On Friday 27 February 2004 16:52, [EMAIL PROTECTED] wrote: *please* don't post HTML-only messages. What is the most performant way to select, for example, the first 99 rows of a table and insert them into another table? At the moment i do this: for userrecord in select * from table where account_id = a_account_id and counter_id = userrecord.counter_id and visitortable_id between a_minid and a_maxid limit 99 loop insert into lastusers (account_id, counter_id, date, ip, hostname) values(a_account_id, userrecord.counter_id, userrecord.date, userrecord.ip, userrecord.hostname); end loop; If that is the actual query, I'm puzzled as to what you're doing, since you don't know what it is you just inserted. Anyway, you can do this as a single query: INSERT INTO lastusers (account_id ... hostname) SELECT a_account_id, counter_id... FROM table WHERE... The LIMIT shouldn't take any time in itself, although if you are sorting then PG may need to sort all the rows before discarding all except the first 99. If this new query is no better, make sure you have vacuum analyse'd the tables and post the output of EXPLAIN ANALYSE for the query. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
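Spelled out, the single-query version would look roughly like this (the source table name and literal values stand in for the original's plpgsql variables):

```sql
INSERT INTO lastusers (account_id, counter_id, date, ip, hostname)
SELECT account_id, counter_id, date, ip, hostname
FROM visitors
WHERE account_id = 42
  AND visitortable_id BETWEEN 1000 AND 2000
LIMIT 99;
```

Note that without an ORDER BY, "the first 99 rows" is whichever 99 rows PG happens to return first; add an ORDER BY if a particular 99 is wanted (at the cost of the sort mentioned above).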
Re: [PERFORM] WAL Optimisation - configuration and usage
Rob Sir - I have to congratulate you on having the most coherently summarised and yet complex list query I have ever seen. I fear that I will be learning from this problem rather than helping, but one thing did puzzle me - you've set your random_page_cost to 0.5? I'm not sure this is sensible - you may be compensating for some other parameter out-of-range. -- Richard Huxton ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] syslog slowing the database?
On Tuesday 09 March 2004 20:29, Greg Spiegelberg wrote: iostat reported ~2000 blocks written every 2 seconds to the DB file system. I turned syslog off to see if it was blocking anything and in the past couple minutes 1GB has been restored and iostat reports ~35,000 blocks written every 2 seconds to the DB file system. Can anyone confirm this for me? If syslog is set to sync after every line and you're logging too much then it could slow things down as the disk heads shift back and forth between two areas of disk. How many disks do you have and in what configuration? Also - was PG logging a lot of info, or is some other application the culprit? Tip: put a minus - in front of the file-path in your syslog.conf and it won't sync to disk after every entry. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
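The syslog tip in context (the facility and path are examples):

```
# /etc/syslog.conf -- the leading '-' stops syslogd fsyncing after every line
local0.*        -/var/log/postgresql.log
```

Remember to signal syslogd to reload its configuration (typically a SIGHUP) after editing.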
Re: [PERFORM] atrocious update performance
On Tuesday 16 March 2004 00:08, Tom Lane wrote: I'm inclined to suspect an issue with foreign-key checking. You didn't give us any details about foreign key relationships your cust table is involved in --- could we see those? And the schemas of the other tables involved? Two questions Tom: 1. Do the stats tables record FK checks, or just explicit table accesses? 2. If not, should they? If the only real activity is this update then simple before/after views of the stats might be revealing. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] two seperate queries run faster than queries ORed together
On Thursday 18 March 2004 21:21, Joseph Shraibman wrote: explain SELECT COUNT(u.ukey) FROM u, d WHERE d.ukey = u.ukey AND u.pkey = 260 AND (u.status = 3 OR d.status = 3 ) AND NOT u.boolfield ; QUERY PLAN --- Aggregate (cost=128867.45..128867.45 rows=1 width=4) - Hash Join (cost=32301.47..128866.77 rows=272 width=4) Hash Cond: (outer.ukey = inner.ukey) Join Filter: ((inner.status = 3) OR (outer.status = 3)) - Seq Scan on u (cost=0.00..41215.97 rows=407824 width=6) Filter: ((pkey = 260) AND (NOT boolfield)) There's your problem. For some reason it thinks it's getting 407,824 rows back from that filtered seq-scan. I take it that pkey is a primary-key and is defined as being UNIQUE? If you actually did have several hundred thousand matches then a seq-scan might be sensible. I'd start by analyze-ing the table in question, and if that doesn't have any effect look at the column stats and see what spread of values it thinks you have. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
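Checking what the planner believes after an ANALYZE is straightforward via the pg_stats view; a sketch using the names from the plan above:

```sql
ANALYZE u;
-- the planner's view of the distribution of u.pkey
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'u' AND attname = 'pkey';
```

If the most-common-values list looks badly wrong, raising the column's statistics target with ALTER TABLE ... SET STATISTICS and re-analyzing is the usual next step.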
Re: [PERFORM] Help with query plan inconsistencies
On Tuesday 23 March 2004 18:49, Woody Woodring wrote: Hello, I am using postgres 7.4.2 as a backend for geocode data for a mapping application. My question is why I can't get consistent use of my indexes during a query; I tend to get a lot of seq scan results. I'm not sure it wants to be using the indexes all of the time. Nested Loop Left Join (cost=0.00..23433.18 rows=1871 width=34) (actual time=0.555..5095.434 rows=3224 loops=1) Total runtime: 5100.028 ms Nested Loop Left Join (cost=0.00..76468.90 rows=9223 width=34) (actual time=0.559..17387.427 rows=19997 loops=1) Total runtime: 17416.501 ms Nested Loop Left Join (cost=0.00..29160.02 rows=2327 width=34) (actual time=0.279..510.773 rows=5935 loops=1) Total runtime: 516.782 ms #1 = 630 rows/sec (with index on cable_billing) #2 = 1,148 rows/sec (without index) #3 = 11,501 rows/sec (with index) The third case is so much faster, I suspect the data wasn't cached at the beginning of this run. In any case #2 is faster than #1. If the planner is getting things wrong, you're not showing it here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Raw vs Filesystem
On Monday 29 March 2004 22:56, Jaime Casanova wrote: ok. if i don't misunderstand you (english is not my mother tongue, so i can be wrong). your point is that speed is not necessarily performance, that's right. so, the real question is what is the best filesystem for optimal speed in postgresql? That's going to depend on a number of things: 1. Size of database 2. Usage patterns (many updates or mostly reads? single user or many?...) 3. What hardware you've got 4. What OS you're running. 5. How you've configured your hardware, OS and PG. There are some test results people have provided in the archives, but whether they apply to your setup is open to argument. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [ADMIN] Raw vs Filesystem
On Tuesday 30 March 2004 17:43, Josh Berkus wrote: Jaime, Richard, That's going to depend on a number of things: There are some test results people have provided in the archives, but whether they apply to your setup is open to argument. True. On Linux overall, XFS, JFS, and Reiser have all looked good at one time or another. Ext3 has never been a leader for performance, though, so that's an easy elimination. True, but on the sorts of commodity boxes I use, it doesn't make sense for me to waste time setting up non-standard filesystems - it's cheaper to spend a little more for better performance. I think SuSE offer Reiser though, so maybe we'll see a wider selection available by default. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] select slow?
On Tuesday 30 March 2004 20:25, Jaime Casanova wrote: hi all, i have an amd athlon with 256MB ram (i know, this is not a *real* server but my tables are small) Nothing wrong with it - it's what I still use as my development server. i'm using vb6 (win98) with pgsql-7.3.4 (rh8) through the psqlodbc. when i do a select it takes long to execute, here is an example CREATE TABLE ICC_M_BANCO ( CodBanco SMALLINT NOT NULL, select * from icc_m_banco where codbanco = 1; it takes 13s from when it's sent until it's executed. Try: SELECT * FROM icc_m_banco WHERE codbanco = 1::smallint; By default, PG will treat a numeric constant as integer not smallint, so when it looks for an index it can't find one for integer, so scans instead. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] What index for 'like (%keyword%)' ???
On Wednesday 31 March 2004 10:51, Priem, Alexander wrote: Hi everyone, I am building a query which uses a clause like Where doc_description like '%keyword%'. I know a normal index won't be of any use here, but since the table in question will get fairly big, I do want to use an index. Can anyone give me some advise on what kind of index I can use here? Or shouldn't I use one in this case? You probably want to look at the contrib/tsearch2 full-text indexing module. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
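A minimal tsearch2 setup looks something like this (table/column names are examples; the module must first be loaded from contrib, and you would normally add a trigger to keep the tsvector column current as rows change):

```sql
ALTER TABLE documents ADD COLUMN fti tsvector;
UPDATE documents SET fti = to_tsvector('default', doc_description);
CREATE INDEX documents_fti_idx ON documents USING gist (fti);

-- indexed full-text search, instead of an unindexable LIKE '%keyword%'
SELECT * FROM documents WHERE fti @@ to_tsquery('default', 'keyword');
```

Note this gives word-based matching rather than arbitrary substring matching, which is usually what's actually wanted.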
[PERFORM] Back to Linux 2.6 kernel thoughts...
I seem to remember discussion of anticipatory vs deadline scheduler in 2.6. Here is what Andrew Morton (I think) says: The deadline scheduler has two additional scheduling queues that were not available to the 2.4 IO scheduler. The two new queues are a FIFO read queue and a FIFO write queue. This new multi-queue method allows for greater interactivity by giving the read requests a better deadline than write requests, thus ensuring that applications rarely will be delayed by read requests. Deadline scheduling is best suited for database servers and high disk performance systems. Morton has experienced up to 15 percent increases on database loads while using deadline scheduling. http://story.news.yahoo.com/news?tmpl=story&cid=75&e=2&u=/nf/20040405/tc_nf/23603 Nothing very in-depth in the story. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] good pc but bad performance,why?
On Wednesday 07 April 2004 05:00, huang yaqin wrote: hello Thanks, you are right. I use postmaster -o -F to start my PG and performance improved greatly. I don't think Tom was recommending turning fsync off. If you have a system crash/power glitch then the database can become corrupted. If you are happy with the possibility of losing your data, write performance will improve noticeably. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
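For completeness, the postgresql.conf equivalent of -F, with the same caveat - a crash or power loss can then corrupt the database:

```
# postgresql.conf -- same effect as starting with "postmaster -o -F"
fsync = false
```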
Re: [PERFORM] plan problem
On Wednesday 07 April 2004 10:03, Ken Geis wrote: Richard Huxton wrote: On Tuesday 06 April 2004 21:25, Ken Geis wrote: I am trying to find an efficient way to draw a random sample from a complex query. I also want it to be easy to use within my application. So I've defined a view that encapsulates the query. The id in the driving table is exposed, and I run a query like: select * from stats_record_view where id in (select id from driver_stats order by random() limit 3); How about a join? SELECT s.* FROM stats_record_view s JOIN (SELECT id FROM driver_stats ORDER BY random() LIMIT 3) AS r ON s.id = r.id; Yes, I tried this too after I sent the first mail, and this was somewhat better. I ended up adding a random column to the driving table, putting an index on it, and exposing that column in the view. Now I can say SELECT * FROM stats_record_view WHERE random < 0.093; For my application, it's OK if the same sample is picked time after time and it may change if data is added. Fair enough - that'll certainly do it. Also worth checking the various list archives - this has come up in the past, but some time ago. There are some messages in the archives about how to get a random sample. I know how to do that, and that's not why I posted my message. Are you saying that the planner behavior I spoke of is in the archives? I wouldn't know what to search on to find that thread. Does anyone think that the planner issue has merit to address? Can someone help me figure out what code I would look at? I was assuming after getting a random subset they'd see the same problem you are. If not, probably worth looking at. In which case, an EXPLAIN ANALYZE of your original query would be good. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
On Wednesday 14 April 2004 18:53, Rajesh Kumar Mallah wrote: Hi I have .5 million rows in a table. My problem is select count(*) takes ages. VACUUM FULL does not help. can anyone please tell me how i can enhance the performance of the setup. SELECT count(*) from eyp_rfi; If this is the actual query you're running, and you need a guaranteed accurate result, then you only have one option: write a trigger function to update a table_count table with every insert/delete to eyp_rfi. There is loads of info on this (and why it isn't as simple as you might think) in the archives. First though: 1. Is this the actual query, or just a representation? 2. Do you need an accurate figure or just something near enough? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
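The trigger-maintained count usually looks something like this (names are examples). The main catch discussed in the archives is that every insert and delete now contends for the single counter row, which serializes concurrent writers:

```sql
CREATE TABLE eyp_rfi_count (n bigint NOT NULL);
INSERT INTO eyp_rfi_count SELECT count(*) FROM eyp_rfi;

CREATE FUNCTION eyp_rfi_count_trig() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE eyp_rfi_count SET n = n + 1;
    ELSE
        UPDATE eyp_rfi_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER eyp_rfi_count_t AFTER INSERT OR DELETE ON eyp_rfi
    FOR EACH ROW EXECUTE PROCEDURE eyp_rfi_count_trig();

-- the count is then a single-row read:
SELECT n FROM eyp_rfi_count;
```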
Re: [PERFORM] select count(*) very slow on an already vacuumed table.
On Thursday 15 April 2004 08:10, Rajesh Kumar Mallah wrote: The problem is that i want to know if i need a hardware upgrade at the moment. Eg i have another table rfis which contains ~ .6 million records. SELECT count(*) from rfis where sender_uid > 0; Time: 117560.635 ms Which is approximately 4804 records per second. Is that acceptable performance on the hardware below: RAM: 2 GB DISKS: ultra160 , 10 K , 18 GB Processor: 2* 2.0 Ghz Xeon Hmm - doesn't seem good, does it? If you run it again, is it much faster (since the data should be cached then)? What does vmstat 10 show while you're running the query? One thing you should have done is read the performance tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php The default values are very conservative, and you will need to change them. What kind of upgrades should be put on the server for it to become reasonably fast? If you've only got one disk, then a second disk for OS/logging. Difficult to say more without knowing numbers of users/activity etc. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already
On Thursday 15 April 2004 17:19, Rajesh Kumar Mallah wrote: Bill Moran wrote: Rajesh Kumar Mallah wrote: Hi, The problem was solved by reloading the table. The query now takes only 3 seconds. But that is not a solution. If dropping/recreating the table improves things, then we can reasonably assume that the table is pretty active with updates/inserts. Correct? Yes, the table results from an import process and undergoes lots of inserts and updates, but that's before the vacuum full operation. The table is not accessed during vacuum. What I want to know is: is there any way to automate the dumping and reload of a table individually? Will the below be safe and effective? Shouldn't be necessary assuming you vacuum (not full) regularly. However, looking back at your original posting, the vacuum output doesn't seem to show any rows that need removing. # VACUUM full verbose eyp_rfi; INFO: vacuuming public.eyp_rfi INFO: eyp_rfi: found 0 removable, 505960 nonremovable row versions in 71987 pages DETAIL: 0 dead row versions cannot be removed yet. Since your select count(*) showed 505960 rows, I can't see how dropping/replacing could make a difference on a sequential scan. Since we're not using any indexes I don't see how it could be related to that.

begin work;
create table new_tab AS select * from tab;
truncate table tab;
insert into tab select * from new_tab;
drop table new_tab;
commit;
analyze tab;

I have not tried it but plan to do so, but I feel the insert would take ages to update the indexes, if any. It will have to update them, which will take time. BTW is there any way to disable checks and triggers on a table temporarily while loading data (is updating reltriggers in pg_class safe?) You can take a look at pg_restore and copy how it does it. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [OFF-TOPIC] - Known maximum size of the PostgreSQL
Christopher Kings-Lynne wrote: What's the biggest PostgreSQL database (in size and number of records) that you know of? Didn't someone say that RedSheriff had a 10TB postgres database or something? From http://www.redsheriff.com/us/news/news_4_201.html According to the company, RedSheriff processes 10 billion records a month and the total amount of data managed is more than 32TB. Griffin said PostgreSQL has been in production for 12 months with not a single database fault in that time. The stability of the database cannot be questioned. Needless to say, we are extremely happy. I think it's safe to assume this is not on a spare Dell 600SC though. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] LIKE and INDEX
Jie Liang wrote: All, This is an old topic. When I use: select url from urlinfo where url like 'http://www.lycos.de%'; it uses the index, good! But if I use: select url from urlinfo where url like 'http://%.lycos.de'; it won't use the index at all, NOT good! Is there any way I can force the second query to use an index??? I've seen people define a reverse(text) function via plperl or similar, then build a functional index on reverse(url). Of course, that would rely on your knowing which end of your search pattern has the % wildcard. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
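A rough sketch of that reverse-index trick, assuming a hand-rolled plpgsql reverse() (there is no built-in reverse(text) in this era; also note that whether LIKE can use the index depends on your locale - in a non-C locale you may need the *_pattern_ops operator classes):

```sql
-- Hand-rolled reverse(text); details are illustrative.
CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
DECLARE
    result text := '''';
    i integer;
BEGIN
    FOR i IN REVERSE length($1)..1 LOOP
        result := result || substring($1 FROM i FOR 1);
    END LOOP;
    RETURN result;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

-- Functional index on the reversed URL:
CREATE INDEX urlinfo_rev_url_idx ON urlinfo (reverse(url));

-- The trailing-wildcard search becomes a leading-anchored one,
-- which the index can serve:
SELECT url FROM urlinfo
 WHERE reverse(url) LIKE reverse('.lycos.de') || '%';
```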
Re: [PERFORM] PostgreSQL caching
Vitaly Belman wrote: Hello, I have the following problem: When I run some query just after starting the postmaster, it takes several seconds to execute (sometimes more than 10); if I rerun it again afterwards, it takes mere milliseconds. So, I guess it has to do with PostgreSQL caching... But how exactly does it work? What does it cache? And how can I control it? There are two areas of cache - PostgreSQL's shared buffers and the operating system's disk-cache. You can't directly control what data is cached, it just keeps track of recently used data. It sounds like PG isn't being used for a while so your OS decides to use its cache for webserver files. I would like to load selected information into memory before a user runs the query. Can I do it somehow? As PostgreSQL is used in my case as a webserver backend, it isn't really helping if the user has to wait 10 seconds every time he goes to a new page (even if refreshing the page would be really quick, since PostgreSQL already loaded the data into memory). If you could pin data in the cache it would run quicker, but at the cost of everything else running slower. Suggested steps: 1. Read the configuration/tuning guide at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php 2. Post a sample query/explain analyse that runs very slowly when not cached. 3. If needs be, you can write a simple timed script that performs a query. Or, the autovacuum daemon might be what you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] select max(id) from aTable is very slow
David Teran wrote: Hi, we have a table with about 6.000.000 rows. There is an index on a column with the name id which is an integer and serves as primary key. When we execute select max(id) from theTable; it takes about 10 seconds. Explain analyze returns: Due to the open-ended nature of PG's aggregate function system, it can't see inside the max() function to realise it doesn't need all the values. Fortune favours the flexible however - the simple workaround is to use the equivalent: SELECT id FROM theTable ORDER BY id DESC LIMIT 1; -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] [SQL] Materialized View Summary
On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: I've written a summary of my findings on implementing and using materialized views in PostgreSQL. I've already deployed eagerly updating materialized views on several views in a production environment for a company called RedWeek: http://redweek.com/. As a result, some queries that were taking longer than 30 seconds to run now run in a fraction of a millisecond. You can view my summary at http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html Interesting (and well written) summary. Even if not a built in feature, I'm sure that plenty of people will find this useful. Make sure it gets linked to from techdocs. If you could identify candidate keys on a view, you could conceivably automate the process even more. That's got to be possible in some cases, but I'm not sure how difficult it is to do in all cases. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Major differences between oracle and postgres performance
Gary Cowell wrote: I'm not as familiar with postgresql as I am with Oracle, but I think I've configured comparable buffering and sort area sizes; certainly there isn't much physical IO going on in either case. People are going to want to know: 1. version of PG 2. explain analyse output, rather than just explain 3. What values you've used in the postgresql.conf file The actual plan from explain analyse isn't going to be much use - as you say, a scan of the whole table followed by sorting is the best you'll get. However, the actual costs of these steps might say something useful. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] memory allocation
Michael Ryan S. Puncia wrote: Hi everyone. How much memory should I give to the kernel and PostgreSQL? I have 1 GB of memory and 120 GB of HD. Devrim's pointed you to a guide to the configuration file. There's also an introduction to performance tuning on the same site. An important thing to remember is that sort_mem is the amount of memory available *per sort*, and some queries can use several sorts. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] after using pg_resetxlog, db lost
Tom Lane wrote: Shea,Dan [CIS] [EMAIL PROTECTED] writes: The pg_resetxlog was run as root. It caused ownership problems of pg_control and xlog files. Now we have no access to the data through psql. The data is still there under /var/lib/pgsql/data/base/17347 (PWFPM_DEV DB name). But there is no reference to 36 of our tables in pg_class. Also the 18 other tables that are reported in this database have no data in them. Is there any way to have the database resync or make it aware of the data under /var/lib/pgsql/data/base/17347? How can this problem be resolved? What this sounds like is that you reset the transaction counter along with the xlog, so that those tables appear to have been created by transactions in the future. This could be repaired by doing pg_resetxlog with a more appropriate initial transaction ID, but figuring out what that value should be is not easy :-( Tom - would there be any value in adding this to a pg_dump? I'm assuming the numbers attached to tables etc are their OIDs anyway, so it might be a useful reference in cases like this. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] postgres 7.4 at 100%
Chris Cheston wrote: Hi all, I was running Postgres 7.3 and it was running at about 15% with my application. On Postgres 7.4 on another box, it was running at 100%... People are going to need more information. Are you talking about CPU/disk IO/memory? My settings are default on both boxes I think. Doubtful - PG crawls with the default settings. Check your old postgresql.conf file and compare. Also, read the tuning article at: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php There are only about 20 inserts per second, which is really low. Anyone have any ideas as to something I have to do to Postgres 7.4 to change it from the default so that it's not eating up all my CPU? I have no clue how to debug this... What does top/vmstat/iostat show during heavy usage? Help please. Should I downgrade to 7.3 to see what happens? BTW I'm running Postgres 7.3.2 on: Linux box 2.4.25-040218 #1 SMP Wed Feb 18 17:59:29 CET 2004 i686 i686 i386 GNU/Linux on a single processor P4 1.4GHz, 512 MB RAM. Does the SMP kernel do something with the single-processor CPU? Or should this not affect psql? Don't know about the SMP thing. Unlikely that one of the big distributions would mess that up much though. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Query performance
Bill wrote: Ok... so here lies the output of oclh (i.e. \d oclh):

        Table public.oclh
 Column |         Type          |           Modifiers
--------+-----------------------+-------------------------------
 symbol | character varying(10) | not null default ''
 date   | date                  | not null default '0001-01-01'
 open   | numeric(12,2)         | not null default '0.00'
 close  | numeric(12,2)         | not null default '0.00'
 low    | numeric(12,2)         | not null default '0.00'
 high   | numeric(12,2)         | not null default '0.00'
Indexes: symbol_2_oclh_index btree (symbol, date),
         symbol_oclh_index btree (symbol, date)

Well, I'm not sure why the two indexes on the same columns, and I'm not sure it makes sense to have defaults for _any_ of the columns there. So - you want: 1. ratio = abs(closing-opening)/opening 2. average = all the ratios of each day of each stock 3. Highest average Well, I don't know what you mean by #2, but #1 is just: SELECT symbol, date, abs(close - open)/open AS ratio FROM oclh; I'd probably fill in a summary table with this and use that as the basis for your further queries. Presumably from yesterday back, the ratios/averages won't change. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Query performance
Bill wrote: Ok, thanks. So let me explain query number 2, as this is the more difficult to write. So I have a list of stocks; this table contains the price of all of the stocks at the open and close date. Ok, now we have a ratio from query (1) that returns at least a very rough index of the daily performance of a given stock, with each ratio representing the stock's performance in one day. Now we need to average this with the same stock's ratio every day, to get a total average for each stock contained in the database. Now I would simply like to find a ratio like this that represents the average of every stock in the table and simply find the greatest ratio. Sorry about the lousy explanation before, is this a bit better? Here is an example if needed. Say we have a stock by the name of YYY. I know, due to query 1, that stock YYY has an abs(close-open)/open price ratio of, for example, 1.3 on Dec 1 and (for simplicity let's say we only have two dates) on Dec 2 the ratio for YYY is 1.5. So the query averages and gets 1.4. Now it needs to do this for all of the stocks in the table and sort by increasing ratio. Well, the simplest would be something like: CREATE VIEW my_ratios AS SELECT ...(select details we used for #1 previously) Query #1 then becomes: SELECT * FROM my_ratios; Then you could do: SELECT symbol, avg(ratio) as ratio_avg FROM my_ratios GROUP BY symbol ORDER BY avg(ratio) ; Now, in practice, I'd probably create a symbol_ratio table and fill that one day at a time. Then #2,#3 would be easier. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
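The symbol_ratio table mentioned at the end might look like this - a sketch only, with column types guessed from the \d output earlier in the thread:

```sql
-- Hypothetical summary table, filled one day at a time.
CREATE TABLE symbol_ratio (
    symbol varchar(10) NOT NULL,
    date   date        NOT NULL,
    ratio  numeric     NOT NULL,
    PRIMARY KEY (symbol, date)
);

-- Run once per day, after that day's prices are loaded:
INSERT INTO symbol_ratio (symbol, date, ratio)
SELECT symbol, date, abs(close - open) / open
  FROM oclh
 WHERE date = current_date
   AND open <> 0;  -- guard against the table's 0.00 default

-- #2 and #3 then become a cheap aggregate over the summary:
SELECT symbol, avg(ratio) AS ratio_avg
  FROM symbol_ratio
 GROUP BY symbol
 ORDER BY ratio_avg DESC;
```

Since historic ratios never change, each day's rows are written once and the aggregate never has to touch the raw price table.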
Re: [PERFORM] [GENERAL] How to know which queries are to be optimised?
Ulrich Wisser wrote: You can log queries that run for at least a specified amount of time. This will be useful in finding what the long-running queries are. You can then use explain analyse to see why they are long-running. But is there a tool that could compile a summary out of the log? The log grows awfully big after a short time. You might want to look at the Practical Query Analyser - haven't used it myself yet, but it seems a sensible idea. http://pqa.projects.postgresql.org/ -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Weird Database Performance problem!
Arash Zaryoun wrote: Hi Richard, Thanks for your prompt reply. It fixed the problem. Just one more question: Do I need to create an index for FKs? You don't _need_ to, but on the referring side (e.g. table GCTBALLOT in your example) PostgreSQL won't create one automatically. Of course, the primary-key side will already have an index being used as part of the constraint. I've cc:ed the list on this, the question pops up quite commonly. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
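Concretely, that just means an ordinary CREATE INDEX on the referencing column. A minimal sketch - GCTBALLOT is the referring table from the thread, but the column name here is invented, since the thread doesn't show its definition:

```sql
-- The referencing (foreign-key) side gets no automatic index,
-- so create one by hand; the column name is hypothetical.
CREATE INDEX gctballot_voter_idx ON gctballot (ballot_voter_id);

-- The primary-key side needs nothing extra: its unique index
-- already exists as part of the PK constraint.
```

This matters most for DELETEs and UPDATEs on the referenced table, where each row change triggers a lookup in the referring table.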
Re: [PERFORM] using an index worst performances
Gaetano Mendola wrote: Hi all, I'm trying to optimize the following query: http://rafb.net/paste/results/YdO9vM69.html As you can see from the explain, after defining the index the performance is worse. If I raise default_statistics_target to 200 then the performance is worse than before: Without index: 1.140 ms With index: 1.400 ms With default_statistics_target = 200: 1.800 ms Can I just check that 1.800 ms means 1.8 secs (you're using . as the thousands separator)? If it means 1.8ms then frankly the times are too short to mean anything without running them 100 times and averaging. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] What is the best way to do attribute/values?
Daniel Ceregatti wrote: Hi list, I have a database with 1M people in it. Each person has about 20 attributes, such as height, weight, eye color, etc. I need to be able to search for people based on these attributes. A search can be conducted on one attribute, all attributes, or any number in between. How would _you_ do this? I have already attempted to answer this. My attempts are detailed here: http://sh.nu/email.txt Hmm... interesting. Shot in the dark - try a tsearch2 full-text index. Your problem could be translated into searching strings of the form hair=black eyes=blue age=117 Not pretty, but might give you the speed you want. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Optimizer Selecting Incorrect Index
Dennis Bjorklund wrote: On Wed, 25 Aug 2004, Richard Huxton wrote: These queries are different. The first returns 687 rows and the second 713 rows. The 687 and 713 are the number of rows in the plan, not the number of rows the queries return. D'OH! Thanks Dennis -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [ADMIN] slower every day
G u i d o B a r o s i o wrote: Conclusion: If you comment out a line in the conf file and reload it, the setting will remain in its last state (whether it was true or false, while I expected a default). Yes, that's correct. No, you're not the only one to have been caught out by this. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] View Query Performance
Igor Maciel Macaubas wrote: Hi all, I'm trying to find smarter ways to dig data from my database, and have the following scenario: table1 -- id -- name . . . . . . table2 -- id -- number . . . . . . I want to create a view to give me back just what I want: the id, the name and the number. I thought of doing the following: create view my_view as select t1.id, t1.name, t2.number from table1 as t1, table2 as t2 where t1.id = t2.id; Will this be fast enough? Is there a faster way to make it work? This table is mid-big, around 100K records. That's as simple a way as you will find. If you apply further conditions, e.g. SELECT * FROM my_view WHERE id = 123; then you should see any index on id being used. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [pgsql-hackers-win32] Poor Performance for large queries
John Meinel wrote: So notice that when doing the actual select it is able to do the index query. But for some reason with a prepared statement, it is not able to do it. Any ideas? In the index-using example, PG knows the value you are comparing to. So, it can make a better estimate of how many rows will be returned. With the prepared/compiled version it has to come up with a plan that makes sense for any value. If you look back at the explain output you'll see PG is guessing 181,923 rows will match with the prepared query but only 1 for the second query. If in fact you returned that many rows, you wouldn't want to use the index - it would mean fetching values twice. The only work-around if you are using plpgsql functions is to use EXECUTE to make sure your queries are planned for each value provided. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
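With plpgsql, that EXECUTE work-around looks roughly like this (table, column and function names are invented for illustration; the point is that the query string is built and planned afresh for each call, so the planner sees the actual value):

```sql
-- Hypothetical sketch: EXECUTE re-plans the query per value, so the
-- planner can pick the index when the supplied value is selective.
CREATE OR REPLACE FUNCTION count_for_value(int4) RETURNS bigint AS '
DECLARE
    rec record;
BEGIN
    FOR rec IN EXECUTE ''SELECT count(*) AS c FROM some_table ''
                    || ''WHERE some_col = '' || quote_literal($1::text)
    LOOP
        RETURN rec.c;
    END LOOP;
END;
' LANGUAGE 'plpgsql';
```

A statically written `SELECT ... WHERE some_col = $1` inside the function would be planned once for any possible value, which is exactly the prepared-statement behaviour described above.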
Re: [PERFORM] inconsistent/weird index usage
Dustin Sallings wrote: The following view creates the illusion of the old ``single-table'' model: create view samples as select * from samples_1999 union select * from samples_2000 union select * from samples_2001 union select * from samples_2002 union select * from samples_2003 union select * from samples_2004 Try this with UNION ALL (you know there won't be any duplicates) and possibly with some limits too: SELECT * FROM samples_1999 WHERE ts BETWEEN '1999-01-01 00:00:00+00' AND '1999-12-31 11:59:59+00' UNION ALL ... select s.serial as serial_num, s.name as name, date(ts) as day, min(sample) as min_temp, avg(sample) as avg_temp, stddev(sample) as stddev_temp, max(sample) as max_temp from samples inner join sensors s using (sensor_id) where ts > current_date - 7 group by serial_num, name, day order by serial_num, day desc Try restricting the timestamp too: WHERE ts BETWEEN (current_date - 7) AND current_timestamp Hopefully that will give the planner enough smarts to know it can skip most of the sample_200x tables. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] index not used if using IN or OR
Mario Ivankovits wrote: Hello! Sorry if this has been discussed before; it is just hard to find in the archives using the words "or" or "in" :-o I use postgres-8.0 beta4 for windows. I broke down my problem to a very simple table - two columns, primary_key and secondary_key. Creates and inserts you will find below. If I query the _empty_ freshly created table I get the following explain result: select * from tt where seckey = 1; Index Scan using seckey_key on tt (cost=0.00..17.07 rows=5 width=12) Index Cond: (seckey = 1) If I use OR (or IN) things get worse: select * from tt where seckey = 1 or seckey = 2 Seq Scan on tt (cost=0.00..0.00 rows=1 width=12) Filter: ((seckey = 1) OR (seckey = 2)) Note the Seq Scan instead of using the index. But as you said, your table is *empty* - why would an index be faster? Try running EXPLAIN ANALYSE on these queries and look at the actual times. After populating the table with 8920 records and running analyze, the scenario gets even worse: select * from tt where seckey = 1; Seq Scan on tt (cost=0.00..168.50 rows=1669 width=12) (actual time=0.000..15.000 rows=1784 loops=1) Filter: (seckey = 1) Total runtime: 31.000 ms Now even this simple query uses a Seq Scan. Well, it thinks it's going to be returning 1669 rows. If that's roughly right, then scanning the table probably is faster. Run the queries again with EXPLAIN ANALYSE. Also try issuing set enable_seqscan=false; This will force the planner to use any indexes it finds. Compare the times with and without, and don't forget to account for the effects of caching. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: FW: [PERFORM] FW: Index usage
BBI Edwin Punzalan wrote: Thanks, Tim. I tried adding an upper limit and it's still the same, as follows: == db=# explain analyze select date from chatlogs where date>='11/24/04'; NOTICE: QUERY PLAN: Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1) Total runtime: 246.22 msec EXPLAIN db=# explain analyze select date from chatlogs where date>='11/23/04' and date<'11/24/04'; NOTICE: QUERY PLAN: Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1) Total runtime: 4455.56 msec We have two issues here: 1. In the first example it only picks an index because it thinks it is going to get 37 rows; it actually gets 10737. 2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too). Questions: 1. How many rows do you have in chatlogs? 2. Is this the only problem you are experiencing, or just one of many? 3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres vs. MySQL
Evilio del Rio wrote: Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? Do you have any record of configuration, system hardware, usage patterns, queries executed? If you can tell us what you tried (and why) then we might be able to help, otherwise there's not much information here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgres backend using huge amounts of ram
Gary Doades wrote: How much RAM can a single postgres backend use? I've just loaded a moderately sized dataset into postgres and was applying RI constraints to the tables (using pgadmin on windows). Part way though I noticed the (single) postgres backend had shot up to using 300+ MB of my RAM! Oops - guess that's why they call it a Beta. My first guess was a queue of pending foreign-key checks or triggers etc. but then you go on to say... Since I can't get an explain of what the alter table was doing I used this: select count(*) from booking_plan,reqt_dates where booking_plan.reqt_date_id = reqt_dates.reqt_date_id and sure enough this query caused the backend to use 300M RAM. The plan for this was: [snip] I then analysed the database. Note, there are no indexes at this stage except the primary keys. the same query then gave: [snip] This is the same set of hash joins, BUT the backend only used 30M of private RAM. I'm guessing in the first case that the default estimate of 1000 rows in a table means PG chooses to do the join in RAM. Once it knows there are a lot of rows it can tell not to do so. However, I thought PG was supposed to spill to disk when the memory required exceeded config-file limits. If you could reproduce a simple test case I'm sure someone would be interested in squashing this bug. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Normalization or Performance
Alvaro Nunes Melo wrote: Hi, Before writing this mail, I researched a little about this topic, and got some opinions from guys like Fabien Pascal, who argues that logical design should be separated from physical design, and other sources. As this is not settled fact, I'm writing to you guys, who make things work in the real world. I believe he's right. Or at least that you should only compromise your logical design once it becomes absolutely necessary due to physical limitations. We started our first big (for our company standards) project always thinking in normalization. But once we imported legacy data into the DB, things got harder. One example is the client's status. A client might be active, inactive or pending (for many reasons). We store all the statuses a client has had since it entered the system. To check what the actual status of a client is, we get the last status from this historical status table. This takes a considerable time, so our best results were achieved building a function that checks the status and indexing this function. The problem is that indexed functions must be immutable, so as you can figure, if the status changes after the creation of the index, the return of the function is still the same. What do you suggest for situations like this? Should I add a field to the clients table and store its actual status, keep storing data in the historical table, and control its changes with a trigger? Trigger + history table is a common solution, it's easy to implement and there's nothing non-relational about it as a solution. There are other situations that are making things difficult for us. For example, one query must return the total amount a client bought in the last 90 days. It's taking too long, when we must know it for many clients, many times. So should I create summarization tables to store this kind of stuff, update it with a trigger on a daily basis (for example), and solve this problem with one join? 
One solution I use for this sort of thing is a summary table grouped by date, and accurate until the start of today. Then, I check the summary table and the live table for today's information and sum those. Our database is not that big. The largest table has about 7.000.000 rows. About 50.000 clients, half of them active. All that I pointed out above uses indexes for queries, etc. But even with this it's not been fast enough. We have a Dell server for this (I know, the Dell issue), a Dual Xeon 2.8, SCSI HD, 1 GB mem. Do we need better hardware for our system? Swap one of your processors for more RAM and disks, perhaps. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
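The trigger + history table suggestion from the post above might be sketched like this (all table and column names are invented; adjust to the real schema):

```sql
-- Hypothetical schema: a current_status column on clients, kept up to
-- date by a trigger that fires on every insert into the history table.
ALTER TABLE clients ADD COLUMN current_status integer;

CREATE OR REPLACE FUNCTION sync_current_status() RETURNS trigger AS '
BEGIN
    UPDATE clients
       SET current_status = NEW.status
     WHERE client_id = NEW.client_id;
    RETURN NEW;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER status_history_sync AFTER INSERT ON client_status_history
    FOR EACH ROW EXECUTE PROCEDURE sync_current_status();

-- Reading the current status is then a plain indexed lookup:
-- SELECT current_status FROM clients WHERE client_id = 123;
```

Unlike the immutable-function index, this stays correct when the status changes, because the trigger rewrites the denormalised column at the moment the history row is added.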
Re: [PERFORM] Seqscan rather than Index
David Brown wrote: You might want to reduce random_page_cost a little. Keep in mind that your test case is small enough to fit in RAM and is probably not reflective of what will happen with larger tables. I am also running 8.0 rc1 for Windows. Despite many hours spent tweaking various planner cost constants, I found little effect on cost estimates. Even reducing random_page_cost from 4.0 to 0.1 had negligible impact and failed to significantly influence the planner. I'm not sure setting random_page_cost below 1.0 makes much sense. Increasing the statistics target for the last_name column to 250 or so *may* help, at least if you're only selecting one name at a time. Not going to do anything in this case. The planner is roughly right about how many rows will be returned, it's just not expecting everything to be in RAM. That's the standard advice around here and the only thing I've found useful. Half the threads in this forum are about under-utilized indexes. It would be great if someone could admit the planner is broken and talk about actually fixing it! Not sure I agree here - when the stats are accurate, you can get the planner to make near-optimal choices most of the time. Is there any particular pattern you've seen? I'm unconvinced that the planner only favours sequential scans as table size decreases. In my experience so far, larger tables have the same problem only it's more noticeable. Hmm - assuming your statistics are good, this would suggest the other cost settings just aren't right for your hardware. The issue hits PostgreSQL harder than others because of its awful sequential scan speed, which is two to five times slower than other DBMS. The archives show there has been talk for years about this, but it seems, no solution. The obvious thing to consider is the block size, but people have tried increasing this in the past with only marginal success. Must admit this puzzles me. Are you saying you can't saturate your disk I/O? 
Or are you saying other DBMSs store records in half to one-fifth the space that PG does? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres on Linux Cluster!
Gurpreet Sachdeva wrote: I have recently transferred a big database onto the master node of a 4-node openSSI cluster... The system is working fine but sometimes I get the following errors: http://192.168.1.100/cgi-bin/search.py File /usr/lib/python2.2/site-packages/pyPgSQL/PgSQL.py, line 3067, in execute, referer: http://192.168.1.100/cgi-bin/search.py self.conn.conn.query('ROLLBACK WORK'), referer: http://192.168.1.100/cgi-bin/search.py libpq.ProgrammingError: no connection to the server, referer: http://192.168.1.100/cgi-bin/search.py , referer: http://192.168.1.100/cgi-bin/search.py At a wild guess, this happens when a CGI process is migrated to another node without migrating the accompanying connection (however you'd do that). This error comes while insertion of data takes place... Is Postgres successful on a cluster??? Will that give me a performance enhancement in any way??? Please help... Probably not, and almost certainly not. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance delay
Hasnul Fadhly bin Hasan wrote: Hi, just want to share with all of you a weird thing that I found when I tested it. I was doing a query that calls a function long2ip to convert bigint to ips. So the query looks something like this: select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30; For your info, there are about 300k rows for that timeframe. It cost me about 57+ secs to get the list. Which is about the same if I query select id, long2ip(srcip), long2ip(dstip) from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' It will cost me about 57+ secs also. Now if I did this select id, long2ip(srcip), long2ip(dstip) from (select * from sometable where timestamp between timestamp '01-10-2005' and timestamp '01-10-2005 23:59' order by id limit 30) as t; it will cost me about 3+ secs. The difference will be that in the final case you only make 30 calls to long2ip() whereas in the first two you call it 300,000 times and then throw away most of them. Try running EXPLAIN ANALYSE ... for both - that will show how PG is planning the query. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] sum of all values
Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems.

SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t';

You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough. Here are the schemas, in case they help:

tle-bu=> \d file_info_1
          Table "public.file_info_1"
     Column      |  Type   |         Modifiers
-----------------+---------+----------------------------
 file_acc_time   | bigint  | not null
 file_group_name | text    | not null
 file_group_uid  | integer | not null
 file_mod_time   | bigint  | not null
 file_name       | text    | not null
 file_parent_dir | text    | not null
 file_perm       | text    | not null
 file_size       | bigint  | not null
 file_type       | text    | not null default 'f'::text
 file_user_name  | text    | not null
 file_user_uid   | integer | not null
Indexes:
    "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type)
    "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type)

tle-bu=> \d file_set_1
          Table "public.file_set_1"
    Column     |  Type   |        Modifiers
---------------+---------+----------------------------
 fs_backup     | boolean | not null default true
 fs_display    | boolean | not null default false
 fs_name       | text    | not null
 fs_parent_dir | text    | not null
 fs_restore    | boolean | not null default false
 fs_type       | text    | not null default 'f'::text
Indexes:
    "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type)

1. WHERE ARE YOUR PRIMARY KEYS??? 2. Why do you have two identical indexes on file_info_1? 3. WHERE ARE YOUR PRIMARY KEYS??? 4. Am I right in thinking that always file_name==fs_name (i.e. they represent the same piece of information), and if so, why are you storing it twice? Same for _parent_dir too. 5. file_type/fs_type are being held as unbounded text?
Not an index into some lookup table or a varchar(N)? Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
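A minimal sketch of the kind of changes points 1-5 suggest (table and column names taken from the schemas above; the constraint and lookup-table names are hypothetical, and the right key columns depend on what's actually unique in the data):

```sql
-- Hypothetical sketch only: give each table a primary key, drop the
-- duplicate index, and move the free-text type column into a lookup table.
ALTER TABLE file_info_1
    ADD CONSTRAINT file_info_1_pkey
    PRIMARY KEY (file_parent_dir, file_name, file_type);

ALTER TABLE file_set_1
    ADD CONSTRAINT file_set_1_pkey
    PRIMARY KEY (fs_parent_dir, fs_name, fs_type);

-- file_info_1_display_idx is identical to file_info_1_search_idx
DROP INDEX file_info_1_display_idx;

-- One row per distinct type ('f', 'd', ...) instead of unbounded text
-- repeated on every row.
CREATE TABLE file_types (
    type_code varchar(1) PRIMARY KEY,
    type_desc text NOT NULL
);
```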
Re: [PERFORM] sum of all values
Madison Kelly wrote: Richard Huxton wrote: Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems. This is a program for general consumption (hopefully... eventually...) so I want to leave the psql config alone. Once I am happier with the program I will try different tuning options and write a FAQ, though I expect 9 out of 10 users won't read it. PostgreSQL is not Firefox, and you can't expect it to work efficiently without doing at least some configuration. The settings to support 100 simultaneous connections on a dual-Opteron with 8GB RAM are not the same as on a single-user laptop. Take half an hour to read through the performance-tuning guide here: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough.
tle-bu=> EXPLAIN ANALYZE SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t';

QUERY PLAN
Aggregate  (cost=2202.54..2202.54 rows=1 width=8) (actual time=5078.744..5078.748 rows=1 loops=1)
  ->  Merge Join  (cost=724.94..2202.51 rows=11 width=8) (actual time=3281.677..4969.719 rows=12828 loops=1)
        Merge Cond: (("outer".file_parent_dir = "inner".fs_parent_dir) AND ("outer".file_name = "inner".fs_name) AND ("outer".file_type = "inner".fs_type))
        ->  Index Scan using file_info_1_search_idx on file_info_1 a  (cost=0.00..1317.11 rows=12828 width=104) (actual time=0.042..116.825 rows=12828 loops=1)
        ->  Sort  (cost=724.94..740.97 rows=6414 width=96) (actual time=3281.516..3350.640 rows=12828 loops=1)
              Sort Key: b.fs_parent_dir, b.fs_name, b.fs_type
              ->  Seq Scan on file_set_1 b  (cost=0.00..319.35 rows=6414 width=96) (actual time=0.029..129.129 rows=12828 loops=1)
                    Filter: (fs_backup = true)
Total runtime: 5080.729 ms

Well, it's slow, but that's probably your settings. Run VACUUM ANALYSE on the tables though; it looks like you've got default statistics (it's expecting exactly half the fs_backup values to be true - 6414 out of 12828).
Here are the schemas, in case they help: [snip schemas and points 1-5 from previous message] Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd This is where I have to admit my novice level of knowledge. Until now I have been more concerned with making it work. It is only now that I have finished (more or less) the program that I have started going back and trying to find ways to speed it up. I have not used postgres (or perl or anything) before this program. I hope my questions aren't too basic. ^.^; There's a rule of thumb about throwing the first version of anything
Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql
[EMAIL PROTECTED] wrote: I would like to upgrade both OS kernel and PGsql version, so in my opinion the best way to handle it is to *backup* the data in .tar Just remember if you're going from 7.3.2 to 7.4.x or 8.0 then you'll need to use pg_dump, not just tar up the directories. If you do use tar, remember to back up *all* the directories. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
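A hedged sketch of the dump-and-reload path mentioned above (the file name is hypothetical; a real upgrade needs the new server initdb'd first, and ideally the *new* version's pg_dump run against the old server):

```shell
# Sketch only: dump everything (roles + all databases) as plain SQL...
pg_dumpall > full_cluster.sql

# ...install the new PostgreSQL version and initdb a fresh data
# directory, then reload into the new server:
psql -f full_cluster.sql postgres
```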
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Andrei Bintintan wrote: Hi to all, I have the following 2 examples. Now, regarding the offset, if it is small (10) or big, what is the impact on the performance of the query?? I noticed that if I return more data (columns) or if I make more joins then the query runs even slower if the OFFSET is bigger. How can I somehow improve the performance on this? There's really only one way to do an offset of 1000 and that's to fetch 1000 rows and then some and discard the first 1000. If you're using this to provide pages of results, could you use a cursor? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Andrei Bintintan wrote: If you're using this to provide pages of results, could you use a cursor? What do you mean by that? Cursor? Yes I'm using this to provide pages, but if I jump to the last pages it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't work if you have a web-app making repeated connections. If you've got a web-application then you'll probably want to insert the results into a cache table for later use. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Alex Turner wrote: I am also very interested in this very question. Is there any way to declare a persistent cursor that remains open between pg sessions? Not sure how this would work. What do you do with multiple connections? Only one can access the cursor, so which should it be? This would be better than a temp table because you would not have to do the initial select and insert into a fresh table and incur those IO costs, which are often very heavy, and the reason why one would want to use a cursor. I'm pretty sure two things mean there's less difference than you might expect: 1. Temp tables don't fsync 2. A cursor will spill to disk beyond a certain size -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Ron Mayer wrote: Richard Huxton wrote: If you've got a web-application then you'll probably want to insert the results into a cache table for later use. If I have quite a bit of activity like this (people selecting 1 out of a few million rows and paging through them in a web browser), would it be good to have a single table with a userid column shared by all users, or a separate table for each user that can be truncated/dropped? I started out with one table; but with people doing 10s of thousands of inserts and deletes per session, I had a pretty hard time figuring out a reasonable vacuum strategy. As often as you can, and make sure your config allocates enough free-space-map for them. Unless, of course, you end up I/O saturated. Eventually I started doing a whole bunch of create table tmp_[userid] tables, where [userid] is a userid; and a script to drop these tables - but that's quite ugly in a different way. With 8.0 I guess I'll try the single table again - perhaps what I want may be to always have an I/O-throttled vacuum running... hmm. Well, there have been some tweaks, but I don't know if they'll help in this case. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
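The shared-cache-table approach discussed above might look something like this (all table, column, and value names are hypothetical; the point is that one table keyed by user needs only one vacuum schedule):

```sql
-- Hypothetical sketch: one shared cache table keyed by user, cleaned
-- up per session so a single VACUUM strategy covers everyone.
CREATE TABLE result_cache (
    user_id integer NOT NULL,
    row_num integer NOT NULL,
    payload text,
    PRIMARY KEY (user_id, row_num)
);

-- Fill the cache once per search (42 = hypothetical userid)...
INSERT INTO result_cache (user_id, row_num, payload)
SELECT 42, ..., ... FROM ...;

-- ...page cheaply out of it via the primary key...
SELECT payload FROM result_cache
 WHERE user_id = 42 AND row_num BETWEEN 101 AND 150;

-- ...and clear it when the session ends, then VACUUM periodically.
DELETE FROM result_cache WHERE user_id = 42;
```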
Re: [PERFORM] [SQL] OFFSET impact on Performance???
Greg Stark wrote: Andrei Bintintan [EMAIL PROTECTED] writes: If you're using this to provide pages of results, could you use a cursor? What do you mean by that? Cursor? Yes I'm using this to provide pages, but if I jump to the last pages it goes very slow. The best way to do pages is not to use offset or cursors but to use an index. This only works if you can enumerate all the sort orders the application might be using and can have an index on each of them. To do this the query would look something like: SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50 Then you take note of the last value used on a given page and if the user selects next you pass that as the starting point for the next page. Greg's is the most efficient, but you need to make sure you have a suitable key available in the output of your select. Also, since you are repeating the query you could get different results as people insert/delete rows. This might or might not be what you want. A similar solution is to partition by date/alphabet or similar, then page those results. That can reduce your resultset to a manageable size. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
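Greg's approach, sketched concretely (table, column, and the value 1234 are hypothetical; col must be indexed and ideally unique so no rows are skipped at page boundaries):

```sql
-- Hypothetical sketch of keyset paging: no OFFSET, just an index scan.
-- First page:
SELECT * FROM tab ORDER BY col LIMIT 50;

-- Suppose the last row on that page had col = 1234. Next page:
SELECT * FROM tab WHERE col > 1234 ORDER BY col LIMIT 50;

-- Each page is a bounded index-range scan, so page 1000 costs roughly
-- the same as page 1 - unlike OFFSET, which reads and discards rows.
```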
Re: [PERFORM] Profiling a function...
Robert Sanford wrote: How do I profile a user-defined function so that I know which parts of the function are the ones that are taking the biggest chunk of time? When I run EXPLAIN on the queries within the function none of them show up as onerous burdens to the performance. But when they are all operating together within the function and within the functional logic they become really expensive. Obviously I've made a mistake somewhere but it isn't obvious (otherwise it would be fixed already) and I'd prefer having a profile report telling me what is taking so long rather than guessing and possibly making things worse. So is there any way to get a line-by-line timing profile of a user-defined function? Not really. What you can do is simulate the queries in functions by using PREPARE. You're probably seeing a difference because when PG plans the queries for functions/prepared queries it doesn't know the actual values. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
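The PREPARE simulation suggested above might look like this (table and column names are hypothetical; the point is that the plan is built against a placeholder, just as it is inside a function, rather than against the actual value):

```sql
-- Hedged sketch: PREPARE plans with a $1 placeholder, reproducing the
-- "planned without knowing the value" behaviour of queries in functions.
PREPARE q(integer) AS
    SELECT count(*) FROM orders WHERE customer_id = $1;

EXPLAIN ANALYZE EXECUTE q(42);   -- shows the generic plan's real cost
DEALLOCATE q;
```

Comparing that plan against a plain `EXPLAIN ANALYZE SELECT ... WHERE customer_id = 42` shows how much the missing value information costs.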
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux, and I'm trying to figure out whether a query I have is going to be slow when I have more information in my tables. Both tables involved will likely have ~500K rows within a year or so. Specifically I can't tell if I'm causing myself future problems with the subquery, and should maybe re-write the query to use a join. The reason I went with the subquery is that I don't know whether a row in Assignments will have a corresponding row in Assignment_Settings. The query is: SELECT User_ID FROM Assignments A WHERE A.User_ID IS NOT NULL AND (SELECT Value FROM Assignment_Settings WHERE Setting='Status' AND Assignment_ID=A.Assignment_ID) IS NULL GROUP BY User_ID; You could always use a LEFT JOIN instead, like you say. I'd personally be tempted to select distinct user_ids then join, but it depends on how many of each. You're not going to know for sure whether you'll have problems without testing. Generate 500k rows of plausible-looking test-data and give it a try. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
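The LEFT JOIN rewrite mentioned above might be sketched like this (same tables and columns as the original query; untested, and note it also matches rows where a 'Status' setting exists but its Value is NULL, which is what the subquery form does too):

```sql
-- Hedged sketch: LEFT JOIN form of the subquery above.
SELECT A.User_ID
  FROM Assignments A
  LEFT JOIN Assignment_Settings S
    ON S.Assignment_ID = A.Assignment_ID
   AND S.Setting = 'Status'
 WHERE A.User_ID IS NOT NULL
   AND S.Value IS NULL
 GROUP BY A.User_ID;
```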
Re: [PERFORM] Possibly slow query
Peter Darley wrote: Folks, I'm using PostgreSQL 7.4.1 on Linux Oh, and move to the latest in the 7.4 series too. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Very important choice
Lago, Bruno Almeida do wrote: Hello my friends, I'd like to know (based on your experience and technical details) which OS is recommended for running PostgreSQL keeping in mind 3 indicators: 1 - Performance (SO, Network and IO) 2 - SO Stability 3 - File System Integrity The short answer is almost certainly whichever OS you are most familiar with. If you have a problem, you don't want to be learning new details about your OS while fixing it. That rules out FreeBSD for now. What hardware you want to use will affect performance and choice of OS. You'll need to decide what hardware you're looking to use. As far as file-systems are concerned, ext3 seems to be the slowest, and the general feeling seems to be that XFS is perhaps the fastest. In terms of reliability, avoid cutting-edge releases of any file-system - let others test them for you. One thing to consider is how long it takes to recover from a crash - you can run PostgreSQL on ext2, but checking a large disk can take hours after a crash. That's the real benefit of journalling for PG - speed of recovery. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
Din Adrian wrote: Hello, I have a little time and I decided to improve the performance of my server(s). I have found on google many 'tips' in tuning linux kernel and postgresql database ... but I can't decide wich 'how-to' is better ... :( So the question is: where to find a 'easy' and complete documentation about this tweaks ... ? Try the performance tuning article linked from this page: http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
Please CC the mailing list as well as replying to me, so that others can help too. Din Adrian wrote: yes I have read this as well ... One question about this option: fsync = true / false a) I have RAID and UPS - is it safe to turn this off ... ('But be very aware that any unexpected database shutdown will force you to restore the database from your last backup.' - from my last backup if the server goes down ??? why ? just at 'any unexpected database shutdown' ? !!!) Because fsync=true flushes transaction details to disk (the Write Ahead Log). That way if (say) the power supply in your server fails you can check the WAL and compare it to the main database files to make sure everything is in a known state. b) the docs say that after 7.2 setting this to false doesn't turn off the WAL ...!? which option does? The docs don't say that, as far as I can see. It doesn't make sense to turn off the WAL. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Tunning postgresql on linux (fedora core 3)
I'll repeat myself: Please CC the mailing list as well as replying to me, so that others can help too. Din Adrian wrote: On Thu, 03 Feb 2005 13:56:50 +, Richard Huxton dev@archonet.com wrote: Please CC the mailing list as well as replying to me, so that others can help too. b) the docs say that after 7.2 setting this to false doesn't turn off the WAL ...!? which option does? The docs don't say that, as far as I can see. It doesn't make sense to turn off the WAL. hmm this is the doc about ... 'NOTE: Since 7.2, turning fsync off does NOT stop WAL. It does stop checkpointing, however. This is a change in the notes that follow. Turn WAL off (fsync=false) only for a read-only database or one where the database can be regenerated from external software. While RAID plus UPSes can do a lot to protect your data, turning off fsync means that you will be restoring from backup in the event of hardware or power failure.' I don't know what this is, and you don't give a URL, but it DOES NOT appear to be in the manuals. You should probably read the sections of the manuals regarding run-time configuration and write ahead logs. The manuals are quite extensive, are available online at http://www.postgresql.org/ and also in most distributions. This is probably a good place to start: http://www.postgresql.org/docs/8.0/interactive/runtime-config.html#RUNTIME-CONFIG-WAL If you turn it off you should have more speed ... !!!??? Basically, as I said in my last email - fsync=true makes sure transaction details are safely stored on disk. If you turn this off, the database doesn't have to wait for the data to physically be written to the disk. But, if power fails then data might be in OS or disk cache and so lost when you restart the machine. Please CC the mailing list if you reply to this message. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] How to interpret this explain analyse?
Joost Kraaijeveld wrote: Hi all, A question on how to read and interpret the explain analyse statement (and what to do). I have a query SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer; Both tables have a btree index on klantnummer (int4, the column the join is on). I have vacuumed and analyzed both tables. The explain analyse is: Indexes are not necessarily useful here since you're fetching all rows in A and presumably much of B.

Sort
  Hash Left Join
    Seq Scan on orders a
    Hash
      Seq Scan on klt_alg b

I've trimmed the above from your explain output. It's sequentially scanning b and using a hash to join to a before sorting the results. Questions: - Hash Left Join (cost=41557.43..110069.51 rows=1100836 width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1) 0. What exactly are the numbers in cost=41557.43..110069.51 (I assume for the other questions that 41557.43 is the estimated ms the query will take; what are the others)? The cost numbers represent effort rather than time. They're only really useful in that you can compare one part of the query to another. There are two numbers because the first shows startup cost, the second total cost. So - the outer parts of the query will have increasing startup values since the inner parts will have to do their work first. The actual time is measured in ms, but remember to multiply it by the loops value. Oh, and actually measuring the time slows the query down too. 1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is the estimated cost and (actual time=21263.858..42845.158 rows=1104380 loops=1) the actual cost. Is the difference acceptable? 2. If not, what can I do about it? The key thing to look for here is the number of rows. If PG expects say 100 rows but there are instead 10,000 then it may choose the wrong plan. In this case the estimate is 1,100,836 and the actual is 1,104,380 - very close.
-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Help with tuning this query
Ken Egervari wrote: I've tried to use Dan Tow's tuning method Who? What? and created all the right indexes from his diagramming method, but the query still performs quite slowly, both inside the application and just inside pgadmin III. Can anyone be kind enough to help me tune it so that it performs better in postgres? I don't think it's using the right indexes, or maybe postgres needs special treatment. I've converted the below query to SQL from a Hibernate query, so the syntax is probably not perfect but its semantics are exactly the same. I've done so by looking at the source code, but I can't run it to get the exact SQL since I don't have the database on my home machine. Hibernate is a java thing, no? It'd be helpful to have the actual SQL the hibernate class (or whatever) generates. One of the problems with SQL is that you can have multiple ways to get the same results and it's not always possible for the planner to convert from one to the other. Anyway, people will want to see EXPLAIN ANALYSE for the query in question. Obviously, make sure you've vacuumed and analysed the tables in question recently. Oh, and make sure you say what version of PG you're running.

select s.* from shipment s
inner join carrier_code cc on s.carrier_code_id = cc.id
inner join carrier c on cc.carrier_id = c.id
inner join carrier_to_person ctp on ctp.carrier_id = c.id
inner join person p on p.id = ctp.person_id
inner join shipment_status cs on s.current_status_id = cs.id
inner join release_code rc on cs.release_code_id = rc.id
left join shipment_status ss on ss.shipment_id = s.id
where p.id = :personId and s.is_purged = false and rc.number = '9'
and cs is not null and cs.date >= current_date - 31
order by cs.date desc

1. Why are you quoting the 9 when checking against rc.number? 2. The cs is not null doesn't appear to be qualified - which table?
Just assume I have no indexes for the moment because while some of the indexes I made make it work faster, it's still around 250 milliseconds and under heavy load, the query performs very badly (6-7 seconds). 3. If you rewrite the current_date - 31 as a suitable ago(31) function then you can use an index on cs.date 4. Are you familiar with the configuration setting join_collapse_limit? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
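Point 3 might be sketched like this (the name ago() comes from the suggestion above and is not a built-in; marking the function STABLE lets the planner treat its result as fixed for the duration of the query, so an ordinary btree index on the date column can be used):

```sql
-- Hedged sketch: a STABLE wrapper so "cs.date >= ago(31)" is indexable.
CREATE FUNCTION ago(integer) RETURNS date AS '
    SELECT current_date - $1;
' LANGUAGE SQL STABLE;

-- Supporting index (name hypothetical):
CREATE INDEX shipment_status_date_idx ON shipment_status (date);

-- The predicate then becomes:
--   ... AND cs.date >= ago(31)
```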
Re: [PERFORM] bad plan
Gaetano Mendola wrote: running a 7.4.5 engine, I'm facing this bad plan:

empdb=# explain analyze SELECT name,url,descr,request_status,url_status,size_mb,estimated_start,request_time_stamp
empdb-# FROM v_sc_user_request
empdb-# WHERE
empdb-#   login = 'babinow1'
empdb-# LIMIT 10 ;

->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
      ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)
            Merge Cond: ("outer".id_program = "inner".id_program)

The problem to address is in this subquery. That's a total of 31 x (1668.754 - 25.328) = 50 seconds (about). Since your query is so simple, I'm guessing v_sc_user_request is a view. Can you provide the definition? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] bad plan
Tom Lane wrote: Putting ORDER BYs in views that you intend to use as components of other views is a bad practice from a performance perspective... There are also a lot of views involved here for very few output columns. Tom - is the planner smart enough to optimise out unneeded columns from a SELECT * view if it's part of a join/subquery and you only use one or two columns? Secondly, in the original plan we have:

->  Nested Loop Left Join  (cost=1478.82..1716.37 rows=1 width=201) (actual time=3254.483..52847.064 rows=31 loops=1)

Now, we've got 31 rows instead of 1 here. The one side of the join ends up as:

->  Subquery Scan vsp  (cost=985.73..1016.53 rows=1103 width=12) (actual time=25.328..1668.754 rows=493 loops=31)
      ->  Merge Join  (cost=985.73..1011.01 rows=1103 width=130) (actual time=25.321..1666.666 rows=493 loops=31)

Would I be right in thinking the planner doesn't materialise the subquery because it's expecting 1 loop, not 31? If there were 1 row the plan would seem OK to me. Is there any mileage in the idea of a lazy planner that keeps some alternative paths around in case they're needed? Or a reactive one that can re-plan nodes when assumptions turn out to be wrong? -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] What's better: Raid 0 or disk for seperate pg_xlog
Karim Nassar wrote: Thanks to all for the tips. On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote: How critical is your data? How update heavy versus read heavy, etc are you? Large, relatively infrequent uploads, with frequent reads. The application is a web front-end to scientific research data. The scientists have their own copy of the data, so if something went really bad, we could probably get them to upload again. If you have very few updates and your reads aren't mostly from RAM you could be better off with simply mirroring (assuming that gains you read bandwidth). Failing that, use the tablespace feature to balance your read load as far as you can. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 8: explain analyze is your friend
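The tablespace suggestion (an 8.0 feature) might be sketched as follows; the mount point, tablespace, table, and index names are all hypothetical:

```sql
-- Hypothetical sketch: spread read load by placing heavily-read
-- tables and indexes on a separate spindle via an 8.0 tablespace.
CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

ALTER TABLE measurements SET TABLESPACE disk2;
CREATE INDEX measurements_taken_idx ON measurements (taken_at)
    TABLESPACE disk2;
```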