[PERFORM] Relation of cpu_*_costs?
Dear Gurus,

Please feel free to point me to the archives if my question has already been answered. Last week I experimented with CPU_TUPLE_COST and found that 4 out of 4 tested queries improved by 10-60% if I changed it from 0.01 (default) to 0.40 (ugh). Setting it higher did not bring any further improvement.

QUESTION 1: is there a (theoretical or practical) relation between this one and the other cpu_* costs? Should I also increase those values by the same rate and find a balance that way? As far as I can guess, there should be a linear relation, i.e. cpu_tuple_cost : cpu_index_tuple_cost : cpu_operator_cost should be a constant ratio, but then again, I suspect there is a reason they have separate entries in the config file ;)

The queries were, or contained, something like:

SELECT s.qty FROM a, s WHERE a.id = s.a_id AND a.b_id = 1234;

where
* "a" and "s" are in a 1:N relation,
* "b" and "a" are in a 1:N relation,
* a.id is the pkey in "a" and b.id is the pkey in "b".

These queries usually return up to 6-10% of the tuples in s (about 16k of 220k), and the planner chose seq scans on s. Disabling seq scan and some other things finally brought up a plan containing index scans that improved two queries. (I tested the other two after I found the solution for these, to see if they improved or got worse.) I also noticed that the largest gain came from the smallest change to cpu_tuple_cost: the query with the largest improvement (to 32% of the original time) chose the better plan from 0.03 upward, but the other one improved (to 79%) only if I set cpu_tuple_cost to 0.40 or higher.

QUESTION 2: am I right in setting cpu_tuple_cost, or may there be another cause of poor plan selection? I also tried lowering random_page_cost, but even 1.0 didn't yield any improvement.

CONFIGURATION: PostgreSQL 7.3.4, IBM Xeon 2x2.4GHz HT, 5x36GB 10krpm HW RAID-5. We found out quite early that random page cost is quite low (we now have it at 1.5 -- maybe it's still too high), and it's true that tasks requiring raw CPU power aren't much faster than on a PIII-800. Unfortunately I can't test the same hardware on 7.4 yet, since it's a production server.

TIA,
G.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
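For anyone repeating this kind of experiment: the cost parameters can be changed per session without touching postgresql.conf, which makes it easy to compare plans side by side. A minimal sketch using the query and values from the message above:

-- Try a candidate setting for this session only; the server default
-- and other sessions are unaffected.
SET cpu_tuple_cost = 0.03;
EXPLAIN ANALYZE
SELECT s.qty FROM a, s WHERE a.id = s.a_id AND a.b_id = 1234;

-- Compare against the default before deciding to edit postgresql.conf.
SET cpu_tuple_cost = 0.01;   -- the 7.3 default
EXPLAIN ANALYZE
SELECT s.qty FROM a, s WHERE a.id = s.a_id AND a.b_id = 1234;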
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
Hi, Steve,

Here are the results of some benchmarks we did inserting 30k rows into a table, using "time psql -f blubb.sql -q dbname":

File              kingfisher             skate
30kinsert.sql     39.359s    762 r/s     335.024s    90 r/s
30kcommonce.sql   11.402s   2631 r/s       7.086s  4233 r/s
30kwithids.sql    10.138s   2959 r/s       6.936s  4325 r/s
30kprepare.sql     8.173s   3670 r/s       5.189s  5781 r/s
30kdump.sql        1.286s  23328 r/s       0.785s 38216 r/s
30kdumpseq.sql     1.498s  20026 r/s       0.927s 32362 r/s

Kingfisher is the single-processor machine I mentioned yesterday night, skate the SMP machine.

The table has five columns (bigint, bigint, double, double, timestamp without time zone). The first of them has a "default nextval('sequence'::text)" definition, and there are no further constraints or indices.

30kinsert.sql uses simple insert statements with autocommit on, and we insert all but the first column, which is filled by the default sequence. With this test, kingfisher seems to have an unrealistically high number of commits (syncs) per second (see what I wrote yesterday) [1]; skate has a more realistic value.

30kcommonce.sql, as suspected, gives a rather high boost by encapsulating everything into a single commit.

30kwithids gives a small boost by inserting pre-calculated sequence numbers, so it seems not worth the effort to move this logic into the application.

30kprepare prepares the insert statement and then issues 30k EXECUTE statements within one transaction; the speedup is noticeable.

30kdump simply inserts the 30k rows as a dump via COPY FROM STDIN. (As with 30kwithids, the first column is contained in the insert data, so the default value sequence is not used.) Naturally, this is by far the fastest method.

30kdumpseq.sql uses COPY, too, but omits the first column and thus utilizes the sequence generation again. This gives a noticeable 15% slowdown, but seems to be still fast enough for our purposes. Sadly, it is not available within JDBC.

Thanks for your patience.

Footnotes:
[1] We suspect this to be some strange interaction between ide, cryptoloop and ext3fs, so that the sync() call somehow does not really wait for the data to be physically written to the disk. (I really can't imagine a crypto-looped notebook harddisk doing more syncs/second than a SCSI-based RAID in a server machine. We did some small benchmarks on the sync() / fsync() calls that seem to support this conclusion.)

--
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
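The benchmark scripts themselves were not posted, but the 30kprepare.sql and 30kdump.sql variants presumably boil down to something like the following sketch (the table and column names here are invented for illustration):

-- 30kprepare.sql style: one transaction, one prepared statement, many EXECUTEs
BEGIN;
PREPARE ins (bigint, double precision, double precision, timestamp) AS
    INSERT INTO measurements (ref_id, val_a, val_b, taken_at)
    VALUES ($1, $2, $3, $4);
EXECUTE ins (42, 1.5, 2.5, '2004-06-07 10:00:00');
-- ... roughly 30,000 EXECUTE lines ...
COMMIT;

-- 30kdump.sql style: load the same rows in one COPY (data fields are tab-separated)
COPY measurements (ref_id, val_a, val_b, taken_at) FROM STDIN;
42	1.5	2.5	2004-06-07 10:00:00
\.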
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Sat, 5 Jun 2004, Steve Wampler wrote: > > [I want to use copy from JDBC] > I made a patch to the driver to support COPY as a PG extension. The patch required properly encoded and formatted copy data available from an InputStream. Following some feedback from others I began adding the ability to handle different encodings and the ability to read and write objects without requiring any knowledge of the copy data format. I got hung up on the object read/write part because of some issues with how type conversions are done in the driver. At the moment there is a big push being made by Oliver Jowett to get true V3 protocol support implemented which is currently my highest priority. Getting copy support into the JDBC driver is something I'd like to see for 7.5, but I couldn't say if that will happen or how complete it may be. Depending on your needs perhaps the initial patch is sufficient. http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php Kris Jurka ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Postgres function use makes machine crash.
Hi all, I've been dealing with a problem for the past two days where a certain sql statement works 2 out of 5 times, and the other 3 times, it causes the machine (quad Xeon 2.8GHz + 792543232 bytes mem, linux kernel 2.4.26-custom, pg ver 7.3.4) to slow down, and finally grind to a halt. It looks like postgres gets itself into an insane loop, because no matter how much shared memory I give it, it uses it all, and then the kernel starts swapping. I'm pretty sure it's not the kernel, because I've tried four different 2.4.2* stable kernels, and the same happens. I've attached the query, and the functions used inside the query, as well as the table structure and an explain. (I haven't been able to get explain analyze) It seems that when I replace the functions used in the query, with the actual values returned by them (one date in each case), the query runs in 10 seconds. I did vacuum analyze, and reindex seemed to work at one stage, but now it doesn't anymore. Is there some limitation in using functions that I do not know about, or is it a bug? (It seems to be hanging on the max_fpp() function call from inside the fpp_max_ms() function.) Please help. Kind Regards Stefan query.sql Description: Binary data =# EXPLAIN SELECTgroup_code::text AS group_code, -# sku::text AS sku, -# stktype_code::varchar(2) AS stktype_code, -# brn_code::textAS brn_code, -# SUM(overdue)::int4AS overdue, -# SUM(current)::int4AS current, -# SUM(future)::int4 AS future -# FROM ( (# SELECTgroup_code, (# sku, (# stktype_code, (# brn_code, (# CASE WHEN to_date <= max_fpp_ms() THEN (# SUM(out_qty) (# ELSE 0 (# END AS overdue, (# CASE WHEN to_date > max_fpp_ms() (# AND to_date <= max_fpp_me() THEN (# SUM(out_qty) (# ELSE 0 (# END AS current, (# CASE WHEN to_date > max_fpp_me() THEN (# SUM(out_qty) (# ELSE 0 (# END AS future (# FROM gir_outstanding (# GROUP BY group_code, (# sku, (# stktype_code, (# brn_code, (# to_date (# ) AS sub -# GROUP BY group_code, -# sku, -# stktype_code, -# brn_code -# ; QUERY PLAN - Aggregate (cost=15880.41..16055.62 rows=876 width=44) -> Group (cost=15880.41..15989.92 rows=8761 width=44) -> Sort (cost=15880.41..15902.31 rows=8761 width=44) Sort Key: group_code, sku, stktype_code, brn_code -> Subquery Scan sub (cost=13335.57..15306.72 rows=8761 width=44) -> Aggregate (cost=13335.57..15306.72 rows=8761 width=44) -> Group (cost=13335.57..14649.67 rows=87607 width=44) -> Sort (cost=13335.57..13554.58 rows=87607 width=44) Sort Key: group_code, sku, stktype_code, brn_code, to_date -> Seq Scan on gir_outstanding (cost=0.00..4687.07 rows=87607 width=44) (10 rows) functions.sql Description: Binary data =# \d gir_outstanding Table "public.gir_outstanding" Column |Type | Modifiers +-+--- supp_code | text| supp_name | text| supp_brn | text| ord_no | text| due_date | timestamp without time zone | to_date| timestamp without time zone | group_code | text| brn_code | text| desc_short | text| cluster_brn| text| country_code | text| req_doc_no | integer | ops_code | text| sku| text| std_descr | text| acde_code | text| req_qty| double precision| grv_qty| double precision| skul_qty | double precision| pref_date | timestamp without time zone | skul_grv_qty | double precision| out_qty| double precision| skul_or
Re: [PERFORM] Slow join using network address function
Eric, > Nested Loop > (cost=189.00..27359887.76 rows=607947200 width=22) >Join Filter: ("outer".address <<= "inner".address) >-> Seq Scan on clients c >(cost=0.00..2074.76 rows=102176 width=11) >-> Materialize >(cost=189.00..308.00 rows=11900 width=11) > -> Seq Scan on clients_commercial cc > (cost=0.00..189.00 rows=11900 width=11) To help you, we need EXPLAIN ANALYZE, not just EXPLAIN. Thanks! -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] Relation of cpu_*_costs?
"=?iso-8859-2?B?U1rbQ1MgR+Fib3I=?=" <[EMAIL PROTECTED]> writes: > Last week I fumbled with CPU_TUPLE_COST and revealed that 4 out of 4 tested > queries improved by 10-60% if I changed it from 0.01 (default) to 0.40 > (ugh). Setting it higher did not bring any improvement. That's pretty hard to believe; particularly on modern machines, I'd think that moving it down would make more sense than moving it up. You're essentially asserting that the CPU time to process one tuple is almost half of the time needed to bring a page in from disk. I suspect that your test cases were toy cases small enough to be fully cached and thus not incur any actual I/O ... > [ trying to get a nestloop indexscan plan to be generated ] I believe that the planner's cost model for nestloops with inner indexscan is wrong: it costs each inner iteration independently, when in fact there should be some savings, because at least the topmost levels of the index will soon be fully cached. However, when I tried to work out a proper model of this effect, I ended up with equations that gave higher indexscan costs than what's in there now :-(. So that didn't seem like it would make anyone happy. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Postgres function use makes machine crash.
Stef <[EMAIL PROTECTED]> writes: > I've been dealing with a problem for the past two days > where a certain sql statement works 2 out of 5 times, and > the other 3 times, it causes the machine (quad Xeon 2.8GHz > + 792543232 bytes mem, linux kernel 2.4.26-custom, pg ver 7.3.4) > to slow down, and finally grind to a halt. IIRC, PG prior to 7.4 had some problems with memory leaks in repeated execution of SQL-language functions ... and your query sure looks like it's going to be doing a lot of repeated execution of those functions. Please try it on 7.4.2 and see if you still have a problem. It seems somewhat interesting that you see the problem only sometimes and not every time, but there's not much point in investigating further if it turns out the problem is already fixed. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres function use makes machine crash.
Tom Lane mentioned:

=> Please try it on 7.4.2 and see if you still have a problem.

Will do, and I'll post the results. Thanks!

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] PREPAREing statements versus compiling PLPGSQL
Hi all, I am optimizing some code that does a lot of iterative selects and inserts within loops. Because of the exception handling limitations in postgres and with no ability to twiddle autocommit, just about every operation is standalone. over 5000 odd lines this gets very slow (5-10 minutes including processing). In seeking to speed it up I am PREPARing the most common inserts and selects. I have a few operations already inside plpgsql functions. EXECUTE means something different within a plpgsql funtion, so I am wondering if there is a way to execute a pre-prepared query inside a function. Or is this even necessary - are queries within plpgsql functions automatically prepared when the function is first compiled? On a similar note, is there any benefit in PREPAREing a select from a plpgsql function? Or does anyone have any smart ways to turn off autocommit? (I have already played with commit_delay and commit_siblings). My empirical testing has proven inconclusive (other than turning off fsync which makes a huge difference, but not possible on the live system, or using a fat copmaq raid card). Thanks for any help, Mark. -- Mark Aufflick e: [EMAIL PROTECTED] w: www.pumptheory.com (business) w: mark.aufflick.com (personal) p: +61 438 700 647 ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
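On the question of whether plpgsql statements are "automatically prepared": plpgsql caches the plan for each SQL statement the first time that statement is executed in a session and reuses it afterwards, so there is normally no need to PREPARE inside a function. A small sketch of the two forms, with hypothetical table and function names:

-- Statements inside a plpgsql function are planned implicitly on first use:
CREATE FUNCTION log_hit(p_page text) RETURNS integer AS '
BEGIN
    -- this INSERT is planned the first time log_hit() runs in a session,
    -- and the saved plan is reused on every later call
    INSERT INTO hits (page, hit_time) VALUES (p_page, now());
    RETURN 1;
END;
' LANGUAGE plpgsql;

-- Outside a function, the equivalent effect comes from an explicit PREPARE:
PREPARE log_hit_stmt (text) AS
    INSERT INTO hits (page, hit_time) VALUES ($1, now());
EXECUTE log_hit_stmt('index.html');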
[PERFORM] pl/pgsql and Transaction Isolation
Hello, I have an instance where I have a series of pl/pgsql calls, that report stat results to a common table. When other queries try to hit the stat table (with DML commands; SELECT, INSERT, UPDATE, DELETE etc.) they are forced to wait in a queue until the pl/pgsql has finished executing. will: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; before these DML queries eliminate the locking? -- marcus whitney chief architect : cold feet creative www.coldfeetcreative.com 800.595.4401 cold feet presents emma email marketing for discriminating organizations everywhere visit www.myemma.com ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
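Changing the isolation level will not by itself remove blocking: plain SELECT/INSERT/UPDATE/DELETE only queue up behind something that already holds a conflicting lock (for example an explicit LOCK TABLE inside the functions). A quick way to see what is actually blocked while the pl/pgsql calls run, sketched against the pg_locks view (the stats table name here is hypothetical):

-- Look for ungranted lock requests on the stats table and which
-- backends hold or want locks on it.
SELECT l.pid, l.mode, l.granted
FROM pg_locks l
WHERE l.relation = (SELECT oid FROM pg_class WHERE relname = 'stat_results')
ORDER BY l.granted, l.pid;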
Re: [PERFORM] Pl/Pgsql Functions running simultaneously
Thanks for your reply. My comments are below. On Friday 04 June 2004 16:39, you wrote: > Uh... I don't think this is necessarily the wrong list, sometimes people > don't have much to chime in. You could try reposting to -sql or -general > I suppose. I'll try one of those. > > As for my take on your questions, I wasn't exactly clear on what the > problem is. If its just that things seem slow, make sure you have done > the appropriate vacuum/analyze/reindex tech and then try adding some > debug info to the function to determine where in the function it is > slowing down. Yeah, I do a fair amount of vacuum/analyze , but I am unclear as to when I should run REINDEX. Is their a way to tell that indexes have become corrupt, or need to be reindexed? > > queries inside plpgsql functions will take locks as needed, but they are > no different than regular statements, just keep in mind that the queries > inside the function will work like an implicit transaction. I've noticed. Thanks for the info. > > Robert Treat > > On Thu, 2004-06-03 at 17:38, Marcus Whitney wrote: > > Am I on the wrong list to ask this question, or does this list usually > > have low activity? Just asking because I am new and I need to know where > > to ask this question. Thanks. > > > > On Wednesday 02 June 2004 16:08, Marcus Whitney wrote: > > > Hello all, > > > > > > I have an import function that I have been working on for some time > > > now, and it performed well up until recently. It is doing a lot, and > > > because the queries are not cached, I am not sure if that is what the > > > problem is. If a function takes a while, does it lock any of the tables > > > it is accessing, even for SELECT? > > > > > > Below is the bulk of the function: > > > > > > -- set sql statement variables > > > create_import_file_sql := ''COPY '' || container_table || '' > > > ('' || filtered_container_columns || '') TO '' || > > > quote_literal(formatted_import_file) || '' WITH NULL AS '' || > > > null_single_quotes; > > > upload_to_import_table_sql := ''COPY '' || import_table || '' > > > ('' > > > > > > || field_names || '') FROM '' || quote_literal(formatted_import_file) > > > || || '' > > > > > > WITH NULL AS '' || null_single_quotes; > > > clean_personalization_fields_sql := ''UPDATE '' || import_table > > > || '' SET emma_member_email = btrim(emma_member_email, '' || > > > quote_literal(quoted_single_quote) || '') , emma_member_name_first = > > > btrim(emma_member_name_first, '' || quote_literal(quoted_single_quote) > > > || '') , emma_member_name_last = btrim(emma_member_name_last, '' || > > > quote_literal(quoted_single_quote) || '') ;''; > > > clean_personalization_fields_sql2 := ''UPDATE '' || > > > import_table || '' SET emma_member_email = btrim(emma_member_email) , > > > emma_member_name_first = btrim(emma_member_name_first) , > > > emma_member_name_last = > > > btrim(emma_member_name_last) ;''; > > > set_account_id_sql := ''UPDATE '' || import_table || '' SET > > > emma_account_id = '' || account_id; > > > set_default_active_status_sql := ''UPDATE '' || import_table || > > > '' SET emma_member_status_id = 1''; > > > set_errors_for_null_email_sql := ''UPDATE '' || import_table || > > > '' SET emma_member_status_id = 2 WHERE emma_member_email IS NULL''; > > > record_null_email_count_sql := ''UPDATE '' || import_history_table > > > > > > || '' SET emma_import_null_email_count = (SELECT COUNT(*) FROM '' || > > > > > > import_table || '' WHERE emma_member_email IS NULL) WHERE > > > emma_import_history_id ='' || import_history_id; > > > 
set_errors_for_invalid_email_sql := ''UPDATE '' || import_table > > > || '' SET emma_member_status_id = 2 WHERE emma_member_email !~* '' || > > > email_regex; record_invalid_email_count_sql := ''UPDATE '' || > > > import_history_table > > > > > > || '' SET emma_import_invalid_email_count = ( SELECT COUNT(*) FROM '' > > > || || > > > > > > import_table || '' WHERE emma_member_email !~* '' || email_regex || '' > > > ) WHERE emma_import_history_id ='' || import_history_id; > > > get_dupes_in_import_sql := ''SELECT emma_member_email, > > > emma_member_status_id FROM '' || import_table || '' GROUP BY > > > emma_member_email, emma_member_status_id having count(*) > 1''; > > > insert_dupes_sql := ''INSERT INTO '' || dupe_table || '' > > > SELECT * FROM '' || import_table || '' WHERE LOWER(emma_member_email) = > > > LOWER('' || member_table || ''.emma_member_email)''; > > > record_table_dupe_count_sql := ''UPDATE '' || > > > import_history_table > > > > > > || '' SET emma_import_table_dupe_email_count = (SELECT COUNT(*) FROM '' > > > || || > > > > > > import_table || '' WHERE emma_member_email = LOWER('' || member_table > > > || ''.emma_member_email)) WHERE emma_import_history_id ='' || > > > import_history_id; remove_dupes_from_import_table_sql := ''DELETE FROM > > > '' > > > > > > || import_table > > > || > > > || ''
Re: [PERFORM] Relation of cpu_*_costs?
Dear Tom,

Thanks for your response.

- Original Message -
From: "Tom Lane" <[EMAIL PROTECTED]>
Sent: Monday, June 07, 2004 3:51 PM

> That's pretty hard to believe; particularly on modern machines, I'd
> think that moving it down would make more sense than moving it up.
> You're essentially asserting that the CPU time to process one tuple
> is almost half of the time needed to bring a page in from disk.

That is exactly what I had in mind. We found the 5x10krpm HW RAID-5 array blazing fast, while we were really disappointed by the CPU. E.g.

* tar'ing 600MB took seconds; gzip'ing it took minutes.
* initdb ran so fast that I didn't have time to hit Ctrl+C because I forgot a switch ;)
* dumping the DB in or out was far faster than adddepend between 7.2 and 7.3
* iirc index scans returning ~26k rows of ~64k were faster than a seq scan (most suspicious case of disk cache).

But whatever the case with my hardware -- could you tell me something (even a search keyword ;) ) about my theoretical question, i.e. the relation between the cpu_*_costs?

> I suspect that your test cases were toy cases small enough to be
> fully cached and thus not incur any actual I/O ...

Dunno. The server has 1GB RAM; the full DB is ~100MB; the largest query was ~7k which moved at least 2 tables of >200k rows and several smaller ones. If that is a "toy case" for such hardware, I humbly accept your opinion. BTW its runtime improved from 53 to 48 sec -- all due to changing cpu_tuple_cost. I ran the query at different costs, in fast succession:

run  cost  sec
#1   0.01  53
#2   0.4   50
#3   1.0   48
#4   1.0   48
#5   0.4   48
#6   0.01  53

For the second result, I'd say disk cache, yes -- but what about the last result? It is the same as the first one, so there must have been some plan change (I can send the explain analyze results if you wish).

G.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
[PERFORM] is it possible to for the planner to optimize this form?
Right now, I am having trouble getting the planner to optimize queries in the form of select t.key, t.field from t a where ( select count(*) from t b where b.field > a.field ) = k The subplan (either index or seq. scan) executes once for each row in t, which of course takes forever. This query is a way of achieving LIMIT type results (substitute n-1 desired rows for k) using standard SQL, which is desirable in some circumstances. Is it theoretically possible for this to be optimized? Merlin ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
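For comparison, here is the correlated-count form next to the form PostgreSQL can optimize directly. This is only a sketch: t, key and field are the names from the query above, the constants are placeholders, and the two queries are not strictly identical when there are ties in t.field.

-- Standard-SQL top-n: keep rows with fewer than n rows above them;
-- the subselect runs once per row of t
SELECT a.key, a.field
FROM t a
WHERE (SELECT count(*) FROM t b WHERE b.field > a.field) < 3;

-- The form the planner handles well: an index on t.field can be
-- walked backwards instead of repeatedly counting
SELECT key, field
FROM t
ORDER BY field DESC
LIMIT 3;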
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 2004-06-07 at 02:26, Kris Jurka wrote: > On Sat, 5 Jun 2004, Steve Wampler wrote: > > > > > [I want to use copy from JDBC] > > > > I made a patch to the driver to support COPY as a PG extension. ... > http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php Thanks Kris - that patch worked beautifully and bumped the insert rate from ~1000 entries/second to ~9000 e/s in my test code. Here's hoping it makes it into 7.5. I do have a little concern about what's happening in the back end during the copy - I suspect the entire table is locked, which may impact the performance when multiple clients are saving entries into the table. Anyone know if that's how COPY works? (For that matter, would that also be true of a transaction consisting of a set of inserts?) Thanks again! -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 2004-06-07 at 10:40, Steve Wampler wrote: > Thanks Kris - that patch worked beautifully and bumped the > insert rate from ~1000 entries/second to ~9000 e/s in my > test code. As a followup - that 9000 e/s becomes ~21,000 e/s if I don't have the java code also dump the message to standard output! -- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud. ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] seq scan woes
A production system has had a query recently degrade in performance. What once took < 1s now takes over 1s. I have tracked down the problem to a working example. Compare http://rafb.net/paste/results/itZIx891.html with http://rafb.net/paste/results/fbUTNF95.html The first shows the query as is, without much change (actually, this query is nested within a larger query, but it demonstrates the problem). The query time is about 1 second. In the second URL, a "SET ENABLE_SEQSCAN TO OFF;" is done, and the time drops to 151ms, which is acceptable. What I don't understand is why the ports table is scanned in the first place. Clues please? -- Dan Langille : http://www.langille.org/ BSDCan - http://www.bsdcan.org/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [PERFORMANCE] slow small delete on large table
[EMAIL PROTECTED] ("Ed L.") wrote: > A 7.3.4 question... > > I want to "expire" some data after 90 days, but not delete too > much at once so as not to overwhelm a system with precariously > balanced disk I/O and on a table with millions of rows. If I > could say it the way I think for a simple example, it'd be > like this: > > delete from mytable > where posteddatetime < now() - '90 days' > limit 100; > > Of course, that's not legal 7.3.4 syntax. These are both too > slow due to sequential scan of table: > > delete from mytable where key in ( > select key > from mytable > where posteddatetime < now() - '90 days' > limit 100); > or > delete from mytable where exists ( > select m.key > from mytable m > where m.key = mytable.key > and m.posteddatetime < now() - '90 days' > limit 100); > > Tried to use a cursor, but couldn't figure out the syntax > for select-for-delete yet, or find appropriate example on > google. Any clues? I'm hoping that there's an index on posteddatetime, right? There are several approaches that would be quite sensible to consider... 1. Delete records as often as possible, so that the number deleted at any given time stays small. 2. Or find an hour at which the system isn't busy, and blow through a lot of them then. 3. Open a cursor querying records in your acceptable range, e.g. declare nukem cursor for select key from mytable where posteddate < now() - '90 days'::interval; Fetch 100 entries from the cursor, and submit, across another connection, delete requests for the 100 entries, all as one transaction, which you commit. Sleep a bit, and fetch another 100. Note that the cursor will draw groups of 100 entries into memory; it's good to immediately delete them, as they'll be in buffers. Keeping the number of rows deleted small, and sleeping a bit, means you're not trashing buffers too badly. The query doesn't enforce any particular order on things; it effect chews out old entries in any order the query finds them. If you can't keep up with insertions, there could be rather old entries that would linger around... This parallels the "sleepy vacuum" that takes a similar strategy to keeping vacuums from destroying performance. 4. Rotor tables. Have "mytable" be a view on a sequence of tables. create view mytable as select * from mytable1 union all select * from mytable2 union all select * from mytable3 union all select * from mytable4 union all select * from mytable5 union all select * from mytable6 union all select * from mytable7 union all select * from mytable8 union all select * from mytable9 union all select * from mytable10 A rule can choose an appropriate table from the 9 to _actually_ insert into. Every 3 days, you truncate the eldest table and rotate on to insert into the next table. That will take mere moments, which is real helpful to save you I/O on the deletes. There is an unfortunate other problem with this; joins against mytable are pretty bad, and self-joins effectively turn into a union all across 100 joins. (Table 1 against 1-10, Table 2 against 1-10, and so forth...) For this not to suck rather incredibly requires fairly carefully structuring queries on the table. That may or may not be compatible with your needs... -- let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;; http://cbbrowne.com/info/x.html A Linux machine! because a 486 is a terrible thing to waste! -- <[EMAIL PROTECTED]> Joe Sloan ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
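Option 3 above, written out as a rough sketch: names follow the example in the message, the key values in the DELETE are placeholders for whatever the FETCH returned, and the two-connection split plus the sleep are left to the driving script.

-- Connection A: walk the expired rows with a cursor, in modest batches
BEGIN;
DECLARE nukem CURSOR FOR
    SELECT key FROM mytable
    WHERE posteddatetime < now() - '90 days'::interval;
FETCH 100 FROM nukem;           -- collect these keys in the driving script

-- Connection B: delete the batch just fetched, commit, sleep a bit
BEGIN;
DELETE FROM mytable WHERE key IN (1001, 1002, 1003);   -- keys from the FETCH
COMMIT;

-- back on connection A: repeat FETCH/DELETE until no rows come back
FETCH 100 FROM nukem;
CLOSE nukem;
COMMIT;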
Re: [PERFORM] seq scan woes
On 7 Jun 2004 at 16:00, Rod Taylor wrote: > On Mon, 2004-06-07 at 15:45, Dan Langille wrote: > > A production system has had a query recently degrade in performance. > > What once took < 1s now takes over 1s. I have tracked down the > > problem to a working example. > > What changes have you made to postgresql.conf? Nothing recently (ie. past few months). Nothing at all really. Perhaps I need to start tuning that. > Could you send explain analyse again with SEQ_SCAN enabled but with > nested loops disabled? See http://rafb.net/paste/results/zpJEvb28.html 13s > Off the cuff? I might hazard a guess that effective_cache is too low or > random_page_cost is a touch too high. Probably the former. I grep'd postgresql.conf: #effective_cache_size = 1000# typically 8KB each #random_page_cost = 4 # units are one sequential page fetch cost NOTE: both above are commented out. Thank you -- Dan Langille : http://www.langille.org/ BSDCan - http://www.bsdcan.org/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] is it possible to for the planner to optimize this form?
Merlin, > select t.key, t.field from t a > where > ( > select count(*) from t b > where b.field > a.field > ) = k > > The subplan (either index or seq. scan) executes once for each row in t, > which of course takes forever. > > This query is a way of achieving LIMIT type results (substitute n-1 > desired rows for k) using standard SQL, which is desirable in some > circumstances. Is it theoretically possible for this to be optimized? I don't think so, no. PostgreSQL does have some issues using indexes for count() queires which makes the situation worse. However, with the query you presented, I don't see any way around the planner executing the subquery once for every row in t. Except, of course, for some kind of scheme involving materialized views, if you don't need up-to-the minute data. In that case, you could store in a table the count(*)s of t for each threshold value of b.field. But, dynamically, that would be even slower. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] seq scan woes
On 7 Jun 2004 at 16:38, Rod Taylor wrote: > On Mon, 2004-06-07 at 16:12, Dan Langille wrote: > > On 7 Jun 2004 at 16:00, Rod Taylor wrote: > > > > > On Mon, 2004-06-07 at 15:45, Dan Langille wrote: > > > > A production system has had a query recently degrade in performance. > > > > What once took < 1s now takes over 1s. I have tracked down the > > > > problem to a working example. > > > > > > What changes have you made to postgresql.conf? > > > > Nothing recently (ie. past few months). Nothing at all really. > > Perhaps I need to start tuning that. > > > > > Could you send explain analyse again with SEQ_SCAN enabled but with > > > nested loops disabled? > > > > See http://rafb.net/paste/results/zpJEvb28.html > > This doesn't appear to be the same query as we were shown earlier. My apologies. I should try to cook dinner and paste at the same time. ;) http://rafb.net/paste/results/rVr3To35.html is the right query. > > > Off the cuff? I might hazard a guess that effective_cache is too low or > > > random_page_cost is a touch too high. Probably the former. > > > > I grep'd postgresql.conf: > > > > #effective_cache_size = 1000# typically 8KB each > > #random_page_cost = 4 # units are one sequential page fetch cost > > This would be the issue. You haven't told PostgreSQL anything about your > hardware. The defaults are somewhat modest. > > http://www.postgresql.org/docs/7.4/static/runtime-config.html > > Skim through the run-time configuration parameters that can be set in > postgresql.conf. > > Pay particular attention to: > * shared_buffers (you may be best with 2000 or 4000) > * effective_cache_size (set to 50% of ram size if dedicated db > machine) > * random_page_cost (good disks will bring this down to a 2 from a > 4) I'll have a play with that and report back. Thanks. -- Dan Langille : http://www.langille.org/ BSDCan - http://www.bsdcan.org/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
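As a concrete illustration of Rod's list, the relevant postgresql.conf lines might end up looking something like this on a 512MB machine (the machine size is reported in the follow-up below; the numbers are starting points to be validated with EXPLAIN ANALYZE, not recommendations):

# postgresql.conf -- example values only
shared_buffers = 2000           # 2000 x 8KB pages, about 16MB
effective_cache_size = 32000    # 32000 x 8KB pages, about 250MB (~half of RAM)
random_page_cost = 2            # assumes reasonably fast disks

Note that shared_buffers only takes effect after a server restart, while effective_cache_size and random_page_cost can also be tried per session with SET before committing them to the config file.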
Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?
On Mon, 7 Jun 2004, Steve Wampler wrote: > I do have a little concern about what's happening in the > back end during the copy - I suspect the entire table is > locked, which may impact the performance when multiple > clients are saving entries into the table. Anyone know > if that's how COPY works? (For that matter, would that > also be true of a transaction consisting of a set of > inserts?) > The table is not locked in either the copy or the insert case. Kris Jurka ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] Join slow on "large" tables
Hello list, Server is dual Xeon 2.4, 2GBRAM, Postgresql is running on partition: /dev/sda9 29G 8.9G 20G 31% /home2 /dev/sda9 on /home2 type jfs (rw) Version() PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7) I have a view to join two tables inventory details (pkardex) and inventory documents header (pmdoc) this view usually runs pretty slow as indicated in the explain analyze, pkardex is 1943465 rows and its size aprox 659MB, pmdoc is 1183520 rows and its size is aprox 314MB. The view definition is: SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto, pdc_custid)::text || ' '::text) || (pdc_custname)::text))::character varying(50) AS kclpv, pkd_saldo AS ksaldo, pkd_es AS kes, CASE WHEN (pkd_es = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada, CASE WHEN (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS ksalida, pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk))); Shared memory is: /root: cat /proc/sys/kernel/shmmax 1073741824 and postgresql.conf have this settings: tcpip_socket = true sort_mem = 8190 # min 64, size in KB vacuum_mem = 262144 # min 1024, size in KB checkpoint_segments = 10 max_connections = 256 shared_buffers = 32000 effective_cache_size = 16 # typically 8KB each random_page_cost = 2# units are one sequ The explain analyze is: dbmund=# explain analyze select * from vkardex where kprocode='1017'; Nested Loop (cost=0.00..32155.66 rows=5831 width=114) (actual time=18.223..47983.157 rows=4553 loops=1) -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..11292.52 rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1) Index Cond: ((pkd_pcode)::text = '1017'::text) -> Index Scan using pdc_pk_idx on pmdoc (cost=0.00..3.55 rows=1 width=50) (actual time=1.659..1.661 rows=1 loops=5049) Index Cond: (pmdoc.pdc_pk = "outer".doctofk) Total runtime: 47988.067 ms (6 rows) Does anyone can help me how to properly tune postgresql to gain some speed in such queries, some people have mentioned a RAM increase is necesary, about 8GB or more to have postgresql to run smooth, any comment or suggestion. I really appreciate any help. Regards, -- Sinceramente, Josué Maldonado. "Que se me den seis líneas escritas de puño y letra del hombre más honrado del mundo, y hallaré en ellas motivos para hacerle ahorcar." --cardenal Richelieu (Cardenal y político francés. 1.585 - 1.642) ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Join slow on "large" tables
Josue' > -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..11292.52 > rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1) Looks to me like there's a problem with index bloat on pkd_pcode_idx. Try REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex. -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
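In command form, the suggestion above amounts to the following; note that VACUUM FULL takes an exclusive lock on the table, so it is best run during a quiet period:

REINDEX INDEX pkd_pcode_idx;
-- if the index scan is still slow afterwards:
VACUUM FULL ANALYZE pkardex;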
Re: [PERFORM] Join slow on "large" tables
On Mon, 2004-06-07 at 16:19, Josuà Maldonado wrote: > Hello list, > > Server is dual Xeon 2.4, 2GBRAM, Postgresql is running on partition: > /dev/sda9 29G 8.9G 20G 31% /home2 > /dev/sda9 on /home2 type jfs (rw) > > Version() > PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 > 20020903 (Red Hat Linux 8.0 3.2-7) > > I have a view to join two tables inventory details (pkardex) and > inventory documents header (pmdoc) this view usually runs pretty slow as > indicated in the explain analyze, pkardex is 1943465 rows and its size > aprox 659MB, pmdoc is 1183520 rows and its size is aprox 314MB. The view > definition is: > > SELECT pkd_pk AS kpk, (pkd_stamp)::date AS kfecha, pkd_docto AS kdocto, > pdc_custid)::text || ' '::text) || > (pdc_custname)::text))::character > varying(50) AS kclpv, pkd_saldo AS ksaldo, pkd_es AS kes, CASE WHEN > (pkd_es > = 'E'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS kentrada, > CASE WHEN > (pkd_es = 'S'::bpchar) THEN pkd_qtyinv ELSE (0)::numeric END AS > ksalida, > pkd_pcode AS kprocode, pkd_price AS kvalor, pdc_tipdoc AS ktipdoc > FROM (pkardex JOIN pmdoc ON ((pmdoc.pdc_pk = pkardex.doctofk))); > > > Shared memory is: > /root: cat /proc/sys/kernel/shmmax > 1073741824 > > and postgresql.conf have this settings: > tcpip_socket = true > sort_mem = 8190 # min 64, size in KB > vacuum_mem = 262144 # min 1024, size in KB > checkpoint_segments = 10 > max_connections = 256 > shared_buffers = 32000 > effective_cache_size = 16 # typically 8KB each > random_page_cost = 2# units are one sequ > > The explain analyze is: > dbmund=# explain analyze select * from vkardex where kprocode='1017'; > Nested Loop (cost=0.00..32155.66 rows=5831 width=114) (actual > time=18.223..47983.157 rows=4553 loops=1) > -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..11292.52 > rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1) > Index Cond: ((pkd_pcode)::text = '1017'::text) > -> Index Scan using pdc_pk_idx on pmdoc (cost=0.00..3.55 rows=1 > width=50) (actual time=1.659..1.661 rows=1 loops=5049) > Index Cond: (pmdoc.pdc_pk = "outer".doctofk) > Total runtime: 47988.067 ms > (6 rows) OK, you have to ask yourself a question here. Do I have enough memory to let both postgresql and the kernel to cache this data, or enough memory for only one. Then, you pick one and try it out. But there's some issues here. PostgreSQL's shared buffer are not, and should not generally be thought of as "cache". A cache's job it to hold the whole working set, or as much as possible, ready for access. A buffer's job is to hold all the data we're tossing around right this second. Once we're done with the data, the buffers can and do just drop whatever was in them. PostgreSQL does not have caching, in the classical sense. that may or may not change. The kernel, on the other hand, has both cache and buffer. Ever notice that a Linux top shows the cache usually being much bigger than the buffers? My 512 Meg home box right now has 252968k for cache, and 43276k for buffers. Now, you're tossing around enough data to actually maybe have a use for a huge set of buffers, but this means you'll need to starve your cache to get enough buffers. Which means that if one process does this kind of join, drops connection, and two seconds later, another process connects and does nearly the same thing, it's likely to have to read it all from the hard drives again, as it's not in the postgresql buffer, and not in the kernel cache. 
Starting a seperate connection, doing a simple select * from table1; sekect * from table 2, dropping the result set returned, and staying connected seems to be enough to get 7.4 to hold onto the data. PostgreSQL's current buffer management algo is dirt simple. The ones in the kernel's cache are quite good. So you can quickly reach a point where PostgreSQL is chasing it's tail where the kernel would have done OK. Your numbers show that you are tossing 659M and 314M against each other, but I don't know if you're harvesting the whole set at once, or just a couple row of each. Indexing help, or is this always gonna be a big seq scan of 90% of both tables? If you are getting the whole thing all the time, and want postgresql to buffer the whole thing (I recommend against it, although a very few circumstances seem to support it) you need to have 973M of buffer. That would be 124544 or we'll just call it 13. This high of a number means you will be getting more than 50% of the RAM for postgreSQL. At that point, it seems you might as well go for broke and grab most of it, ~20 or so. If you're not always mushing the two things against each other, and you've got other datasets to interact with, index it. Oh, in your reply you might to include an explain analyze of the query, and maybe an output of top while the query is runn
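The buffer arithmetic in that last paragraph is just the combined table size divided by the 8KB page size; a quick sanity check from psql:

-- 659MB + 314MB = 973MB of table data expressed in 8KB shared_buffers pages
SELECT (973 * 1024) / 8 AS pages_needed;   -- 124544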
Re: [PERFORM] seq scan woes
On 7 Jun 2004 at 16:38, Rod Taylor wrote:

> On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> > I grep'd postgresql.conf:
> >
> > #effective_cache_size = 1000    # typically 8KB each
> > #random_page_cost = 4           # units are one sequential page fetch cost
>
> This would be the issue. You haven't told PostgreSQL anything about your
> hardware. The defaults are somewhat modest.
>
> http://www.postgresql.org/docs/7.4/static/runtime-config.html
>
> Skim through the run-time configuration parameters that can be set in
> postgresql.conf.
>
> Pay particular attention to:
>     * shared_buffers (you may be best with 2000 or 4000)

I do remember increasing this in the past. It was at 1000 and is now at 2000. See http://rafb.net/paste/results/VbXQcZ87.html

>     * effective_cache_size (set to 50% of ram size if dedicated db
>       machine)

The machine has 512MB RAM. effective_cache_size was at 1000, so let's try a 256MB cache. Does that match a 32000 setting? I tried it: the query went to 1.5s. At 8000, the query was 1s. At 2000, the query was about 950ms. This machine is a webserver/database/mail server, but the FreshPorts database is by far its biggest task.

>     * random_page_cost (good disks will bring this down to a 2 from a
>       4)

I've got mine set at 4. Increasing it to 6 gave me a 1971ms query. At 3, it was 995ms. Setting it to 2 gave me a 153ms query.

How interesting.

For comparison, I reset shared_buffers and effective_cache_size back to their original values (both at 1000). This gave me a 130-140ms query.

The disk in question is:

ad0: 19623MB [39870/16/63] at ata0-master UDMA100

I guess that might be this disk: http://www.harddrives4less.com/ibmdes6020ua2.html

I invite comments upon my findings. Rod: thanks for the suggestions.

>
> --
> Rod Taylor
>
> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
> PGP Key: http://www.rbt.ca/signature.asc
>

--
Dan Langille : http://www.langille.org/
BSDCan - http://www.bsdcan.org/

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] seq scan woes
On 7 Jun 2004 at 18:49, Dan Langille wrote: > On 7 Jun 2004 at 16:38, Rod Taylor wrote: > > * random_page_cost (good disks will bring this down to a 2 from a > > 4) > > I've got mine set at 4. Increasing it to 6 gave me a 1971ms query. > At 3, it was a 995ms. Setting it to 2 gave me a 153ms query. > > How interesting. The explain analyse: http://rafb.net/paste/results/pWhHsL86.html -- Dan Langille : http://www.langille.org/ BSDCan - http://www.bsdcan.org/ ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Join slow on "large" tables
Hi Josh and thanks for your response, El 07/06/2004 4:31 PM, Josh Berkus en su mensaje escribio: Josue' -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..11292.52 rows=5831 width=72) (actual time=18.152..39520.406 rows=5049 loops=1) Looks to me like there's a problem with index bloat on pkd_pcode_idx. Try REINDEXing it, and if that doesn't help, VACUUM FULL on pkardex. Recreated the index (drop then create) and did the vacuum full pkardex and the behavior seems to be the same: dbmund=# explain analyze select * from vkardex where kprocode='1013'; Nested Loop (cost=0.00..2248.19 rows=403 width=114) (actual time=846.318..16030.633 rows=3145 loops=1) -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..806.27 rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1) Index Cond: ((pkd_pcode)::text = '1013'::text) -> Index Scan using pdc_pk_idx on pmdoc (cost=0.00..3.55 rows=1 width=50) (actual time=4.482..4.484 rows=1 loops=3544) Index Cond: (pmdoc.pdc_pk = "outer".doctofk) Total runtime: 16033.807 ms (6 rows) At the time the querie was running top returned: 5:11pm up 1:28, 3 users, load average: 0.19, 0.97, 1.41 69 processes: 66 sleeping, 1 running, 2 zombie, 0 stopped CPU0 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle CPU1 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle CPU2 states: 0.1% user, 0.4% system, 0.0% nice, 98.4% idle CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle Mem: 2069596K av, 1477784K used, 591812K free, 0K shrd,2336K buff Swap: 2096440K av,9028K used, 2087412K free 1388372K cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 1225 postgres 17 0 257M 257M 255M S 0.6 12.7 7:14 postmaster 1978 postgres 11 0 1044 1044 860 R 0.2 0.0 0:00 top 1 root 9 0 472 444 428 S 0.0 0.0 0:04 init 2 root 8 0 00 0 SW0.0 0.0 0:00 keventd and free returned: /root: free total used free sharedbuffers cached Mem: 20695961477832 591764 0 23201388372 -/+ buffers/cache: 871401982456 Swap: 2096440 90282087412 I'm not a Linux guru, it looks like a memory leak. -- Sinceramente, Josué Maldonado. "Las palabras de aliento después de la censura son como el sol tras el aguacero." ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Join slow on "large" tables
Josue' > dbmund=# explain analyze select * from vkardex where kprocode='1013'; > Nested Loop (cost=0.00..2248.19 rows=403 width=114) (actual > time=846.318..16030.633 rows=3145 loops=1) > -> Index Scan using pkd_pcode_idx on pkardex (cost=0.00..806.27 > rows=403 width=72) (actual time=0.054..87.393 rows=3544 loops=1) > Index Cond: ((pkd_pcode)::text = '1013'::text) > -> Index Scan using pdc_pk_idx on pmdoc (cost=0.00..3.55 rows=1 > width=50) (actual time=4.482..4.484 rows=1 loops=3544) > Index Cond: (pmdoc.pdc_pk = "outer".doctofk) > Total runtime: 16033.807 ms Huh? It is not at all the same. Your index scan is down to 87ms from 27,000! And the total query is down to 16seconds from 47 seconds. Don't you consider that an improvement? -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Join slow on "large" tables
Josh, El 07/06/2004 5:21 PM, Josh Berkus en su mensaje escribio: Huh? It is not at all the same. Your index scan is down to 87ms from 27,000! And the total query is down to 16seconds from 47 seconds. Don't you consider that an improvement? Yes there was an improvement with respect the previus query, but still 16 seconds is too slow for that query. And usually the query takes more than 10 seconds even with small data sets returned. Thanks, -- Sinceramente, Josué Maldonado. "La cultura es capaz de descifrar los enigmas en que nos envuelve la vida." ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] is it possible to for the planner to optimize this
Title: RE: [PERFORM] is it possible to for the planner to optimize this form? I didn't really look that closely at the problem but have you thought of trying: select t.key, t.field from t a , (select count(*) as cntb from t b where b.field > a.field) as dmytbl where cntb = k This is called an inline view or sometimes a nested table. You would be joining table t to this inline view with the join criteria being "cntb = k" where k is in t. -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED]] Sent: Monday, June 07, 2004 1:32 PM To: Merlin Moncure; [EMAIL PROTECTED] Subject: Re: [PERFORM] is it possible to for the planner to optimize this form? Merlin, > select t.key, t.field from t a > where > ( > select count(*) from t b > where b.field > a.field > ) = k > > The subplan (either index or seq. scan) executes once for each row in t, > which of course takes forever. > > This query is a way of achieving LIMIT type results (substitute n-1 > desired rows for k) using standard SQL, which is desirable in some > circumstances. Is it theoretically possible for this to be optimized? I don't think so, no. PostgreSQL does have some issues using indexes for count() queires which makes the situation worse. However, with the query you presented, I don't see any way around the planner executing the subquery once for every row in t. Except, of course, for some kind of scheme involving materialized views, if you don't need up-to-the minute data. In that case, you could store in a table the count(*)s of t for each threshold value of b.field. But, dynamically, that would be even slower. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [SQL] Materialized View Summary
On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: > > I've written a summary of my findings on implementing and using > materialized views in PostgreSQL. I've already deployed eagerly updating > materialized views on several views in a production environment for a > company called RedWeek: http://redweek.com/. As a result, some queries > that were taking longer than 30 seconds to run now run in a fraction of a > millisecond. > > You can view my summary at > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html Interesting (and well written) summary. Even if not a "built in" feature, I'm sure that plenty of people will find this useful. Make sure it gets linked to from techdocs. If you could identify candidate keys on a view, you could conceivably automate the process even more. That's got to be possible in some cases, but I'm not sure how difficult it is to do in all cases. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
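For readers who have not followed the link, the eager approach described in the summary presumably boils down to an ordinary table kept in sync by triggers on the base tables. A heavily simplified sketch, with invented table, column and function names:

-- the "materialized view": a plain table holding precomputed totals
CREATE TABLE order_totals_mv (
    customer_id integer PRIMARY KEY,
    order_count bigint NOT NULL DEFAULT 0
);

-- eager maintenance: a trigger keeps the summary current on every insert
CREATE FUNCTION order_totals_eager() RETURNS trigger AS '
BEGIN
    UPDATE order_totals_mv
       SET order_count = order_count + 1
     WHERE customer_id = NEW.customer_id;
    IF NOT FOUND THEN
        INSERT INTO order_totals_mv (customer_id, order_count)
        VALUES (NEW.customer_id, 1);
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER orders_mv_trig AFTER INSERT ON orders
    FOR EACH ROW EXECUTE PROCEDURE order_totals_eager();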
Re: [PERFORM] [PERFORMANCE] slow small delete on large table
After a long battle with technology, [EMAIL PROTECTED] ("Ed L."), an earthling, wrote: > On Monday February 23 2004 10:23, Tom Lane wrote: >> "Ed L." <[EMAIL PROTECTED]> writes: >> Depending on the size of mytable, you might need an "ANALYZE doomed" >> in there, but I'm suspecting not. A quick experiment suggests that >> you'll get a plan with an inner indexscan on mytable.key, which is >> exactly what you need. > > I didn't mention I'd written a trigger to do delete N rows on each new > insert (with a delay governor preventing deletion avalanches). The > approach looks a little heavy to be done from within a trigger with the > response time I need, but I'll try it. Cantchajust toss in that "limit N" > functionality to delete clauses? How hard could that be? ;) It's nonstandard, which will get you a certain amount of opposition "for free;" the problem with nonstandard behaviour is that sometimes the implications haven't been thought out... >> See also Chris Browne's excellent suggestions nearby, if you are willing >> to make larger readjustments in your thinking... > > I did a search for articles by Chris Browne, didn't see one that > appeared relevant. What is the thread subject to which you refer? It's in the same thread. I suggested having a daemon running a cursor (amounting to a slightly more expensive version of Tom's "doomed temp table" approach), or using "rotor" tables where you could TRUNCATE a table every few days which would be _really_ cheap... -- output = ("cbbrowne" "@" "cbbrowne.com") http://cbbrowne.com/info/emacs.html Expect the unexpected. -- The Hitchhiker's Guide to the Galaxy, page 7023 ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] [HACKERS] [SQL] Materialized View Summary
On Tue, 2004-02-24 at 12:11, Richard Huxton wrote: > On Tuesday 24 February 2004 16:11, Jonathan M. Gardner wrote: > > > > I've written a summary of my findings on implementing and using > > materialized views in PostgreSQL. I've already deployed eagerly updating > > materialized views on several views in a production environment for a > > company called RedWeek: http://redweek.com/. As a result, some queries > > that were taking longer than 30 seconds to run now run in a fraction of a > > millisecond. > > > > You can view my summary at > > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html have you done much concurrency testing on your snapshot views? I implemented a similar scheme in one of my databases but found problems when I had concurrent "refresh attempts". I ended up serializing the calls view LOCKing, which was ok for my needs, but I thought potentially problematic in other cases. > > Interesting (and well written) summary. Even if not a "built in" feature, I'm > sure that plenty of people will find this useful. Make sure it gets linked to > from techdocs. Done. :-) > > If you could identify candidate keys on a view, you could conceivably automate > the process even more. That's got to be possible in some cases, but I'm not > sure how difficult it is to do in all cases. > it seems somewhere between Joe Conways work work arrays and polymorphic functions in 7.4 this should be feasible. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
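The serialization Robert describes (LOCKing around the refresh) can live inside the refresh function itself so callers never race each other; a sketch reusing the hypothetical names from the example above:

CREATE FUNCTION refresh_order_totals() RETURNS integer AS '
BEGIN
    -- only one refresh at a time; concurrent callers queue here,
    -- while plain SELECTs on the summary table are still allowed
    LOCK TABLE order_totals_mv IN EXCLUSIVE MODE;
    DELETE FROM order_totals_mv;
    INSERT INTO order_totals_mv (customer_id, order_count)
        SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
    RETURN 1;
END;
' LANGUAGE plpgsql;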
[PERFORM] Speed up a function?
Hey All, I've implemented a couple of functions ala date_trunc (listed at the bottom). These functions are executed every 5 minutes (date_trunc_minute) and every week (date_trunc_week) across 16 different values. The problem is that they take way too long to execute (nearly 7x the 'regular' date_trunc function). What might be the best way to fix the problem? Use a different function language? Re-write the functions? Re-write the queries? The gist of what I'm doing is filling schema tables with weather summary information for the time period in question. Currently I don't have indexes on these tables. Would indexs on readings."when" and minute.barometer."time" be used with date_trunc? Functional indexes maybe? CREATE TABLE readings ( "when" TIMESTAMP DEFAULT timeofday()::timestamp NOT NULL PRIMARY KEY, "barometer" FLOAT DEFAULT NULL, ); CREATE SCHEMA minute; CREATE TABLE minute.barometer ( "time" TIMESTAMP NOT NULL, min_reading FLOAT NOT NULL, max_reading FLOAT NOT NULL, avg_reading FLOAT NOT NULL ); The "hour" schema is identical to the "minute" schema. weather=# EXPLAIN ANALYZE weather-# SELECT p.period, p.min, p.max, p.avg weather-# FROM (SELECT date_trunc_minute( 'minute'::text, "when" ) AS period, weather(# min( barometer ), max( barometer ), avg( barometer ) weather(# FROM readings weather(# WHERE barometer NOTNULL weather(# GROUP BY period) AS p weather-# WHERE p.period weather-# NOT IN (SELECT "time" FROM minute.barometer ) weather-# AND p.period != date_trunc_minute( 'minute'::text, now()::timestamp ); QUERY PLAN - --- Subquery Scan p (cost=1665.63..2282.47 rows=13708 width=32) (actual time=3318.758..3318.758 rows=0 loops=1) Filter: (NOT (hashed subplan)) -> HashAggregate (cost=1501.61..1775.76 rows=27415 width=16) (actual time=3227.409..3263.367 rows=13918 loops=1) -> Seq Scan on readings (cost=0.00..1227.46 rows=27415 width=16) (actual time=1.075..3028.673 rows =69398 loops=1) Filter: ((barometer IS NOT NULL) AND (date_trunc_minute('minute'::text, "when") <> date_trunc_ minute('minute'::text, (now())::timestamp without time zone))) SubPlan -> Seq Scan on barometer (cost=0.00..144.02 rows=8002 width=8) (actual time=0.008..15.406 rows=13918 l oops=1) Total runtime: 3320.146 ms (8 rows) weather=# EXPLAIN ANALYZE weather-# SELECT p.period, p.min, p.max, p.avg weather-# FROM (SELECT date_trunc( 'hour'::text, "when" ) AS period, weather(# min( barometer ), max( barometer ), avg( barometer ) weather(# FROM readings weather(# WHERE barometer NOTNULL weather(# GROUP BY period) AS p weather-# WHERE p.period weather-# NOT IN (SELECT "time" FROM hour.barometer ) weather-# AND p.period != date_trunc( 'hour'::text, now()::timestamp ); QUERY PLAN - - Subquery Scan p (cost=1524.11..2140.95 rows=13708 width=32) (actual time=551.516..551.516 rows=0 loops=1) Filter: (NOT (hashed subplan)) -> HashAggregate (cost=1501.61..1775.76 rows=27415 width=16) (actual time=544.859..547.605 rows=1173 loo ps=1) -> Seq Scan on readings (cost=0.00..1227.46 rows=27415 width=16) (actual time=0.596..399.344 rows= 69353 loops=1) Filter: ((barometer IS NOT NULL) AND (date_trunc('hour'::text, "when") <> date_trunc('hour'::t ext, (now())::timestamp without time zone))) SubPlan -> Seq Scan on barometer (cost=0.00..20.00 rows=1000 width=8) (actual time=0.007..1.268 rows=1173 loop s=1) Total runtime: 552.137 ms CREATE FUNCTION date_trunc_week( text, timestamp ) RETURNS timestamp AS ' DECLARE reading_time ALIAS FOR $2; year timestamp; dow integer; adjust text; week text; BEGIN year := date_trunc( ''year''::text, reading_time 
); week := date_part( ''week'', reading_time ) - 1 || '' week''; dow := date_part( ''dow'', year ); -- If the dow is less than Thursday, then the start week is last year IF dow <= 4 THEN adjust := 1 - dow || '' day''; ELSE adjust := 8 - dow || '' day''; END IF; RETURN year + adjust::interval + week::interval; END; ' LANGUAGE plpgsql IMMUTABLE STRICT; CREATE OR REPLACE FUNCTION date_trunc_minute( text, timestamp ) RETURNS timestam
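On the functional-index question at the end: a plain index on "when" will not be used for a GROUP BY over date_trunc_minute(...), but PostgreSQL 7.4 and later can index the expression itself, provided the function is declared IMMUTABLE (as date_trunc_week is above). Whether this actually helps depends on the query: for a full-table aggregation a seq scan will usually still win, so the index mainly pays off once a range restriction on recent readings is added. A sketch, with the caveat that the planner only uses such an index when the query writes the identical expression:

-- expression indexes matching the grouping expressions used in the queries
CREATE INDEX readings_minute_idx
    ON readings (date_trunc_minute('minute'::text, "when"));
CREATE INDEX readings_hour_idx
    ON readings (date_trunc('hour'::text, "when"));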
Re: [PERFORM] pl/pgsql and Transaction Isolation
Marcus Whitney <[EMAIL PROTECTED]> writes: > I have an instance where I have a series of pl/pgsql calls, that report stat > results to a common table. When other queries try to hit the stat table > (with DML commands; SELECT, INSERT, UPDATE, DELETE etc.) they are forced to > wait in a queue until the pl/pgsql has finished executing. This is quite hard to believe, unless your pl/pgsql is doing something as unfriendly as LOCKing the table. Do you want to post a more complete description of your problem? regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Join slow on "large" tables
=?ISO-8859-1?Q?Josu=E9_Maldonado?= <[EMAIL PROTECTED]> writes: > Recreated the index (drop then create) and did the vacuum full pkardex > and the behavior seems to be the same: Well, there was a pretty huge improvement in the pkardex scan time, whether you noticed it or not: 39520.406 to 87.393 msec. This definitely suggests that you've been lax about vacuuming this table. I'm wondering whether pmdoc might not be overdue for vacuuming as well. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]