Re: [PERFORM] database bloat, non removable rows, slow query etc...
Are there open transactions on the table in question? We had the same issue: a 100K-row table was so bloated that the system thought there were 1M rows. We had many IDLE transactions that we noticed in TOP, but since we could not track down which process or user was holding the table, we had to restart Pg. Once restarted we were able to do a VACUUM FULL and this took care of the issue. hth

Patrick Hatcher
Development Manager Analytics/MIO
Macys.com

From: Matteo Sgalaberni [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Date: 09/01/06 05:39 AM
Subject: [PERFORM] database bloat, non removable rows, slow query etc...

Hi, probably this is a very frequently asked question... I read the archives of this list but I didn't find a final solution for this problem. I'll explain my situation:

PSQL version 8.1.3
configuration of fsm, etc.: default
autovacuum and statistics activated
22 daemons that have a persistent connection to this database (all connections are idle, no transaction opened)

This is the vacuum output of a table that is updated frequently:

database=# VACUUM ANALYZE verbose cliente;
INFO:  vacuuming "public.cliente"
INFO:  index "cliente_pkey" now contains 29931 row versions in 88 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "cliente_login_key" now contains 29931 row versions in 165 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  "cliente": found 0 removable, 29931 nonremovable row versions in 559 pages
DETAIL:  29398 dead row versions cannot be removed yet.
There were 9 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_370357"
INFO:  index "pg_toast_370357_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_370357": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.cliente"
INFO:  "cliente": scanned 559 of 559 pages, containing 533 live rows and 29398 dead rows; 533 rows in sample, 533 estimated total rows
VACUUM

database=# SELECT * from pgstattuple('cliente');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
   4579328 |         533 |     84522 |          1.85 |            29398 |        4279592 |              93.45 |      41852 |         0.91
(1 row)

The performance of this table is degraded now, and autovacuum/VACUUM FULL don't remove these dead tuples; only a CLUSTER of the table removes them. The same problem exists on other heavily trafficked tables. I think the problems probably are:

- the values of my fsm/etc. settings in postgresql.conf need tuning, but I don't understand how to tune them correctly;
- the persistent connections to this db conflict with autovacuum, but I don't understand why — there are no transactions opened, only connections in idle state.

Tell me what you think... Regards, Matteo
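[Editor's note: not part of the original thread. On 8.1, idle-in-transaction backends can usually be traced to a client before resorting to a restart, assuming stats_command_string is enabled so current_query is populated. A minimal sketch:]

-- Show sessions sitting in an open transaction without running anything.
-- procpid can then be matched to a client via client_addr/client_port.
SELECT procpid, usename, client_addr, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction';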
Re: [PERFORM] database model tshirt sizes
We have size and color in the product table itself; it is really an attribute of the product. If you update the availability of the product often, I would split the quantity out into a separate table so that you can truncate and update as needed.

Patrick Hatcher
Development Manager Analytics/MIO
Macys.com

From: NbForYou [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Date: 03/18/06 07:03 AM
Subject: [PERFORM] database model tshirt sizes

Hello,

Does anybody know how to build a database model to include sizes for rings, t-shirts, etc.? The current database is built like:

table product:
    productid   int8 PK
    productname varchar(255)
    quantity    int4

What I want now is to handle the case where a product has multiple sizes available (not all products do). The sizes are stored in the database. I was wondering about including an extra table:

table sizes:
    productid int8 FK
    size      varchar(100)

But then I have a quantity problem, because not all size quantities can be stored in this table — quantity already exists in my product table. How do professionals do it? How do they build their model to include sizes, if any are available?
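[Editor's note: a minimal sketch of the split described above; table and column names are illustrative, not from the thread. Products without sizes simply get one variant row, so quantity lives in exactly one place:]

CREATE TABLE product (
    productid   int8 PRIMARY KEY,
    productname varchar(255)
);

CREATE TABLE product_variant (
    productid int8 REFERENCES product (productid),
    size      varchar(100) NOT NULL DEFAULT 'one-size',
    quantity  int4 NOT NULL,
    PRIMARY KEY (productid, size)
);

-- Availability updates then touch only product_variant, which can be
-- truncated and reloaded without rewriting the product table.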
Re: [PERFORM] Slow query. Any way to speed up?
Duh, sorry. We will eventually move to 8.x; it's just a matter of finding the time:

explain analyze
SELECT gmmid, gmmname, divid, divname, feddept, fedvend,
       itemnumber as mstyle, amc_week_id,
       sum(tran_itm_total) as net_dollars
FROM public.tbldetaillevel_report a2
     join cdm.cdm_ddw_tran_item a1 on a1.item_upc = a2.upc
     join public.date_dim a3 on a3.date_dim_id = a1.cal_date
where a3.date_dim_id between '2005-10-30' and '2005-12-31'
  and a1.appl_id in ('MCOM','NET')
  and a1.tran_typ_id in ('S','R')
group by 1,2,3,4,5,6,7,8
order by 1,2,3,4,5,6,7,8

 GroupAggregate  (cost=1648783.47..1650793.74 rows=73101 width=65) (actual time=744556.289..753136.278 rows=168343 loops=1)
   ->  Sort  (cost=1648783.47..1648966.22 rows=73101 width=65) (actual time=744556.236..746634.566 rows=1185096 loops=1)
         Sort Key: a2.gmmid, a2.gmmname, a2.divid, a2.divname, a2.feddept, a2.fedvend, a2.itemnumber, a3.amc_week_id
         ->  Merge Join  (cost=1598067.59..1642877.78 rows=73101 width=65) (actual time=564862.772..636550.484 rows=1185096 loops=1)
               Merge Cond: ("outer".upc = "inner".item_upc)
               ->  Index Scan using report_upc_idx on tbldetaillevel_report a2  (cost=0.00..47642.36 rows=367309 width=58) (actual time=82.512..65458.137 rows=365989 loops=1)
               ->  Sort  (cost=1598067.59..1598250.34 rows=73100 width=23) (actual time=564764.506..566529.796 rows=1248862 loops=1)
                     Sort Key: a1.item_upc
                     ->  Hash Join  (cost=94.25..1592161.99 rows=73100 width=23) (actual time=493500.913..548924.039 rows=1248851 loops=1)
                           Hash Cond: ("outer".cal_date = "inner".date_dim_id)
                           ->  Seq Scan on cdm_ddw_tran_item a1  (cost=0.00..1547562.88 rows=8754773 width=23) (actual time=14.219..535704.691 rows=10838135 loops=1)
                                 Filter: ((((appl_id)::text = 'MCOM'::text) OR ((appl_id)::text = 'NET'::text)) AND ((tran_typ_id = 'S'::bpchar) OR (tran_typ_id = 'R'::bpchar)))
                           ->  Hash  (cost=94.09..94.09 rows=64 width=8) (actual time=362.953..362.953 rows=0 loops=1)
                                 ->  Index Scan using date_date_idx on date_dim a3  (cost=0.00..94.09 rows=64 width=8) (actual time=93.710..362.802 rows=63 loops=1)
                                       Index Cond: ((date_dim_id >= '2005-10-30'::date) AND (date_dim_id <= '2005-12-31'::date))
 Total runtime: 753467.847 ms

Patrick Hatcher
Development Manager Analytics/MIO
Macys.com
415-422-1610

From: Tom Lane [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Date: 01/05/06 09:07 PM
Subject: Re: [PERFORM] Slow query. Any way to speed up?

Patrick Hatcher [EMAIL PROTECTED] writes:
> The following SQL takes 4+ mins to run. I have indexes on all join fields
> and I've tried rearranging the table orders but haven't had any luck.

Please show EXPLAIN ANALYZE output, not just EXPLAIN. It's impossible to tell whether the planner is making any wrong guesses when you can't see the actual times/rowcounts ...

(BTW, 7.4 is looking pretty long in the tooth.)

regards, tom lane
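[Editor's note: one hedged observation on the plan above, not raised in the thread. Both sorts handle roughly 1.2M rows, and on a pre-8.0 server per-sort memory is governed by sort_mem (in KB), which can be raised for a single session. 65536 (64MB) is an arbitrary illustrative value, not a measured recommendation:]

SET sort_mem = 65536;   -- this session only; may keep the two big sorts in memory
-- then re-run the EXPLAIN ANALYZE above and compare the sort times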
Re: [PERFORM] slow update
Thanks. No foreign keys, and I've been bitten by the mismatched-datatypes issue before, so I checked that before sending out the message :)

Patrick Hatcher
Development Manager Analytics/MIO
Macys.com

From: Tom Lane [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
Cc: postgres performance list pgsql-performance@postgresql.org
Date: 10/13/2005 11:34 AM
Subject: Re: [PERFORM] slow update

Patrick Hatcher [EMAIL PROTECTED] writes:
> Pg 7.4.5
> Trying to do an update of fields on a 23M-row database. Is it normal for
> this process to take 16 hrs and still be clocking?

Are there foreign keys pointing at the table being updated? If so, failure to index the referencing columns could create this sort of performance problem. Also, in 7.4 you'd better be sure the referencing columns are the same datatype as the referenced column.

regards, tom lane
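[Editor's note: foreign keys turned out not to be the issue here, but for readers hitting the situation Tom describes, the fix is a plain index on the referencing column — a sketch with hypothetical table names:]

-- Without this index, every key changed in the referenced table forces a
-- sequential scan of line_item during the FK check.
CREATE INDEX line_item_order_id_idx ON line_item (order_id);
-- On 7.4, order_id should also be the exact same type as the referenced
-- column, or the check can't use the index anyway.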
[PERFORM] Poor SQL performance
Hey there folks. I'm at a loss as to how to increase the speed of this query. It's something I need to run each day, but can't at the rate this runs. Tables are updated once a day and are vacuum analyzed after each load.

select ddw_tran_key, r.price_type_id, t.price_type_id
from cdm.cdm_ddw_tran_item_header h
JOIN cdm.cdm_ddw_tran_item t on t.appl_xref = h.appl_xref
JOIN mdc_upc u ON u.upc = t.item_upc
JOIN mdc_price_history r ON r.upc_id = u.keyp_upc
                        and date(r.site_timestamp) = h.first_order_date
where cal_date = '2005-08-31'
  and h.appl_id = 'MCOM'
  and tran_typ_id = 'S'
limit 1000

My explain is just horrendous:

 Limit  (cost=288251.71..342657.36 rows=258 width=14)
   ->  Merge Join  (cost=288251.71..342657.36 rows=258 width=14)
         Merge Cond: (("outer".appl_xref)::text = "inner"."?column6?")
         Join Filter: (date("inner".site_timestamp) = "outer".first_order_date)
         ->  Index Scan using cdm_ddw_tran_item_header_pkey on cdm_ddw_tran_item_header h  (cost=0.00..51188.91 rows=789900 width=21)
               Filter: ((appl_id)::text = 'MCOM'::text)
         ->  Sort  (cost=288251.71..288604.31 rows=141038 width=39)
               Sort Key: (t.appl_xref)::text
               ->  Hash Join  (cost=29708.54..276188.93 rows=141038 width=39)
                     Hash Cond: ("outer".upc_id = "inner".keyp_upc)
                     ->  Seq Scan on mdc_price_history r  (cost=0.00..189831.09 rows=11047709 width=16)
                     ->  Hash  (cost=29698.81..29698.81 rows=3892 width=31)
                           ->  Nested Loop  (cost=0.00..29698.81 rows=3892 width=31)
                                 ->  Index Scan using cdm_ddw_tran_item_cal_date on cdm_ddw_tran_item t  (cost=0.00..14046.49 rows=3891 width=35)
                                       Index Cond: (cal_date = '2005-08-31'::date)
                                       Filter: (tran_typ_id = 'S'::bpchar)
                                 ->  Index Scan using mdcupcidx on mdc_upc u  (cost=0.00..4.01 rows=1 width=12)
                                       Index Cond: (u.upc = "outer".item_upc)
(18 rows)

What I found is that if I change the line:

    r.upc_id = u.keyp_upc and date(r.site_timestamp) = h.first_order_date

to

    r.upc_id = u.keyp_upc

my query plan drops to:

 Limit  (cost=33327.39..37227.51 rows=1000 width=14)
   ->  Hash Join  (cost=33327.39..279027.01 rows=62998 width=14)
         Hash Cond: ("outer".upc_id = "inner".keyp_upc)
         ->  Seq Scan on mdc_price_history r  (cost=0.00..189831.09 rows=11047709 width=8)
         ->  Hash  (cost=33323.05..33323.05 rows=1738 width=14)
               ->  Nested Loop  (cost=0.00..33323.05 rows=1738 width=14)
                     ->  Nested Loop  (cost=0.00..26335.62 rows=1737 width=18)
                           ->  Index Scan using cdm_ddw_tran_item_cal_date on cdm_ddw_tran_item t  (cost=0.00..14046.49 rows=3891 width=35)
                                 Index Cond: (cal_date = '2005-08-31'::date)
                                 Filter: (tran_typ_id = 'S'::bpchar)
                           ->  Index Scan using cdm_ddw_tran_item_header_pkey on cdm_ddw_tran_item_header h  (cost=0.00..3.15 rows=1 width=17)
                                 Index Cond: (("outer".appl_xref)::text = (h.appl_xref)::text)
                                 Filter: ((appl_id)::text = 'MCOM'::text)
                     ->  Index Scan using mdcupcidx on mdc_upc u  (cost=0.00..4.01 rows=1 width=12)
                           Index Cond: (u.upc = "outer".item_upc)
(15 rows)

Unfortunately, I need this criterion since it contains the first date of the order and is used to pull the correct price. Any suggestions? TIA

Patrick
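[Editor's note: one hedged idea, untested against this schema. 7.4 supports indexes on expressions, so an index matching the date() call in the join condition might spare the planner from processing all 11M rows of mdc_price_history. This assumes site_timestamp is timestamp without time zone — date() on timestamptz is not immutable and cannot be indexed:]

CREATE INDEX mdc_price_history_date_idx
    ON mdc_price_history (date(site_timestamp));
ANALYZE mdc_price_history;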
[PERFORM] Sluggish server performance
Pg: 7.4.5
RH 7.3
RAID 0+1 (200G, 15k RPM)
Quad Xeon
8G RAM
95% read-only, 5% read-write

I'm experiencing extreme load issues on my machine any time I have more than 40 users connected to the database. The majority of the users appear to be in an idle state according to TOP, but if 3 or more queries are run the system slows to a crawl. The queries don't appear to be the root cause, because they run fine when the load drops. I'm also doing routine vacuuming of the tables. Is there some place I need to start looking for the issues bogging down the server?

Here are some of my settings; I can provide more as needed:

cat /proc/sys/kernel/shmmax
175013888

max_connections = 100

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
sort_mem = 12288                # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 300000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500         # min 100, ~50 bytes each

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 1800       # range 30-3600, in seconds

# - Planner Cost Constants -

effective_cache_size = 262144   # typically 8KB each
#effective_cache_size = 625000  # typically 8KB each
random_page_cost = 2            # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01          # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025     # (same)

Patrick Hatcher
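[Editor's note: a hypothetical starting point, not from the thread, for a 7.4 box with 8GB of RAM. These are rough rules of thumb, not measurements; note that shared_buffers at 20000 needs ~160MB of shared memory, which only just fits under the shmmax of 175013888 shown above — going higher means raising shmmax first:]

shared_buffers = 20000          # ~160MB; the current 2000 (16MB) is tiny for 8GB
sort_mem = 4096                 # per-sort, per-backend: 12288KB across dozens of
                                # concurrent sorts can exhaust RAM and cause swapping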
Re: [PERFORM] Improve BULK insertion
I do mass inserts daily into PG. I drop all the indexes except my primary key and then use the COPY FROM command. This usually takes less than 30 seconds; I spend more time waiting for the indexes to recreate.

Patrick Hatcher
Macys.Com

From: Christopher Browne [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: 2004-12-04 06:48AM
Subject: Re: [PERFORM] Improve BULK insertion

In the last exciting episode, [EMAIL PROTECTED] (Grupos) wrote:
> Hi! I need to insert 500,000 records on a table frequently. It's a bulk
> insertion from my application. I am with a very poor performance.
> PostgreSQL inserts very fast until the tuple 200,000 and after that the
> insertion starts to be really slow. I am seeing on the log and there is a
> lot of transaction logs, something like:
>
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060012"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060013"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060011"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "00060015"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "00060014"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "00060016"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "00060017"
> 2004-12-04 11:24:10 LOG: recycled transaction log file "00060018"

It is entirely normal for there to be a lot of transaction log file recycling when bulk inserts are taking place; that goes through a lot of transaction logs.

> How can I configure PostgreSQL to have a better performance on this bulk
> insertion? I already increased the memory values.

Memory is, as likely as not, NOT the issue.

Two questions:

1. How are you doing the inserts? Via INSERT statements? Or via COPY statements? What sort of transaction grouping is involved? COPY is way faster than INSERT, and grouping plenty of updates into a single transaction is generally a "win."

2. What is the schema like? Does the table have a foreign key constraint? Does it have a bunch of indices? If there should eventually be lots of indices, it tends to be faster to create the table with none/minimal indices, and add indexes afterwards, as long as your "load" process can be trusted to not break "unique" constraints... If there is some secondary table with a foreign key constraint, and _that_ table is growing, it is possible that a sequential scan is being used to search the secondary table where, if you did an ANALYZE on that table, an index scan would be preferred once it grew to larger size...

There isn't a particular reason for PostgreSQL to "hit a wall" upon seeing 200K records; I and coworkers routinely load database dumps that have millions of (sometimes pretty fat) records, and they don't "choke." That's true whether talking about loading things onto my (somewhat wimpy) desktop PC, or an SMP Xeon system with a small RAID array, or higher-end stuff involving high-end SMP and EMC disk arrays — the latter obviously being orders of magnitude faster than desktop equipment :-).

--
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207. "Employees will have conjugal visit trailers which they may use provided they call in a replacement and sign out on the timesheet. Given this, anyone caught making out in a closet while leaving their station unmonitored will be shot."
http://www.eviloverlord.com/
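[Editor's note: a minimal sketch of the drop-index/COPY cycle described above; table, index, and file names are made up. Server-side COPY FROM a file requires superuser; psql's \copy works for ordinary users:]

DROP INDEX items_lookup_idx;               -- keep the primary key in place
COPY items FROM '/path/items.dat';         -- tab-delimited by default
CREATE INDEX items_lookup_idx ON items (lookup_col);
ANALYZE items;                             -- refresh stats after the load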
Re: [PERFORM] Slow update/insert process
Thanks for the help. I found the culprit: the user had created a function call within the function (pm.pm_price_post_inc(prod.keyp_products)). Once this was fixed, the time dropped dramatically.

Patrick Hatcher
Macys.Com Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM
[PERFORM] Slow update/insert process
Pg: 7.4.5
RH 7.3
8g RAM
200g drive space, RAID0+1
Tables vacuumed on a nightly basis

The following process takes 8 hours to run on 90k records and I'm not sure where to begin looking for the bottleneck. This isn't the only update on this database that seems to take a long time to complete. Is there something I should be looking for in my conf settings?

TIA
Patrick

SQL:

--- Bring back only selected records to run through the update process.
--- Without the function the SQL takes 10 secs to return 90,000 records
SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc, 'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap ON ap.keyf_products = prod.keyp_products AND keyf_attribute = 22
WHERE upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE', 'ORGINS', 'PRESCRIPTIVES', 'LANCÔME', 'CHANEL', 'ARAMIS', 'M.A.C', 'TAG HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1;

Function:

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, varchar) RETURNS bool AS '
DECLARE
    varkeyf_upc      ALIAS FOR $1;
    varPassword      ALIAS FOR $2;
    varRealMD5       varchar;
    varDeltaMD5      varchar;
    varLastTouchDate date;
    varQuery         text;
    varQuery1        text;
    varQueryMD5      text;
    varQueryRecord   record;
    varFuncStatus    boolean := false;
BEGIN

-- Check the password
IF varPassword <> \'amazon\' THEN
    RETURN false;
END IF;

-- Get the md5 hash for this product
SELECT INTO varQueryRecord md5(upc.keyp_upc || prod.description || pm.pm_price_post_inc(prod.keyp_products)) AS md5
FROM public.mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
WHERE upc.keyp_upc = varkeyf_upc
LIMIT 1;

IF NOT FOUND THEN
    RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
ELSE
    varRealMD5 := varQueryRecord.md5;
END IF;

-- Check that the product is in the delta table and return its hash for comparison
SELECT INTO varQueryRecord md5_hash, last_touch_date
FROM pm.pm_delta_master_amazon
WHERE keyf_upc = varkeyf_upc
LIMIT 1;

IF NOT FOUND THEN
    -- ADD and exit
    INSERT INTO pm.pm_delta_master_amazon (keyf_upc, status, md5_hash, last_touch_date)
    VALUES (varkeyf_upc, \'add\', varRealMD5, CURRENT_DATE);
    varFuncStatus := true;
    RETURN varFuncStatus;
ELSE
    -- Update the record:
    -- if the hash matches then set the record to HOLD
    IF varRealMD5 = varQueryRecord.md5_hash THEN
        UPDATE pm.pm_delta_master_amazon
        SET status = \'hold\', last_touch_date = CURRENT_DATE
        WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
        varFuncStatus := true;
    ELSE
        -- ELSE mark the item as ADD
        UPDATE pm.pm_delta_master_amazon
        SET status = \'add\', last_touch_date = CURRENT_DATE
        WHERE keyf_upc = varkeyf_upc;
        varFuncStatus := true;
    END IF;
END IF;

RETURN varFuncStatus;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

TableDef:

CREATE TABLE pm.pm_delta_master_amazon (
    keyf_upc        int4,
    status          varchar(6),
    md5_hash        varchar(40),
    last_touch_date date
)
GO

CREATE INDEX status_idx ON pm.pm_delta_master_amazon (status)
GO

CONF:

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

Patrick Hatcher
Macys.Com
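[Editor's note: two hedged observations, untested against this schema. First, the function filters and updates pm.pm_delta_master_amazon by keyf_upc on every call, but the TableDef above only shows an index on status:]

CREATE INDEX pm_delta_master_amazon_upc_idx
    ON pm.pm_delta_master_amazon (keyf_upc);

[Second, the function is declared IMMUTABLE although it inserts and updates rows; VOLATILE is the accurate marking for a function with side effects.]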
Re: [PERFORM] vacuum full max_fsm_pages question
I upgraded to 7.4.3 this morning and did a vacuum full analyze on the problem table, and now the indexes show the correct number of records.

Patrick Hatcher
Macys.Com

From: Josh Berkus [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
Cc: Robert Treat [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: 09/21/04 10:49 AM
Subject: Re: [PERFORM] vacuum full max_fsm_pages question

Patrick,

> Sorry. I wrote PG 7.4.2 and then I erased it to write something else and
> then forgot to add it back.

Odd. You shouldn't be having to re-vacuum on 7.4.

> And thanks for the Page info. I was getting frustrated and looked in the
> wrong place. So it's probably best to drop and re-add the indexes then?

Well, I have to wonder if you've not run afoul of the known 7.4.2 bug regarding indexes. This system hasn't had an improper database shutdown or power-out in the last few weeks, has it?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] vacuum full max_fsm_pages question
Sorry. I wrote PG 7.4.2 and then I erased it to write something else and then forgot to add it back.

And thanks for the Page info. I was getting frustrated and looked in the wrong place. So it's probably best to drop and re-add the indexes then?

----- Original Message -----
From: Robert Treat [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, September 20, 2004 11:12 PM
Subject: Re: [PERFORM] vacuum full max_fsm_pages question

On Tuesday 21 September 2004 00:01, Patrick Hatcher wrote:
> Hello. Couple of questions:
> - Q1: Today I decided to do a vacuum full verbose analyze on a large
> table that has been giving me slow performance. And then I did it again.
> I noticed that after each run the values in my indexes and estimated row
> versions changed. What really got me wondering is the fact my indexes
> report more rows than are in the table and then the estimated rows is
> less than the actual amount. The table is a read-only table that is
> updated 1/wk. After updating it is vacuumed full. I've also tried
> reindexing but the numbers still change. Is this normal? Below is a
> partial output for 4 consecutive vacuum full analyzes. No data was added
> nor was there anyone in the table.

This looks normal to me for a pre-7.4 database; if I am right you're running on 7.2? Basically your indexes are overgrown, so each time you run vacuum you are shrinking the number of pages involved, which will change the row counts, and correspondingly change the count on the table as the sampled pages change.

> - Q2: I have about a dozen 5M-plus-row tables. I currently have my
> max_fsm_pages set to 300,000. As you can see in the vacuum full output I
> supplied, one table is already over this amount. Is there a limit on the
> size of max_fsm_pages?

The limit is based on your memory... each page = 6 bytes. But according to the output below you are not over 300,000 pages yet on that table (though you might be on some other tables.)

> CONF settings:
>
> # - Memory -
>
> shared_buffers = 2000      # min 16, at least max_connections*2, 8KB each
> sort_mem = 12288           # min 64, size in KB
> #vacuum_mem = 8192         # min 1024, size in KB
>
> # - Free Space Map -
>
> max_fsm_pages = 300000     # min max_fsm_relations*16, 6 bytes each
> max_fsm_relations = 500    # min 100, ~50 bytes each
>
> Vacuum full information
>
> # after second vacuum full
> INFO:  index "emaildat_fkey" now contains 8053743 row versions in 25764 pages
> DETAIL:  1895 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 2.38s/0.42u sec elapsed 11.11 sec.
> INFO:  analyzing "cdm.cdm_email_data"
> INFO:  "cdm_email_data": 65882 pages, 3000 rows sampled, 392410 estimated total rows
>
> # after third vacuum full
> INFO:  index "emaildat_fkey" now contains 8052738 row versions in 25769 pages
> DETAIL:  890 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 2.08s/0.32u sec elapsed 4.36 sec.
> INFO:  analyzing "cdm.cdm_email_data"
> INFO:  "cdm_email_data": 65874 pages, 3000 rows sampled, 392363 estimated total rows
>
> # after REINDEX and vacuum full
> INFO:  index "emaildat_fkey" now contains 8052369 row versions in 25771 pages
> DETAIL:  521 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 1.37s/0.35u sec elapsed 4.79 sec.
> INFO:  analyzing "cdm.cdm_email_data"
> INFO:  "cdm_email_data": 65869 pages, 3000 rows sampled, 392333 estimated total rows
>
> # After vacuum full(s)
> mdc_oz=# select count(*) from cdm.cdm_email_data;
>   count
> ---------
>  5433358
> (1 row)

I do think the count(*) seems a bit off based on the vacuum output above. I'm guessing you either have blocking transactions in the way or you're not giving us a complete copy/paste of the session involved.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
[PERFORM] vacuum full max_fsm_pages question
Hello. Couple of questions:

- Q1: Today I decided to do a vacuum full verbose analyze on a large table that has been giving me slow performance. And then I did it again. I noticed that after each run the values in my indexes and estimated row versions changed. What really got me wondering is the fact my indexes report more rows than are in the table, and then the estimated rows is less than the actual amount. The table is a read-only table that is updated 1/wk. After updating it is vacuumed full. I've also tried reindexing but the numbers still change. Is this normal? Below is a partial output for 4 consecutive vacuum full analyzes. No data was added nor was there anyone in the table.

- Q2: I have about a dozen 5M-plus-row tables. I currently have my max_fsm_pages set to 300,000. As you can see in the vacuum full output I supplied, one table is already over this amount. Is there a limit on the size of max_fsm_pages?

CONF settings:

# - Memory -

shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
sort_mem = 12288                # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 300000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500         # min 100, ~50 bytes each

Vacuum full information:

# after second vacuum full
INFO:  index "emaildat_fkey" now contains 8053743 row versions in 25764 pages
DETAIL:  1895 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 2.38s/0.42u sec elapsed 11.11 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65882 pages, 3000 rows sampled, 392410 estimated total rows

# after third vacuum full
INFO:  index "emaildat_fkey" now contains 8052738 row versions in 25769 pages
DETAIL:  890 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 2.08s/0.32u sec elapsed 4.36 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65874 pages, 3000 rows sampled, 392363 estimated total rows

# after REINDEX and vacuum full
INFO:  index "emaildat_fkey" now contains 8052369 row versions in 25771 pages
DETAIL:  521 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.37s/0.35u sec elapsed 4.79 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65869 pages, 3000 rows sampled, 392333 estimated total rows

# After vacuum full(s)
mdc_oz=# select count(*) from cdm.cdm_email_data;
  count
---------
 5433358
(1 row)

TIA
Patrick
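[Editor's note: since the thread is about sizing max_fsm_pages — on 7.4, a database-wide VACUUM VERBOSE finishes with a summary of how much free space map is actually in use, which takes the guesswork out of the setting. A sketch; the exact INFO wording varies by version:]

mdc_oz=# VACUUM VERBOSE;
...
-- look for the trailing summary lines, roughly:
-- INFO:  free space map: ... relations, ... pages stored; ... total pages needed
-- and set max_fsm_pages comfortably above the "pages needed" figure.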
[PERFORM] vacuum full 100 mins plus?
Should I be concerned that my vacuum process has taken upwards of 100+ minutes to complete? I dropped all indexes before starting and also increased vacuum_mem before starting.

Looking at the output below, it appears that a vacuum full hasn't been done on this table for quite some time. Would I be better off exporting the data, vacuuming the table, and reimporting the data? I cannot drop the table due to views attached to the table.

mdc_oz=# set vacuum_mem = 10240;
SET
mdc_oz=# vacuum full verbose cdm.cdm_ddw_Tran_item;
INFO:  vacuuming "cdm.cdm_ddw_tran_item"
INFO:  "cdm_ddw_tran_item": found 15322404 removable, 10950460 nonremovable row versions in 934724 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 233 to 308 bytes long.
There were 1081 unused item pointers.
Total free space (including removable row versions) is 4474020460 bytes.
544679 pages are or will become empty, including 0 at the end of the table.
692980 pages containing 4433398408 free bytes are potential move destinations.
CPU 29.55s/4.13u sec elapsed 107.82 sec.

TIA
Patrick Hatcher
Re: [PERFORM] vacuum full 100 mins plus?
Answered my own question. I gave up the vacuum full after 150 mins. I was able to export to a file, vacuum full the empty table, and reimport in less than 10 mins. I suspect the empty item pointers and the sheer number of removable rows were causing the issue.

Patrick Hatcher
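[Editor's note: for reference, the export/reload route taken here can be done entirely with COPY — one possible sequence, with an illustrative path. Server-side COPY needs superuser; psql's \copy works otherwise, and views on the table survive a TRUNCATE:]

COPY cdm.cdm_ddw_tran_item TO '/tmp/tran_item.dat';
TRUNCATE TABLE cdm.cdm_ddw_tran_item;          -- instantly reclaims the bloat
COPY cdm.cdm_ddw_tran_item FROM '/tmp/tran_item.dat';
VACUUM ANALYZE cdm.cdm_ddw_tran_item;          -- refresh stats on the fresh table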
[PERFORM] Slow vacuum performance
Pg: 7.4.2
RedHat 7.3
RAM: 8 gig

I have a 6-million-row table that I vacuum full analyze each night. The time seems to be stretching out further and further as I add more rows. I read the archives and Josh's annotated pg.conf guide, which suggest that setting the FSM higher might help. Currently, my memory settings are set as below. Does this seem low?

Last reading from vacuum verbose:

INFO:  analyzing "cdm.cdm_ddw_customer"
INFO:  "cdm_ddw_customer": 209106 pages, 3000 rows sampled, 6041742 estimated total rows

I think I should now set my max FSM to at least 210000, but wanted to make sure.

shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
sort_mem = 12288                # min 64, size in KB

# - Free Space Map -

max_fsm_pages = 100000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

TIA
Patrick Hatcher
Macys.Com
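[Editor's note: given the 209,106 pages reported above for this one table, the change being considered would look like the snippet below — though max_fsm_pages has to cover pages with free space across *all* tables, so with several similar tables a figure several times higher may be warranted:]

max_fsm_pages = 210000          # >= pages reported by vacuum for the busiest table
max_fsm_relations = 1000        # uncommented, so the setting actually takes effect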
Re: [PERFORM] *very* slow query to summarize data for a month ...
Here's the URL: http://techdocs.postgresql.org/techdocs/pgsqladventuresep2.php

Patrick Hatcher
Macys.Com Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM
Re: [PERFORM] *very* slow query to summarize data for a month ...
Do you have an index on ts.bytes? Josh had suggested this, and after I put it on my summed fields I saw a speed increase. I can't remember which article it was that Josh had written about index usage, but maybe he'll chime in and supply the URL for his article. hth

Patrick Hatcher

From: Marc G. Fournier [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: 11/10/2003 12:18 PM
Subject: [PERFORM] *very* slow query to summarize data for a month ...

Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id      bigint,
    port       integer,
    bytes      bigint,
    runtime    timestamp without time zone
);

runtime is 'day of month' ... I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
FROM company c, traffic_logs ts
WHERE c.company_id = ts.company_id
  AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name, ts.company_id;

and the explain looks like:

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months' worth of data, so I don't think there is much of a way of 'improving performance' on this, but figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the total runtime more than doubles:

 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual
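[Editor's note: for readers wondering about the tl_month index in the plans above — it is an expression index over a user-defined month_trunc() function. The thread doesn't show the definitions, but they were presumably along these lines; the function must be IMMUTABLE for the index to be allowed:]

CREATE FUNCTION month_trunc(timestamp without time zone)
RETURNS timestamp without time zone AS
    'SELECT date_trunc(''month'', $1)'
LANGUAGE sql IMMUTABLE;

CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));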
Re: [PERFORM] SELECT's take a long time compared to other DBMS
Relaxin,

I can't remember during this thread if you said you were using ODBC or not. If you are, then your problem is with the ODBC driver. You will need to check the Declare/Fetch box or you will definitely bring back the entire recordset. For a small recordset this is not a problem, but the larger the recordset, the slower the data is returned to the client. I played around with the cache size on the driver and found that a value between 100 and 200 provided good results.

HTH
Patrick Hatcher

From: Relaxin [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Date: 09/04/2003 07:13 PM
Subject: Re: [PERFORM] SELECT's take a long time compared to other DBMS

Thank you Christopher.

> Change fsync to true (you want your data to survive, right?) and
> increase shared buffers to something that represents ~10% of your
> system memory, in blocks of 8K.

I turned it off just in the hope that things would run faster.

> None of this is likely to substantially change the result of that one
> query, however, and it seems quite likely that it is because PostgreSQL
> is honestly returning the whole result set of ~100K rows at once,
> whereas the other DBMSes are probably using cursors to return only the
> few rows of the result that you actually looked at.

Finally, someone who will actually assume/admit that it is returning the entire result set to the client, whereas other DBMSes manage the records at the server. I hope PG can fix/enhance this issue. There are several issues stopping our company from going with PG (with paid support, if available), but this seems to be the one at the top of the list. The next one is the handling of BLOBs — PG handles them like no other system I have ever come across. After that is a native Windows port, but we would deal with cygwin (for a very little while) if these other issues were handled.

Thanks

Christopher Browne [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

A long time ago, in a galaxy far, far away, Relaxin [EMAIL PROTECTED] wrote:
>> Have you changed any of the settings yet in postgresql.conf,
>> specifically the shared_buffers setting?
> fsync = false
> tcpip_socket = true
> shared_buffers = 128

Change fsync to true (you want your data to survive, right?) and increase shared buffers to something that represents ~10% of your system memory, in blocks of 8K. So, if you have 512MB of RAM, then the total blocks is 65536, and it would likely be reasonable to increase shared_buffers to 1/10 of that, or about 6500.

What is the value of effective_cache_size? That should probably be increased a whole lot, too. If you are mainly just running the database on your system, then it would be reasonable to set it to most of memory, or (* 1/2 (/ (* 512 1024 1024) 8192)) 32768.

None of this is likely to substantially change the result of that one query, however, and it seems quite likely that it is because PostgreSQL is honestly returning the whole result set of ~100K rows at once, whereas the other DBMSes are probably using cursors to return only the few rows of the result that you actually looked at.

--
cbbrowne,@,cbbrowne.com
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
Rules of the Evil Overlord #14. "The hero is not entitled to a last kiss, a last cigarette, or any other form of last request."
http://www.eviloverlord.com/
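[Editor's note: the Declare/Fetch option discussed above makes the driver do, in effect, the following — only a block of rows crosses the wire per fetch instead of the whole result set. A plain-SQL sketch of the same pattern, with an illustrative table name:]

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM some_large_table;
FETCH 100 FROM big_cur;    -- repeat as the application scrolls through results
CLOSE big_cur;
COMMIT;                    -- the cursor only lives inside the transaction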