Re: [PERFORM] Slow update/insert process
Thanks for the help. I found the culprit. The user had created a function call within the function (pm.pm_price_post_inc(prod.keyp_products)). Once this was fixed the time dropped dramatically.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM

Patrick Hatcher [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
10/01/04 11:14 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] Slow update/insert process

Pg: 7.4.5
RH 7.3
8 GB RAM
200 GB drive space
RAID 0+1
Tables vacuumed on a nightly basis

The following process takes 8 hours to run on 90k records and I'm not sure where to begin to look for the bottleneck. This isn't the only update on this database that seems to take a long time to complete. Is there something I should be looking for in my conf settings?

TIA
Patrick

SQL:

-- Bring back only selected records to run through the update process.
-- Without the function the SQL takes 10 secs to return 90,000 records
SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc, 'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap ON ap.keyf_products = prod.keyp_products AND keyf_attribute = 22
WHERE upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE', 'ORGINS', 'PRESCRIPTIVES', 'LANCÔME', 'CHANEL', 'ARAMIS', 'M.A.C', 'TAG HEUER')
  AND keyf_producttype <> 222
  AND prod.action_publish = 1;

Function:

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, varchar)
RETURNS bool AS '
DECLARE
    varkeyf_upc       ALIAS FOR $1;
    varPassword       ALIAS FOR $2;
    varRealMD5        varchar;
    varDeltaMD5       varchar;
    varLastTouchDate  date;
    varQuery          text;
    varQuery1         text;
    varQueryMD5       text;
    varQueryRecord    record;
    varFuncStatus     boolean := false;
BEGIN
    -- Check the password
    IF varPassword <> \'amazon\' THEN
        RETURN false;
    END IF;

    -- Get the md5 hash for this product
    SELECT INTO varQueryRecord
           md5(upc.keyp_upc || prod.description ||
               pm.pm_price_post_inc(prod.keyp_products)) AS md5
    FROM public.mdc_upc upc
    JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
    JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
    WHERE upc.keyp_upc = varkeyf_upc
    LIMIT 1;

    IF NOT FOUND THEN
        RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
    ELSE
        varRealMD5 := varQueryRecord.md5;
    END IF;

    -- Check that the product is in the delta table and return its hash for comparison
    SELECT INTO varQueryRecord md5_hash, last_touch_date
    FROM pm.pm_delta_master_amazon
    WHERE keyf_upc = varkeyf_upc
    LIMIT 1;

    IF NOT FOUND THEN
        -- Add and exit
        INSERT INTO pm.pm_delta_master_amazon (keyf_upc, status, md5_hash, last_touch_date)
        VALUES (varkeyf_upc, \'add\', varRealMD5, CURRENT_DATE);
        varFuncStatus := true;
        RETURN varFuncStatus;
    ELSE
        -- Update the record.
        -- If the hash matches then set the record to HOLD
        IF varRealMD5 = varQueryRecord.md5_hash THEN
            UPDATE pm.pm_delta_master_amazon
            SET status = \'hold\', last_touch_date = CURRENT_DATE
            WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
            varFuncStatus := true;
        ELSE
            -- Else mark the item as ADD
            UPDATE pm.pm_delta_master_amazon
            SET status = \'add\', last_touch_date = CURRENT_DATE
            WHERE keyf_upc = varkeyf_upc;
            varFuncStatus := true;
        END IF;
    END IF;
    RETURN varFuncStatus;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

TableDef:

CREATE TABLE pm.pm_delta_master_amazon (
    keyf_upc        int4,
    status          varchar(6),
    md5_hash        varchar(40),
    last_touch_date date
)
GO
CREATE INDEX status_idx ON pm.pm_delta_master_amazon(status)
GO

CONF:

# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -
checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000

Patrick Hatcher
Macys.Com
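The delta-table logic in the function above — hash the current row, compare against the stored hash, and mark the record "add" or "hold" — can be sketched outside SQL. This is an illustrative Python sketch of that change-detection pattern (hypothetical names, not the poster's code; the delta table is just a dict here):

```python
import hashlib

def delta_status(record_fields, delta_table, upc_id):
    """Classify a record as 'add' or 'hold' by comparing an MD5 of its
    current fields against the hash stored in a delta table (a dict)."""
    real_md5 = hashlib.md5(
        "".join(str(f) for f in record_fields).encode()
    ).hexdigest()
    stored = delta_table.get(upc_id)
    if stored is None:
        delta_table[upc_id] = real_md5
        return "add"        # never seen before: ship it
    if stored == real_md5:
        return "hold"       # unchanged since last run: skip
    delta_table[upc_id] = real_md5
    return "add"            # fields changed: ship it again

delta = {}
print(delta_status((1001, "lipstick", 12.99), delta, 1001))  # add
print(delta_status((1001, "lipstick", 12.99), delta, 1001))  # hold
print(delta_status((1001, "lipstick", 9.99), delta, 1001))   # add
```

The point of the thread's fix still applies: however the hash is computed, calling a second expensive function per row inside the hashing function multiplies the cost by the row count.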
[PERFORM] Performance suggestions for an update-mostly database?
I'm putting together a system where the operation mix is likely to be 95% update, 5% select on primary key. I'm used to performance tuning on a select-heavy database, but this will have a very different impact on the system. Does anyone have any experience with an update-heavy system, and have any performance hints or hardware suggestions?

Cheers,
  Steve

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Performance suggestions for an update-mostly database?
Steve,

> I'm used to performance tuning on a select-heavy database, but this
> will have a very different impact on the system. Does anyone have any
> experience with an update heavy system, and have any performance hints
> or hardware suggestions?

Minimal/no indexes on the table(s). Raise checkpoint_segments and consider using commit_siblings/commit_delay if it's a multi-stream application. Figure out ways to do inserts instead of updates where possible, and COPY instead of INSERT, where possible. Put your WAL on its own disk resource.

I'm a little curious as to what kind of app would be 95% writes. A log?

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
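On the COPY-instead-of-INSERT point: COPY streams many rows in one command using a simple tab-separated text format, avoiding per-statement overhead. A minimal sketch of building that payload in Python (simplified: no escaping of tabs/newlines/backslashes; the driver call that would consume the buffer is omitted since it depends on your client library):

```python
import io

def rows_to_copy_buffer(rows):
    """Format rows as tab-separated text suitable for COPY ... FROM STDIN.
    NULL is represented as \\N, per the COPY text format."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(r"\N" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)
    return buf

buf = rows_to_copy_buffer([(1, "widget", 9.99), (2, None, 4.50)])
print(buf.read(), end="")
```

One round trip carrying thousands of rows beats thousands of single-row INSERTs for exactly the kind of write-heavy workload Steve describes.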
Re: [PERFORM] Performance suggestions for an update-mostly database?
On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Minimal/no indexes on the table(s). Raise checkpoint_segments and
> consider using commit_siblings/commit_delay if it's a multi-stream
> application. Figure out ways to do inserts instead of updates where
> possible, and COPY instead of INSERT, where possible. Put your WAL on
> its own disk resource.

Thanks.

> I'm a little curious as to what kind of app would be 95% writes. A log?

It's the backend to a web application. The application's mix of queries is pretty normal, but it uses a large, in-core, write-through cache between the business logic and the database. It has more than usual locality on queries over short time periods, so the vast majority of reads should be answered out of the cache and not touch the database.

In some ways something like Berkeley DB might be a better match to the frontend, but I'm comfortable with PostgreSQL and prefer to have the power of a SQL command line for when I need it.

Cheers,
  Steve

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
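The write-through cache Steve describes can be sketched as follows — a hypothetical in-process version where a dict stands in for the database (writes go to both cache and store; reads are served from cache when possible):

```python
class WriteThroughCache:
    """Minimal write-through cache: every write updates both the cache and
    the backing store; reads hit the cache first, falling back to the store."""
    def __init__(self, store):
        self.store = store      # stand-in for the database
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, value):
        self.cache[key] = value   # keep the cache current...
        self.store[key] = value   # ...and write through immediately

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store[key]   # cache miss: go to the "database"
        self.cache[key] = value
        return value

db = {}
cache = WriteThroughCache(db)
cache.put("user:1", "steve")
print(cache.get("user:1"), db["user:1"])  # steve steve
```

With this design the database sees nearly every write but few reads, which is how a web app ends up with a 95%-update operation mix at the database layer.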
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
> Transparent query caching is the industry standard for how these things
> are handled. However, Postgres' lack of this feature has made me
> consider other approaches, and I'm starting to wonder if the standard
> query caching -- where a materialized query result, or some reduction
> thereof, is cached in database memory -- isn't the best way to cache
> things. I'm going to abbreviate it SQC for the rest of this e-mail.

Not to quibble, but are you sure that's the standard? Oracle and DB2 don't do this, and I didn't think MSSQL did either. What they do do is cache query *plans*. This is a *huge* deal in Oracle; search http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as being cachable is an excellent idea; perfect for 'static data' scenarios. What I don't know is how much will be saved.

--
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
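For contrast with plan caching, the "SQC" idea under discussion can be reduced to a toy sketch: cache a materialized result keyed on the query text, and invalidate whenever a table the query touches is written. This is purely illustrative (hypothetical class, not how any of the named databases implement caching):

```python
class SimpleQueryCache:
    """Toy result cache: keys include a per-table write counter, so any
    write to a referenced table implicitly invalidates cached results."""
    def __init__(self):
        self.results = {}    # (query text, table versions) -> result
        self.versions = {}   # table name -> write counter

    def invalidate(self, table):
        # Called on every INSERT/UPDATE/DELETE touching `table`.
        self.versions[table] = self.versions.get(table, 0) + 1

    def get(self, query, tables, compute):
        key = (query, tuple(self.versions.get(t, 0) for t in tables))
        if key not in self.results:
            self.results[key] = compute()   # run the real query once
        return self.results[key]

calls = []
cache = SimpleQueryCache()
run = lambda: calls.append(1) or [("a", 1)]
cache.get("SELECT * FROM t", ["t"], run)
cache.get("SELECT * FROM t", ["t"], run)   # served from cache
cache.invalidate("t")                      # a write to t
cache.get("SELECT * FROM t", ["t"], run)   # recomputed
print(len(calls))  # 2
```

The hard part in a real database is exactly the invalidation step: tracking which writes affect which cached results. Plan caching sidesteps that, which is one reason Oracle and DB2 cache plans rather than results.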
Re: [PERFORM] Performance suggestions for an update-mostly database?
And obviously make sure you're vacuuming frequently.

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Minimal/no indexes on the table(s). Raise checkpoint_segments and
> consider using commit_siblings/commit_delay if it's a multi-stream
> application. Figure out ways to do inserts instead of updates where
> possible, and COPY instead of INSERT, where possible. Put your WAL on
> its own disk resource.
>
> I'm a little curious as to what kind of app would be 95% writes. A log?

--
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED]
[PERFORM] would number of fields in a table affect search-query time?
Would the number of fields in a table significantly affect the search-query time? (Meaning: fewer fields = much quicker response?)

I have this database table of items with LOTS of properties per item, that takes a LONG time to search. So as I was benchmarking it against SQLite, MySQL and some others, I exported just a few fields for testing, into all three databases.

What surprised me the most is that the subset, even in the original database, gave search results MUCH faster than the full table!

I know I'm being vague, but does anyone know if this is just common knowledge (duh! of course!) or if I should be looking at it as a problem to fix?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] would number of fields in a table affect search-query time?
Miles Keaton [EMAIL PROTECTED] writes:
> What surprised me the most is that the subset, even in the original
> database, gave search results MUCH faster than the full table!

The subset table's going to be physically much smaller, so it could just be that this reflects smaller I/O load. Hard to tell without a lot more detail about what case you were testing.

			regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] would number of fields in a table affect search-query time?
On Mon, Oct 04, 2004 at 04:27:51PM -0700, Miles Keaton wrote:
> would the number of fields in a table significantly affect the
> search-query time?

More fields = larger records = fewer records per page = if you read in everything, you'll need more I/O.

> I have this database table of items with LOTS of properties per-item,
> that takes a LONG time to search.

It's a bit hard to say anything without seeing your actual tables and queries; I'd guess you either have a lot of matches or you're doing a sequential scan. You might want to normalize your tables, but again, it's hard to say anything without seeing your actual data.

/* Steinar */
--
Homepage: http://www.sesse.net/
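Steinar's "fewer records per page" point is easy to quantify. With PostgreSQL's default 8 KB pages, a back-of-the-envelope sketch (the overhead constants below are rough assumptions for illustration, not exact catalog numbers):

```python
PAGE_SIZE = 8192        # PostgreSQL default block size, in bytes
PAGE_HEADER = 24        # assumed per-page header overhead
TUPLE_OVERHEAD = 28     # assumed per-tuple header overhead

def pages_needed(n_rows, row_width):
    """Pages a sequential scan must read for n_rows of the given width."""
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (row_width + TUPLE_OVERHEAD)
    return -(-n_rows // rows_per_page)   # ceiling division

# 90,000 rows: a narrow 40-byte row vs. a wide 400-byte row
print(pages_needed(90_000, 40))    # -> 750 pages
print(pages_needed(90_000, 400))   # -> 4737 pages
```

Same row count, roughly 6x the I/O for the wide table — which is consistent with the subset table searching much faster even in the original database.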
Re: [PERFORM] would number of fields in a table affect search-query time?
Miles,

> would the number of fields in a table significantly affect the
> search-query time?

Yes. In addition to the issues mentioned previously, there is the issue of criteria; an OR query on 8 fields is going to take longer to filter than an OR query on 2 fields.

Anyway, I think maybe you should tell us more about your database design. Often the fastest solution involves a more sophisticated approach toward querying your tables.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives? http://archives.postgresql.org