Re: [PERFORM] Slow update/insert process
Thanks for the help. I found the culprit: the user had nested a function
call inside the function (pm.pm_price_post_inc(prod.keyp_products)). Once
this was fixed, the time dropped dramatically.

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM

Patrick Hatcher <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
10/01/04 11:14 AM

To: <[EMAIL PROTECTED]>
cc:
Subject: [PERFORM] Slow update/insert process

Pg: 7.4.5
RH 7.3
8 GB RAM
200 GB drive space, RAID 0+1
Tables vacuumed nightly

The process below takes 8 hours to run on 90k records, and I'm not sure
where to begin to look for the bottleneck. This isn't the only update on
this database that seems to take a long time to complete. Is there
something I should be looking for in my conf settings?

TIA
Patrick

SQL:

-- Bring back only selected records to run through the update process.
-- Without the function, the SQL takes < 10 secs to return 90,000 records.
SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc, 'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap
     ON ap.keyf_products = prod.keyp_products AND keyf_attribute = 22
WHERE upper(trim(ap.attributevalue)) NOT IN
      ('ESTEE LAUDER', 'CLINIQUE', 'ORGINS', 'PRESCRIPTIVES', 'LANCÔME',
       'CHANEL', 'ARAMIS', 'M.A.C', 'TAG HEUER')
  AND keyf_producttype <> 222
  AND prod.action_publish = 1;

Function:

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, "varchar")
RETURNS bool AS '
DECLARE
    varkeyf_upc      ALIAS FOR $1;
    varPassword      ALIAS FOR $2;
    varRealMD5       varchar;
    varDeltaMD5      varchar;
    varLastTouchDate date;
    varQuery         text;
    varQuery1        text;
    varQueryMD5      text;
    varQueryRecord   record;
    varFuncStatus    boolean := false;
BEGIN
    -- Check the password
    IF varPassword <> \'amazon\' THEN
        RETURN false;
    END IF;

    -- Get the md5 hash for this product
    SELECT INTO varQueryRecord
           md5(upc.keyp_upc || prod.description ||
               pm.pm_price_post_inc(prod.keyp_products)) AS md5
    FROM public.mdc_upc upc
    JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
    JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
    WHERE upc.keyp_upc = varkeyf_upc
    LIMIT 1;

    IF NOT FOUND THEN
        RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
    ELSE
        varRealMD5 := varQueryRecord.md5;
    END IF;

    -- Check that the product is in the delta table and return its hash
    -- for comparison
    SELECT INTO varQueryRecord md5_hash, last_touch_date
    FROM pm.pm_delta_master_amazon
    WHERE keyf_upc = varkeyf_upc
    LIMIT 1;

    IF NOT FOUND THEN
        -- Add and exit
        INSERT INTO pm.pm_delta_master_amazon
               (keyf_upc, status, md5_hash, last_touch_date)
        VALUES (varkeyf_upc, \'add\', varRealMD5, CURRENT_DATE);
        varFuncStatus := true;
        RETURN varFuncStatus;
    ELSE
        -- Update the record: if the hash matches, set the record to HOLD
        IF varRealMD5 = varQueryRecord.md5_hash THEN
            UPDATE pm.pm_delta_master_amazon
            SET status = \'hold\', last_touch_date = CURRENT_DATE
            WHERE keyf_upc = varkeyf_upc
              AND last_touch_date <> CURRENT_DATE;
            varFuncStatus := true;
        ELSE
            -- Otherwise mark the item as ADD
            UPDATE pm.pm_delta_master_amazon
            SET status = \'add\', last_touch_date = CURRENT_DATE
            WHERE keyf_upc = varkeyf_upc;
            varFuncStatus := true;
        END IF;
    END IF;

    RETURN varFuncStatus;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

TableDef:

CREATE TABLE pm.pm_delta_master_amazon (
    keyf_upc        int4,
    status          varchar(6),
    md5_hash        varchar(40),
    last_touch_date date
)
GO

CREATE INDEX status_idx ON pm.pm_delta_master_amazon (status)
GO

CONF:

# WRITE AHEAD LOG
#---
# - Settings -
#fsync = true                 # turns forced synchronization on or off
#wal_sync_method = fsync      # the default varies across platforms:
                              # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32              # min 4, 8KB each

# - Checkpoints -
checkpoint_segments = 50      # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600      # range 30-3600, in seconds
#checkpoint_warning = 30      # 0 is off, in seconds
#commit_delay = 0             # range 0-10, in microseconds
#commit_siblings = 5          # range 1-1000

Patrick Hatcher
Macys.Com
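The fix itself isn't shown above, so a note for readers chasing the same
symptom: a function call nested inside a per-row function is re-executed
once for each of the 90k rows. One way to confirm that the nested call is
the bottleneck is to time the inner expression in isolation; this sketch
just reuses the md5 expression from the function above against the same
tables, with EXPLAIN ANALYZE as the only added machinery.

EXPLAIN ANALYZE
SELECT md5(upc.keyp_upc || prod.description ||
           pm.pm_price_post_inc(prod.keyp_products))
FROM public.mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products;
-- If this alone accounts for most of the 8-hour runtime, the per-row
-- call to pm.pm_price_post_inc() is the culprit.

Incidentally, the function is declared IMMUTABLE even though it inserts
and updates rows; 7.4 doesn't enforce volatility labels, but VOLATILE
would be the accurate marking.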
[PERFORM] Performance suggestions for an update-mostly database?
I'm putting together a system where the operation mix is likely to be
>95% update, <5% select on primary key. I'm used to performance tuning on
a select-heavy database, but this will have a very different impact on the
system. Does anyone have any experience with an update-heavy system, and
have any performance hints or hardware suggestions?

Cheers,
  Steve
Re: [PERFORM] Performance suggestions for an update-mostly database?
Steve,

> I'm used to performance tuning on a select-heavy database, but this
> will have a very different impact on the system. Does anyone have any
> experience with an update-heavy system, and have any performance hints
> or hardware suggestions?

Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
using commit_siblings/commit_delay if it's a multi-stream application.
Figure out ways to do inserts instead of updates where possible, and COPY
instead of insert, where possible. Put your WAL on its own disk resource.

I'm a little curious as to what kind of app would be 95% writes. A log?

--
Josh Berkus
Aglio Database Solutions
San Francisco
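A sketch of what those knobs look like in postgresql.conf on a 7.4-era
server; the values here are illustrative starting points, not tuned
recommendations:

# - Checkpoints and group commit, for an update-heavy workload -
checkpoint_segments = 30   # allow more WAL between checkpoints, so
                           # checkpoints run less often
commit_delay = 10000       # wait up to 10 ms to batch commits together...
commit_siblings = 5        # ...but only when >= 5 other xacts are open
# For "WAL on its own disk", put the cluster's pg_xlog directory (or a
# symlink to it) on a separate spindle from the data files.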
Re: [PERFORM] Performance suggestions for an update-mostly database?
On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
>
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have any
> > experience with an update-heavy system, and have any performance hints
> > or hardware suggestions?
>
> Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
> using commit_siblings/commit_delay if it's a multi-stream application.
> Figure out ways to do inserts instead of updates where possible, and COPY
> instead of insert, where possible. Put your WAL on its own disk resource.

Thanks.

> I'm a little curious as to what kind of app would be 95% writes. A log?

It's the backend to a web application. The application's mix of queries is
pretty normal, but it uses a large, in-core, write-through cache between
the business logic and the database. It has more than the usual locality
on queries over short time periods, so the vast majority of reads should
be answered out of the cache and not touch the database.

In some ways something like Berkeley DB might be a better match for the
frontend, but I'm comfortable with PostgreSQL and prefer to have the power
of the SQL command line for when I need it.

Cheers,
  Steve
Re: [PERFORM] Performance suggestions for an update-mostly database?
Steve,

> In some ways something like Berkeley DB might be a better match for the
> frontend, but I'm comfortable with PostgreSQL and prefer to have the
> power of the SQL command line for when I need it.

Well, if data corruption is not a concern, you can always turn off
checkpointing. This will save you a fair amount of overhead.

You could also look at TelegraphCQ. It's not production-ready yet, but
their idea of "streams" as data sources really seems to fit with what
you're talking about.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Caching of Queries
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
> Transparent "query caching" is the "industry standard" for how these
> things are handled. However, Postgres' lack of this feature has made me
> consider other approaches, and I'm starting to wonder if the "standard"
> query caching -- where a materialized query result, or some reduction
> thereof, is cached in database memory -- isn't the best way to cache
> things. I'm going to abbreviate it "SQC" for the rest of this e-mail.

Not to quibble, but are you sure that's the standard? Oracle and DB2 don't
do this, and I didn't think MSSQL did either. What they do do is cache
query *plans*. This is a *huge* deal in Oracle; search
http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as cachable
is an excellent idea; perfect for 'static data' scenarios. What I don't
know is how much will be saved.

--
Jim C. Nasby, Database Consultant
[EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
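For comparison, PostgreSQL already lets a client opt into plan caching per
statement with PREPARE/EXECUTE. A minimal sketch (the statement name is
made up, and the table is borrowed from the earlier thread purely for
illustration):

-- Parse and plan once per session...
PREPARE product_lookup (int4) AS
    SELECT * FROM public.mdc_products WHERE keyp_products = $1;

-- ...then each EXECUTE reuses the stored plan, which is exactly the
-- per-call parse/plan overhead that Oracle's "soft parse" machinery
-- tries to avoid.
EXECUTE product_lookup(1001);
EXECUTE product_lookup(1002);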
Re: [PERFORM] Performance suggestions for an update-mostly database?
And obviously make sure you're vacuuming frequently.

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
>
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have any
> > experience with an update-heavy system, and have any performance hints
> > or hardware suggestions?
>
> Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
> using commit_siblings/commit_delay if it's a multi-stream application.
> Figure out ways to do inserts instead of updates where possible, and COPY
> instead of insert, where possible. Put your WAL on its own disk resource.
>
> I'm a little curious as to what kind of app would be 95% writes. A log?

--
Jim C. Nasby, Database Consultant
[EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
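On a mostly-update table that means something like the statement below run
often, typically from cron on 7.4, since integrated autovacuum didn't
exist yet (the table name is a placeholder):

-- Reclaim the dead row versions that updates leave behind, and refresh
-- the planner's statistics in the same pass.
VACUUM ANALYZE my_update_heavy_table;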
[PERFORM] would number of fields in a table affect search-query time?
Would the number of fields in a table significantly affect the
search-query time? (Meaning: fewer fields = much quicker response?)

I have a database table of items with LOTS of properties per item that
takes a LONG time to search. So as I was benchmarking it against SQLite,
MySQL and some others, I exported just a few fields for testing into all
three databases.

What surprised me the most is that the subset, even in the original
database, gave search results MUCH faster than the full table!

I know I'm being vague, but does anyone know if this is just common
knowledge ("duh! of course!") or if I should be looking at it as a problem
to fix?
Re: [PERFORM] would number of fields in a table affect search-query time?
Miles Keaton <[EMAIL PROTECTED]> writes:
> What surprised me the most is that the subset, even in the original
> database, gave search results MUCH faster than the full table!

The subset table's going to be physically much smaller, so it could just
be that this reflects smaller I/O load. Hard to tell without a lot more
detail about what case you were testing.

    regards, tom lane
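A quick way to check whether raw size explains the difference (the table
names are placeholders; relpages counts 8 KB blocks and is refreshed by
VACUUM/ANALYZE):

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('items_full', 'items_subset');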
Re: [PERFORM] would number of fields in a table affect search-query time?
On Mon, Oct 04, 2004 at 04:27:51PM -0700, Miles Keaton wrote:
> Would the number of fields in a table significantly affect the
> search-query time?

More fields = larger records = fewer records per page = if you read in
everything, you'll need more I/O.

> I have a database table of items with LOTS of properties per item that
> takes a LONG time to search.

It's a bit hard to say anything without seeing your actual tables and
queries; I'd guess you either have a lot of matches or you're doing a
sequential scan. You might want to normalize your tables, but again, it's
hard to say anything without seeing your actual data.

/* Steinar */
--
Homepage: http://www.sesse.net/
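The sequential-scan guess is easy to test; the table and column here are
placeholders for the poster's schema:

-- EXPLAIN ANALYZE prints the chosen plan with actual timings and row
-- counts. A Seq Scan node over the big table that returns only a few
-- rows usually means a useful index is missing.
EXPLAIN ANALYZE
SELECT * FROM items WHERE some_property = 'some value';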
Re: [PERFORM] would number of fields in a table affect search-query time?
Miles,

> Would the number of fields in a table significantly affect the
> search-query time?

Yes. In addition to the issues mentioned previously, there is the issue
of criteria; an OR query on 8 fields is going to take longer to filter
than an OR query on 2 fields.

Anyway, I think maybe you should tell us more about your database design.
Often the fastest solution involves a more sophisticated approach toward
querying your tables.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Comparing user attributes with bitwise operators
Sorry I have taken this long to reply, Greg, but here are the results of
the personals site done with contrib/intarray.

The first thing I did was add a serial column to the attributes table. So
instead of having a unique constraint on (attribute_id, value_id), every
row has a unique value:

datingsite=> \d attribute_names
                Table "public.attribute_names"
     Column     |         Type          |         Modifiers
----------------+-----------------------+----------------------------
 attribute_id   | integer               | not null default nextval('public.attribute_names_attribute_id_seq'::text)
 attribute_name | character varying(50) | not null
Indexes:
    "attribute_names_pkey" PRIMARY KEY, btree (attribute_id)
    "attribute_names_attribute_id_key" UNIQUE, btree (attribute_id, attribute_name)

An example insert:

insert into attribute_names (attribute_name) values ('languages');

datingsite=> \d attribute_values
              Table "public.attribute_values"
    Column    |          Type          |         Modifiers
--------------+------------------------+----------------------------
 attribute_id | integer                | not null
 order_id     | integer                | not null default (nextval('order_id_seq'::text) - 1)
 label        | character varying(255) | not null
 value_id     | integer                | not null default nextval('public.attribute_values_value_id_seq'::text)
Indexes:
    "attribute_values_pkey" PRIMARY KEY, btree (value_id)
Foreign-key constraints:
    "attribute_values_attribute_id_fkey" FOREIGN KEY (attribute_id)
        REFERENCES attribute_names(attribute_id)

An example insert (22 is the attribute_id of "languages"):

insert into attribute_values (attribute_id, label) values (22, 'English');

The "value_id" column is what the integers inside the int[] arrays
reference. Even age (18-99) and height (48-84) have rows for every
possible choice, as well as "Ask me!" where a user could choose to leave
that blank.

Here is the int[] table:

create table people_attributes (
    person_id     int references people (person_id)
                      on delete cascade initially deferred,
    askmecount    int not null default 0,
    age           int not null references attribute_values(value_id) on delete restrict,
    gender        int not null references attribute_values(value_id) on delete restrict,
    bodytype      int not null references attribute_values(value_id) on delete restrict,
    children      int not null references attribute_values(value_id) on delete restrict,
    drinking      int not null references attribute_values(value_id) on delete restrict,
    education     int not null references attribute_values(value_id) on delete restrict,
    ethnicity     int not null references attribute_values(value_id) on delete restrict,
    eyecolor      int not null references attribute_values(value_id) on delete restrict,
    haircolor     int not null references attribute_values(value_id) on delete restrict,
    hairstyle     int not null references attribute_values(value_id) on delete restrict,
    height        int not null references attribute_values(value_id) on delete restrict,
    income        int not null references attribute_values(value_id) on delete restrict,
    languages     int[] not null,
    occupation    int not null references attribute_values(value_id) on delete restrict,
    orientation   int not null references attribute_values(value_id) on delete restrict,
    relation      int not null references attribute_values(value_id) on delete restrict,
    religion      int not null references attribute_values(value_id) on delete restrict,
    smoking       int not null references attribute_values(value_id) on delete restrict,
    want_children int not null references attribute_values(value_id) on delete restrict,
    weight        int not null references attribute_values(value_id) on delete restrict,
    seeking       int[] not null,
    primary key (person_id)
) without oids;

You'll notice that "seeking" and "languages" are both int[] types; I did
this because those are multiple choice.

The index was created like so:

create index people_attributes_search on people_attributes using gist (
    (array[
        age, gender, orientation, children, drinking, education,
        ethnicity, eyecolor, haircolor, hairstyle, height, income,
        occupation, relation, religion, smoking, want_children, weight
    ] + seeking + languages) gist__int_ops
);

seeking and languages are appended with the intarray + operator.

I'm not going to go too in depth on how this query was generated, since
that was mostly done on the PHP side of things, but from the structure it
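The post breaks off above, but for the general shape of a query such an
index can serve, here is a hedged sketch. The value_ids are invented, not
the poster's actual PHP-generated query, and the expression in the WHERE
clause must match the indexed expression exactly for the GiST index to be
used:

-- contrib/intarray's @@ operator matches an int[] against a query_int;
-- '24 & 23 & 908' requires all three value_ids to be present.
SELECT person_id
FROM people_attributes
WHERE (array[age, gender, orientation, children, drinking, education,
             ethnicity, eyecolor, haircolor, hairstyle, height, income,
             occupation, relation, religion, smoking, want_children, weight]
       + seeking + languages) @@ '24 & 23 & 908'::query_int;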