Re: [PERFORM] Slow update/insert process

2004-10-04 Thread Patrick Hatcher

Thanks for the help.
I found the culprit. The user
had nested a function call inside the function
(pm.pm_price_post_inc(prod.keyp_products)).
Once this was fixed, the time dropped dramatically.
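[For readers hitting a similar slowdown: the thread doesn't show the actual rewrite, but a hedged sketch of the general technique is to join the row the inner function would have looked up, instead of calling it once per row inside md5(). The column name price_post_inc is an assumption, not from the thread.]

```sql
-- Sketch only: hash a joined column directly rather than invoking
-- pm.pm_price_post_inc(...) for every row.
SELECT md5(upc.keyp_upc || prod.description || price.price_post_inc) AS md5
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
  JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
 WHERE upc.keyp_upc = 12345;
```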


Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM





Patrick Hatcher [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
10/01/04 11:14 AM

To: [EMAIL PROTECTED]
cc:
Subject: [PERFORM] Slow update/insert process

Pg: 7.4.5 
RH 7.3 
8g Ram 
200 g drive space 
RAID0+1 
Tables vacuumed on a nightly basis

The process below takes 8 hours to run on 90k records and I'm
not sure where to begin looking for the bottleneck. This isn't the
only update on this database that seems to take a long time to complete.
Is there something I should be looking for in my conf settings?


TIA 
Patrick 


SQL: 
---Bring back only selected records to run through the update process.

--Without the function the SQL takes < 10 secs to return 90,000 records

SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc,'amazon'))

FROM mdc_upc upc 
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products

JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product

JOIN public.mdc_attribute_product ap on ap.keyf_products = prod.keyp_products
and keyf_attribute=22 
WHERE
upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE','ORGINS','PRESCRIPTIVES','LANCÔME','CHANEL','ARAMIS','M.A.C','TAG HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1;


Function: 

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, varchar)
 RETURNS bool AS
'DECLARE
  varkeyf_upc      ALIAS FOR $1;
  varPassword      ALIAS FOR $2;
  varRealMD5       varchar;
  varDeltaMD5      varchar;
  varLastTouchDate date;
  varQuery         text;
  varQuery1        text;
  varQueryMD5      text;
  varQueryRecord   record;
  varFuncStatus    boolean := false;

BEGIN

-- Check the password
 IF varPassword <> \'amazon\' THEN
  Return false;
 END IF;


-- Get the md5 hash for this product
 SELECT into varQueryRecord md5(upc.keyp_upc || prod.description || pm.pm_price_post_inc(prod.keyp_products))
AS md5
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
  JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
  WHERE upc.keyp_upc = varkeyf_upc LIMIT 1 ;
 

 IF NOT FOUND THEN
  RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
 ELSE
  varRealMD5:=varQueryRecord.md5;
 END IF;

-- Check that the product is in the delta table and return its hash
for comparison 
 SELECT into varQueryRecord md5_hash,last_touch_date 
  FROM pm.pm_delta_master_amazon
  WHERE keyf_upc = varkeyf_upc LIMIT 1;

 IF NOT FOUND THEN
  -- ADD and exit
  INSERT INTO pm.pm_delta_master_amazon (keyf_upc,status,md5_hash,last_touch_date)
  values (varkeyf_upc,\'add\',varRealMD5,CURRENT_DATE);
  varFuncStatus:=true;
  RETURN varFuncStatus;
 ELSE
  --Update the record 
   --- If the hash matches then set the record to HOLD
  IF varRealMD5 = varQueryRecord.md5_hash THEN
UPDATE pm.pm_delta_master_amazon
 SET status= \'hold\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
varFuncStatus:=true;
  ELSE
-- ELSE mark the item as ADD
UPDATE pm.pm_delta_master_amazon
 SET status= \'add\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc;
varFuncStatus:=true;
  END IF;
 END IF;

 RETURN varFuncStatus;
END;'
 LANGUAGE 'plpgsql' IMMUTABLE;



TableDef
CREATE TABLE pm.pm_delta_master_amazon (
  keyf_upc        int4,
  status          varchar(6),
  md5_hash        varchar(40),
  last_touch_date date
);

CREATE INDEX status_idx
  ON pm.pm_delta_master_amazon(status);




CONF

# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds

#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000



Patrick Hatcher
Macys.Com


[PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Steve Atkins
I'm putting together a system where the operation mix is likely to be
95% update, 5% select on primary key.

I'm used to performance tuning on a select-heavy database, but this
will have a very different impact on the system. Does anyone have any
experience with an update heavy system, and have any performance hints
or hardware suggestions?

Cheers,
  Steve

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Josh Berkus
Steve,

 I'm used to performance tuning on a select-heavy database, but this
 will have a very different impact on the system. Does anyone have any
 experience with an update heavy system, and have any performance hints
 or hardware suggestions?

Minimal/no indexes on the table(s). Raise checkpoint_segments and consider 
using commit_siblings/commit_delay if it's a multi-stream application.   
Figure out ways to do inserts instead of updates where possible, and COPY 
instead of insert, where possible.  Put your WAL on its own disk resource.
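
[A hedged sketch of the COPY-plus-batch-update pattern Josh mentions; the staging table, file path, and column names are illustrative, not from the thread.]

```sql
-- Bulk-load a batch with COPY instead of row-by-row INSERTs...
COPY staging_updates (keyf_upc, status, md5_hash, last_touch_date)
FROM '/tmp/updates.dat' WITH DELIMITER '\t';

-- ...then fold the whole batch into the target in one statement.
UPDATE pm.pm_delta_master_amazon m
   SET status = s.status,
       md5_hash = s.md5_hash,
       last_touch_date = s.last_touch_date
  FROM staging_updates s
 WHERE m.keyf_upc = s.keyf_upc;
```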

I'm a little curious as to what kind of app would be 95% writes.  A log?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Steve Atkins
On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
 Steve,
 
  I'm used to performance tuning on a select-heavy database, but this
  will have a very different impact on the system. Does anyone have any
  experience with an update heavy system, and have any performance hints
  or hardware suggestions?
 
 Minimal/no indexes on the table(s).Raise checkpoint_segments and consider 
 using commit_siblings/commit_delay if it's a multi-stream application.   
 Figure out ways to do inserts instead of updates where possible, and COPY 
 instead of insert, where possible.  Put your WAL on its own disk resource.

Thanks.

 I'm a little curious as to what kind of app would be 95% writes.  A log?

It's the backend to a web application. The applications mix of queries
is pretty normal, but it uses a large, in-core, write-through cache
between the business logic and the database. It has more than usual
locality on queries over short time periods, so the vast majority of
reads should be answered out of the cache and not touch the database.

In some ways something like Berkeley DB might be a better match to the
frontend, but I'm comfortable with PostgreSQL and prefer to have the
power of SQL commandline for when I need it.

Cheers,
  Steve

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Caching of Queries

2004-10-04 Thread Jim C. Nasby
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
 Transparent query caching is the industry standard for how these things 
 are handled.   However, Postgres' lack of this feature has made me consider 
 other approaches, and I'm starting to wonder if the standard query caching 
 -- where a materialized query result, or some reduction thereof, is cached in 
 database memory -- isn't the best way to cache things.  I'm going to 
 abbreviate it SQC for the rest of this e-mail.
 
Not to quibble, but are you sure that's the standard? Oracle and DB2
don't do this, and I didn't think MSSQL did either. What they do do is
cache query *plans*. This is a *huge* deal in Oracle; search
http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as being
cachable is an excellent idea; perfect for 'static data' scenarios. What
I don't know is how much will be saved.
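
[To illustrate the plan-caching distinction Jim draws: PostgreSQL can reuse a query *plan* within a session via PREPARE/EXECUTE, which is the analogue of avoiding Oracle's soft parse; it does not cache the *result*. A minimal sketch:]

```sql
-- Plan the query once per session...
PREPARE get_product (int4) AS
  SELECT * FROM public.mdc_products WHERE keyp_products = $1;

-- ...then execute it repeatedly with different parameters,
-- skipping parse/plan each time.
EXECUTE get_product(42);
EXECUTE get_product(99);
```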
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Jim C. Nasby
And obviously make sure you're vacuuming frequently.

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
 Steve,
 
  I'm used to performance tuning on a select-heavy database, but this
  will have a very different impact on the system. Does anyone have any
  experience with an update heavy system, and have any performance hints
  or hardware suggestions?
 
 Minimal/no indexes on the table(s).Raise checkpoint_segments and consider 
 using commit_siblings/commit_delay if it's a multi-stream application.   
 Figure out ways to do inserts instead of updates where possible, and COPY 
 instead of insert, where possible.  Put your WAL on its own disk resource.
 
 I'm a little curious as to what kind of app would be 95% writes.  A log?
 
 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



[PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Miles Keaton
would the number of fields in a table significantly affect the
search-query time?

(meaning: fewer fields = much quicker response?)

I have this database table of items with LOTS of properties per-item,
that takes a LONG time to search.

So as I was benchmarking it against SQLite, MySQL and some others, I
exported just a few fields for testing, into all three databases.

What surprised me the most is that the subset, even in the original
database, gave search results MUCH faster than the full table!

I know I'm being vague, but does anyone know if this is just common
knowledge (duh! of course!) or if I should be looking at it as a
problem to fix?

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Tom Lane
Miles Keaton [EMAIL PROTECTED] writes:
 What surprised me the most is that the subset, even in the original
 database, gave search results MUCH faster than the full table!

The subset table's going to be physically much smaller, so it could just
be that this reflects smaller I/O load.  Hard to tell without a lot more
detail about what case you were testing.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Steinar H. Gunderson
On Mon, Oct 04, 2004 at 04:27:51PM -0700, Miles Keaton wrote:
 would the number of fields in a table significantly affect the
 search-query time?

More fields = larger records = fewer records per page = if you read in
everything, you'll need more I/O.

 I have this database table of items with LOTS of properties per-item,
 that takes a LONG time to search.

It's a bit hard to say anything without seeing your actual tables and
queries; I'd guess you either have a lot of matches or you're doing a
sequential scan.

You might want to normalize your tables, but again, it's hard to say anything
without seeing your actual data.
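
[A hedged sketch of the normalization Steinar suggests, with entirely hypothetical table and column names: keep the frequently-searched columns in a narrow table so scans touch fewer pages, and join the wide, rarely-filtered properties only for matching rows.]

```sql
-- Narrow "hot" table: small rows, more rows per page, cheaper scans.
CREATE TABLE items_core (
  item_id serial PRIMARY KEY,
  name    varchar(100),
  price   numeric(10,2)
);

-- Wide "cold" table: bulky properties, fetched only after a match.
CREATE TABLE items_detail (
  item_id    int4 REFERENCES items_core (item_id),
  long_descr text,
  misc_props text
);
```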

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Josh Berkus
Miles,

 would the number of fields in a table significantly affect the
 search-query time?

Yes.

In addition to the issues mentioned previously, there is the issue of 
criteria; an OR query on 8 fields is going to take longer to filter than an 
OR query on 2 fields.

Anyway, I think maybe you should tell us more about your database design.   
Often the fastest solution involves a more sophisticated approach toward 
querying your tables.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org