Re: [PERFORM] Slow update/insert process

2004-10-04 Thread Patrick Hatcher

Thanks for the help.
I found the culprit.  The user had nested a function call inside the
function (pm.pm_price_post_inc(prod.keyp_products)), so it executed once
for every row processed. Once this was fixed, the time dropped dramatically.
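
For anyone hitting something similar, a minimal sketch of the kind of
rewrite that removes the per-row call. This is hypothetical: it assumes the
inner function only looked up a price that the joined mdc_price_post_inc
table already provides (the "price" column name is invented):

-- Inside pm.pm_delta_function_amazon: take the price from the join
-- instead of calling pm.pm_price_post_inc() once per row.
SELECT INTO varQueryRecord
       md5(upc.keyp_upc || prod.description || price.price) AS md5
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
  JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
 WHERE upc.keyp_upc = varkeyf_upc LIMIT 1;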


Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM





Patrick Hatcher <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
10/01/04 11:14 AM
To: <[EMAIL PROTECTED]>
Subject: [PERFORM] Slow update/insert process

Pg: 7.4.5
RH 7.3
8 GB RAM
200 GB drive space
RAID 0+1
Tables vacuumed on a nightly basis

The process below takes 8 hours to run on 90k records and I'm not sure
where to begin looking for the bottleneck. This isn't the only update on
this database that seems to take a long time to complete. Is there
something I should be looking for in my conf settings?


TIA 
Patrick 


SQL:
-- Bring back only selected records to run through the update process.
-- Without the function, the SQL takes < 10 secs to return 90,000 records.

SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc, 'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap ON ap.keyf_products = prod.keyp_products
  AND keyf_attribute = 22
WHERE upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE',
  'ORGINS', 'PRESCRIPTIVES', 'LANCÔME', 'CHANEL', 'ARAMIS', 'M.A.C',
  'TAG HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1;
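
(To localize cost like this, comparing EXPLAIN ANALYZE runs with and
without the per-row function call is usually telling; a minimal sketch:)

-- Baseline: same joins and filters, but count a plain column.
EXPLAIN ANALYZE
SELECT count(upc.keyp_upc)
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
WHERE prod.action_publish = 1;

-- Then rerun with pm.pm_delta_function_amazon() in the target list;
-- the difference in total runtime is the per-row cost of the function.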


Function: 

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, "varchar")
  RETURNS bool AS
'DECLARE
   varkeyf_upc      ALIAS FOR $1;
   varPassword      ALIAS FOR $2;
   varRealMD5       varchar;
   varDeltaMD5      varchar;
   varLastTouchDate date;
   varQuery         text;
   varQuery1        text;
   varQueryMD5      text;
   varQueryRecord   record;
   varFuncStatus    boolean := false;

BEGIN

-- Check the password
 IF varPassword <> \'amazon\' THEN
   Return false;
 END IF;


--  Get the md5 hash for this product
 SELECT INTO varQueryRecord
        md5(upc.keyp_upc || prod.description || pm.pm_price_post_inc(prod.keyp_products)) AS md5
   FROM public.mdc_upc upc
   JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
   JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
   WHERE upc.keyp_upc = varkeyf_upc LIMIT 1;
 

 IF NOT FOUND THEN
   RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
 ELSE
   varRealMD5:=varQueryRecord.md5;
 END IF;

--  Check that the product is in the delta table and return its hash for comparison
 SELECT into varQueryRecord md5_hash,last_touch_date 
   FROM pm.pm_delta_master_amazon
   WHERE keyf_upc = varkeyf_upc LIMIT 1;

 IF NOT FOUND THEN
   -- ADD and exit
   INSERT INTO pm.pm_delta_master_amazon (keyf_upc,status,md5_hash,last_touch_date)
   values (varkeyf_upc,\'add\',varRealMD5,CURRENT_DATE);
   varFuncStatus:=true;
   RETURN varFuncStatus;
 ELSE
   --Update the record: if the hash matches then set the record to HOLD
   IF varRealMD5 = varQueryRecord.md5_hash THEN
       UPDATE pm.pm_delta_master_amazon
          SET status = \'hold\',
              last_touch_date = CURRENT_DATE
        WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
       varFuncStatus:=true;
   ELSE
       --  ELSE mark the item as ADD
       UPDATE pm.pm_delta_master_amazon
         SET status= \'add\',
         last_touch_date =  CURRENT_DATE
       WHERE keyf_upc =  varkeyf_upc;
       varFuncStatus:=true;
   END IF;
  END IF;

 RETURN varFuncStatus;
END;'
 LANGUAGE 'plpgsql' IMMUTABLE;



TableDef
CREATE TABLE pm.pm_delta_master_amazon (
    keyf_upc        int4,
    status          varchar(6),
    md5_hash        varchar(40),
    last_touch_date date
);

CREATE INDEX status_idx
    ON pm.pm_delta_master_amazon (status);
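
(Worth noting: the function above looks up and updates pm_delta_master_amazon
by keyf_upc, which has no index in this definition. A sketch of one that
should help those per-row lookups; the index name is invented:)

CREATE INDEX pm_delta_master_amazon_upc_idx
    ON pm.pm_delta_master_amazon (keyf_upc);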




CONF

# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000


Patrick Hatcher
Macys.Com


[PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Steve Atkins
I'm putting together a system where the operation mix is likely to be
>95% update, <5% select on primary key.

I'm used to performance tuning on a select-heavy database, but this
will have a very different impact on the system. Does anyone have any
experience with an update heavy system, and have any performance hints
or hardware suggestions?

Cheers,
  Steve



Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Josh Berkus
Steve,

> I'm used to performance tuning on a select-heavy database, but this
> will have a very different impact on the system. Does anyone have any
> experience with an update heavy system, and have any performance hints
> or hardware suggestions?

Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
using commit_siblings/commit_delay if it's a multi-stream application.
Figure out ways to do inserts instead of updates where possible, and COPY
instead of insert, where possible. Put your WAL on its own disk resource.
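
For reference, a rough sketch of those knobs in postgresql.conf terms; the
values are illustrative, not tuned recommendations:

checkpoint_segments = 32    # fewer, larger checkpoints under heavy write load
commit_delay = 10000        # microseconds; only pays off with concurrent writers
commit_siblings = 5         # min concurrent transactions before the delay applies
# WAL on its own disk usually means symlinking $PGDATA/pg_xlog to a
# directory on a separate spindle before starting the cluster.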

I'm a little curious as to what kind of app would be 95% writes.  A log?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Steve Atkins
On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
> 
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have any
> > experience with an update heavy system, and have any performance hints
> > or hardware suggestions?
> 
> Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
> using commit_siblings/commit_delay if it's a multi-stream application.
> Figure out ways to do inserts instead of updates where possible, and COPY
> instead of insert, where possible. Put your WAL on its own disk resource.

Thanks.

> I'm a little curious as to what kind of app would be 95% writes.  A log?

It's the backend to a web application. The applications mix of queries
is pretty normal, but it uses a large, in-core, write-through cache
between the business logic and the database. It has more than usual
locality on queries over short time periods, so the vast majority of
reads should be answered out of the cache and not touch the database.

In some ways something like Berkeley DB might be a better match to the
frontend, but I'm comfortable with PostgreSQL and prefer to have the
power of SQL commandline for when I need it.

Cheers,
  Steve



Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Josh Berkus
Steve,

> In some ways something like Berkeley DB might be a better match to the
> frontend, but I'm comfortable with PostgreSQL and prefer to have the
> power of SQL commandline for when I need it.

Well, if data corruption is not a concern, you can always turn off 
checkpointing.   This will save you a fair amount of overhead.
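
(In 7.4 terms that mostly means relaxing sync and checkpoint settings rather
than disabling checkpoints outright; a sketch, sane only if losing recent
transactions in a crash is acceptable:)

fsync = false               # skip forced WAL sync; risks data loss on crash
checkpoint_segments = 64    # checkpoint much less often
checkpoint_timeout = 3600   # the maximum allowed interval, in seconds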

You could also look at TelegraphCQ. It's not production yet, but their idea
of "streams" as data sources really seems to fit with what you're talking
about.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Caching of Queries

2004-10-04 Thread Jim C. Nasby
On Fri, Oct 01, 2004 at 10:10:40AM -0700, Josh Berkus wrote:
> Transparent "query caching" is the "industry standard" for how these things 
> are handled.   However, Postgres' lack of this feature has made me consider 
> other approaches, and I'm starting to wonder if the "standard" query caching 
> -- where a materialized query result, or some reduction thereof, is cached in 
> database memory -- isn't the best way to cache things.  I'm going to 
> abbreviate it "SQC" for the rest of this e-mail.
 
Not to quibble, but are you sure that's the standard? Oracle and DB2
don't do this, and I didn't think MSSQL did either. What they do do is
cache query *plans*. This is a *huge* deal in Oracle; search
http://asktom.oracle.com for 'soft parse'.

In any case, I think a means of marking some specific queries as being
cachable is an excellent idea; perfect for 'static data' scenarios. What
I don't know is how much will be saved.
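
For what it's worth, Postgres does already expose per-session plan reuse
through prepared statements; a minimal sketch (table name and ids invented):

-- Plan once, execute repeatedly with different parameters.
PREPARE get_item (int) AS
    SELECT * FROM items WHERE item_id = $1;

EXECUTE get_item (42);
EXECUTE get_item (43);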
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] Performance suggestions for an update-mostly database?

2004-10-04 Thread Jim C. Nasby
And obviously make sure you're vacuuming frequently.
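
(A minimal sketch of what "frequently" can mean for an update-heavy table;
the table name is invented:)

-- Run every few minutes from cron or a loop, not just nightly:
VACUUM ANALYZE hot_update_table;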

On Mon, Oct 04, 2004 at 10:38:14AM -0700, Josh Berkus wrote:
> Steve,
> 
> > I'm used to performance tuning on a select-heavy database, but this
> > will have a very different impact on the system. Does anyone have any
> > experience with an update heavy system, and have any performance hints
> > or hardware suggestions?
> 
> Minimal/no indexes on the table(s). Raise checkpoint_segments and consider
> using commit_siblings/commit_delay if it's a multi-stream application.
> Figure out ways to do inserts instead of updates where possible, and COPY
> instead of insert, where possible. Put your WAL on its own disk resource.
> 
> I'm a little curious as to what kind of app would be 95% writes.  A log?
> 
> -- 
> Josh Berkus
> Aglio Database Solutions
> San Francisco
> 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



[PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Miles Keaton
would the number of fields in a table significantly affect the
search-query time?

(meaning: fewer fields = much quicker response?)

I have this database table of items with LOTS of properties per-item,
that takes a LONG time to search.

So as I was benchmarking it against SQLite, MySQL and some others, I
exported just a few fields for testing, into all three databases.

What surprised me the most is that the subset, even in the original
database, gave search results MUCH faster than the full table!

I know I'm being vague, but does anyone know if this is just common
knowledge ("duh! of course!") or if I should be looking at is as a
problem to fix?



Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Tom Lane
Miles Keaton <[EMAIL PROTECTED]> writes:
> What surprised me the most is that the subset, even in the original
> database, gave search results MUCH faster than the full table!

The subset table's going to be physically much smaller, so it could just
be that this reflects smaller I/O load.  Hard to tell without a lot more
detail about what case you were testing.

regards, tom lane



Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Steinar H. Gunderson
On Mon, Oct 04, 2004 at 04:27:51PM -0700, Miles Keaton wrote:
> would the number of fields in a table significantly affect the
> search-query time?

More fields = larger records = fewer records per page = if you read in
everything, you'll need more I/O.

> I have this database table of items with LOTS of properties per-item,
> that takes a LONG time to search.

It's a bit hard to say anything without seeing your actual tables and
queries; I'd guess you either have a lot of matches or you're doing a
sequential scan.

You might want to normalize your tables, but again, it's hard to say anything
without seeing your actual data.
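
One common shape for that normalization is vertical partitioning: keep the
frequently searched columns in a narrow table and push the rest into a 1:1
side table. A sketch with invented names:

-- Narrow, hot table: small rows, cheap to scan.
CREATE TABLE items (
    item_id serial PRIMARY KEY,
    name    varchar(100),
    price   numeric
);

-- Rarely searched properties, joined in only when needed.
CREATE TABLE item_details (
    item_id          int PRIMARY KEY REFERENCES items (item_id),
    long_description text,
    other_properties text
);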

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] would number of fields in a table affect search-query time?

2004-10-04 Thread Josh Berkus
Miles,

> would the number of fields in a table significantly affect the
> search-query time?

Yes.

In addition to the issues mentioned previously, there is the issue of 
criteria; an OR query on 8 fields is going to take longer to filter than an 
OR query on 2 fields.

Anyway, I think maybe you should tell us more about your database design.   
Often the fastest solution involves a more sophisticated approach toward 
querying your tables.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Comparing user attributes with bitwise operators

2004-10-04 Thread Patrick Clery
Sorry I have taken this long to reply, Greg, but here are the results of the 
personals site done with contrib/intarray:

The first thing I did was add a serial column to the attributes table. So 
instead of having a unique constraint on (attribute_id,value_id), every row 
has a unique value:

datingsite=> \d attribute_names
                              Table "public.attribute_names"
     Column     |         Type          |                 Modifiers
----------------+-----------------------+--------------------------------------------
 attribute_id   | integer               | not null default nextval('public.attribute_names_attribute_id_seq'::text)
 attribute_name | character varying(50) | not null
Indexes:
    "attribute_names_pkey" PRIMARY KEY, btree (attribute_id)
    "attribute_names_attribute_id_key" UNIQUE, btree (attribute_id, attribute_name)

an example insert:
insert into attribute_names (attribute_name) values ('languages');



datingsite=> \d attribute_values
                            Table "public.attribute_values"
    Column    |          Type          |                 Modifiers
--------------+------------------------+-----------------------------------------------
 attribute_id | integer                | not null
 order_id     | integer                | not null default (nextval('order_id_seq'::text) - 1)
 label        | character varying(255) | not null
 value_id     | integer                | not null default nextval('public.attribute_values_value_id_seq'::text)
Indexes:
    "attribute_values_pkey" PRIMARY KEY, btree (value_id)
Foreign-key constraints:
    "attribute_values_attribute_id_fkey" FOREIGN KEY (attribute_id) REFERENCES attribute_names(attribute_id)

an example insert (22 is the attribute_id of "languages"):
insert into attribute_values (attribute_id, label) values (22, 'English');


The "value_id" column is where the integers inside the int[] arrays will 
reference. Even age (between 18-99) and height (between 48-84) have rows for 
every possible choice, as well as "Ask me!" where a user could choose to 
leave that blank.

Here is "the int[] table":

create table people_attributes (
    person_id     int references people (person_id) on delete cascade initially deferred,
    askmecount    int not null default 0,
    age           int not null references attribute_values (value_id) on delete restrict,
    gender        int not null references attribute_values (value_id) on delete restrict,
    bodytype      int not null references attribute_values (value_id) on delete restrict,
    children      int not null references attribute_values (value_id) on delete restrict,
    drinking      int not null references attribute_values (value_id) on delete restrict,
    education     int not null references attribute_values (value_id) on delete restrict,
    ethnicity     int not null references attribute_values (value_id) on delete restrict,
    eyecolor      int not null references attribute_values (value_id) on delete restrict,
    haircolor     int not null references attribute_values (value_id) on delete restrict,
    hairstyle     int not null references attribute_values (value_id) on delete restrict,
    height        int not null references attribute_values (value_id) on delete restrict,
    income        int not null references attribute_values (value_id) on delete restrict,
    languages     int[] not null,
    occupation    int not null references attribute_values (value_id) on delete restrict,
    orientation   int not null references attribute_values (value_id) on delete restrict,
    relation      int not null references attribute_values (value_id) on delete restrict,
    religion      int not null references attribute_values (value_id) on delete restrict,
    smoking       int not null references attribute_values (value_id) on delete restrict,
    want_children int not null references attribute_values (value_id) on delete restrict,
    weight        int not null references attribute_values (value_id) on delete restrict,

    seeking int[] not null,

    primary key (person_id)
)
without oids;


You'll notice that "seeking" and "languages" are both int[] types. I did
this because those will be multiple choice. The index was created like so:

create index people_attributes_search on people_attributes using gist (
(array[
age,
gender,
orientation,
children,
drinking,
education,
ethnicity,
eyecolor,
haircolor,
hairstyle,
height,
income,
occupation,
relation,
religion,
smoking,
want_children,
weight
] + seeking + languages) gist__int_ops
);

seeking and languages are appended with the intarray + op.
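
To give an idea of how the index gets used, here is a hypothetical search
(the value ids are invented) using contrib/intarray's query_int syntax
against the same indexed expression:

-- Match value 24 AND 31, plus language 89 OR 90, in one index scan:
SELECT person_id
  FROM people_attributes
 WHERE (array[age, gender, orientation, children, drinking, education,
              ethnicity, eyecolor, haircolor, hairstyle, height, income,
              occupation, relation, religion, smoking, want_children,
              weight] + seeking + languages) @@ '24 & 31 & (89 | 90)'::query_int;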


I'm not going to go too in depth on how this query was generated since that 
was mostly done with the PHP side of things, but from the structure it