Re: [PERFORM] Where to start for performance problem?

2003-11-25 Thread Rob Fielding
The problem with giving suggestions about increasing performance is 
that one person's increase is another person's decrease.

Having said that, there are a few general suggestions:

Set up some shared memory, about a tenth of your available RAM, and 
configure shared_buffers and max_connections correctly. I've used the 
following formula, ripped off the net from somewhere. It's not entirely 
accurate, as other settings steal a little shared memory, but it works 
for the most part:

((1024*RAM_SIZE) - (14.2 * max_connections) - 250) / 8.2

As I say, it should get you a good value; otherwise lower it bit by bit 
if you have trouble starting your db.
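
A worked example, assuming RAM_SIZE is the shared memory you've set 
aside, in megabytes (that's my reading of the formula): with 100 MB set 
aside and max_connections = 50,

((1024 * 100) - (14.2 * 50) - 250) / 8.2 = (102400 - 710 - 250) / 8.2 ~= 12371

so a shared_buffers of around 12000 is a sane starting point.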

Increase effective_cache_size (50%-70% of available RAM) and sort_mem 
(about 1/20th of RAM), and lower your random_page_cost to around 2 or 
less (as low as 0.3) if you have fast SCSI drives in a RAID 10 set-up - 
this was a big speedup ;)
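
As a rough sketch of those ratios in postgresql.conf for a machine with 
2 GB of RAM (illustrative values only; on 7.x effective_cache_size is 
measured in 8 kB pages and sort_mem in kB):

effective_cache_size = 150000   # ~1.2 GB, about 60% of RAM
sort_mem = 102400               # ~100 MB per sort, about 1/20th of RAM
random_page_cost = 2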

But this might not be the answer, though. The values detailed above are 
for tuning an already stable setup.

Perhaps you need to look at your system resource usage. If your 
performance is degrading over time, it sounds to me like you are slowly 
running out of memory and swapping.

Generally, if I take something over, I'll try and get it onto my own 
terms. Have you tried importing the DB into a fresh installation, one 
where you know sensible defaults are set, so you aren't inheriting any 
cruft from the previous sysadmin?
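
A minimal version of that round-trip (swap in your own database name):

pg_dump mydb > mydb.sql
createdb mydb_fresh
psql mydb_fresh < mydb.sql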

To be honest though, I've never run pg so that it actually shut down 
because it was running so badly - I just wouldn't think it would do that.

--

Rob Fielding
[EMAIL PROTECTED]
www.dsvr.co.uk  Development Designer Servers Ltd



Re: [PERFORM] Where to start for performance problem?

2003-11-25 Thread Bruno Wolff III
On Mon, Nov 24, 2003 at 16:03:17 -0600,
  MK Spam [EMAIL PROTECTED] wrote:
 
 The archives of this list provides many ideas for improving performance, but the 
 problem we are having is gradually degrading performance ending in postgres shutting 
 down.  So it's not a matter of optimizing a complex query to take 5 seconds instead 
 of 60 seconds.  From what I can tell we are using the VACUUM command on a schedule 
 but it doesn't seem to prevent the database from becoming congested as we refer to 
 it.  :]  Anyway, the only way I know to fix the problem is to export (pg_dump) the 
 db, drop the database, recreate the database and import the dump.  This seems to 
 return performance back to normal but obviously isn't a very good solution.  The 
 slowdown and subsequent crash can take as little as 1 week for databases with a lot 
 of data or go as long as a few weeks to a month for smaller data sets.

A couple of things you might look for are index bloat and having FSM set too
small for your plain vacuums. Upgrading to 7.4 may help with index bloat
if that is your problem.
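
For reference, a sketch of the relevant knobs (7.3/7.4 names; the 
numbers are placeholders, not recommendations tuned to your system):

VACUUM VERBOSE;          -- in 7.4, the tail of the output reports how
                         -- many FSM pages you actually need
REINDEX TABLE mytable;   -- rebuilds bloated indexes (table name is made up)

and in postgresql.conf:

max_fsm_pages = 100000
max_fsm_relations = 1000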



Re: [PERFORM] Where to start for performance problem?

2003-11-25 Thread Tom Lane
MK Spam [EMAIL PROTECTED] wrote:
 ... the problem we are having is gradually degrading
 performance ending in postgres shutting down.

As someone else commented, that's not an ordinary sort of performance
problem.  What exactly happens when the database shuts down?

regards, tom lane



[PERFORM] design question: general db performance

2003-11-25 Thread shane hill
Hi folks,

Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
too loudly,  you can laugh, just not too loudly and please do not point. :)

I am working on an Automated Installer Testing System for Adobe Systems 
and I am doing a DB redesign of the current postgres db:

1. We are testing a matrix of over 900 Acrobat installer configurations 
and we are tracking every file and registry entry that is affected by an 
installation.

2. A single file or registry entry that is affected by any test is 
stored in the db as a record.

3. A typical record is about 12 columns of string data. The data is all 
information about a file (Mac or Windows) or a Windows registry entry 
[file or regkey name, file size, modification date, checksum, 
permissions, owner, group, and in the case of a Mac, we are getting all 
the HFS attributes as well].

4. A typical test produces anywhere from 2000 - 5000 records.

Our db is getting to be a respectable size (about 10 GB right now) and 
is growing slower and slower. I have been charged with making it faster, 
and with a smaller footprint, while retaining all of the current 
functionality. Here is one of my ideas. Please tell me if I am crazy:

The strings that we are storing (mentioned in 3 above) are extremely 
repetitive. For example, there are a limited number of permissions for 
the files in the Acrobat installer, and we are storing this information 
over and over again in the tables. The same goes for filenames, registry 
key names, and almost all of the data we are storing. So it seems to me 
that to create a smaller and faster database we could assign an integer 
to each string and just store the integer representation of the string 
rather than the string itself. Then we would store each string just once 
in a separate table and do join queries against the string tables and 
the main data tables. For example,

a table that would hold unique permissions strings would look like

table:  perms_strs

 string        | id
---------------+----
 'drwxr-xr-x'  |  1
 '-rw-------'  |  2
 'drwxrwxr-x'  |  3
 '-rw-r--r--'  |  4

Then in my data I would just store 1, 2, 3, or 4 instead of the whole 
permissions string.

It seems to me that we would save lots of space and, over time, not see 
the same performance degradation.
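
Something like this is what I have in mind, sketched in SQL (all names 
except perms_strs are made up for illustration):

-- lookup table: each distinct permissions string stored once
CREATE TABLE perms_strs (
    id     serial PRIMARY KEY,
    string text   NOT NULL UNIQUE
);

-- the main data table stores the small integer instead of the string
CREATE TABLE file_records (
    file_name text,
    perms_id  integer REFERENCES perms_strs(id)
    -- ... the other ~10 columns of file/regkey data
);

-- queries join back to recover the original string
SELECT f.file_name, p.string AS permissions
  FROM file_records f
  JOIN perms_strs p ON p.id = f.perms_id;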

Anyway, please tell me if this makes sense, and make any other 
suggestions that you can think of. I am just now starting this analysis, 
so I cannot give specifics as to where we are seeing poor performance 
just yet; just tell me if my concepts are correct.

Thanks for your time and for suffering through this email.

chao,

-Shane



Re: [PERFORM] design question: general db performance

2003-11-25 Thread Jord Tanner
[small chuckle]

By George, I think he's got it!

You are on the right track. Have a look at this link on database
normalization for more info:

http://databases.about.com/library/weekly/aa080501a.htm 



On Tue, 2003-11-25 at 10:42, shane hill wrote:
 Hi folks,
 
 Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
 too loudly,  you can laugh, just not too loudly and please do not point. :)
 

[snip]

-- 
Jord Tanner [EMAIL PROTECTED]




Re: [PERFORM] design question: general db performance

2003-11-25 Thread Josh Berkus
Shane,

 Disclaimer:  I am relatively new to RDBMSs, so please do not laugh at me 
 too loudly,  you can laugh, just not too loudly and please do not point. :)

Hey, we all started somewhere.  Nobody was born knowing databases.  Except 
maybe Neil Conway.

 I am working on an Automated Installer Testing System for Adobe Systems 
 and I am doing a DB redesign of the current postgres db:

Cool!  We're going to want to talk to you about a case study later, if 
you can get your boss to authorize it ...

 Our db is getting to be a respectable size (about 10GB right now) and is 
 growing slower and slower. 

Slower and slower?  Hmmm ... what's your VACUUM / ANALYZE / REINDEX 
schedule?  What PG version?  What are your postgresql.conf settings?  
Progressive performance loss may indicate a problem with one or more of 
these things ...

 then in my data I would just store 1,2,3 or 4 instead of the whole 
 permissions string.
 
 it seems to me that we would save lots of space and over time not see 
 the same performance degradation.

Yes, this is a good idea.   Abstracting other repetitive data is good too.  
Also keep in mind that the permissions themselves can be represented as octal 
numbers instead of strings, which takes less space.
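
For example, 'rwxr-xr-x' is 0755 and 'rw-r--r--' is 0644 (ignoring the 
leading file-type character), so a two-byte integer can stand in for a 
ten-character string.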

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




[PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread LIANHE SHAO
Hi all,

I want to use an index on the gene_symbol column in my
query, and gene_symbol is indexed. But when I use
lower(gene_symbol) like lower('%mif%'), the index
is not used, while when I change it to
lower(gene_symbol) = lower('mif'), the index is used
and an index scan works. But this is not what I want:
I want all the gene_symbols containing the substring
'mif' to be pulled out, not only exact matches.

Could anybody give me some hints on how to deal with
this? If the query does not use the index, it takes
too long.

   
PGA=> explain select distinct probeset_id, chip, gene_symbol, title,
sequence_description, pfam from affy_array_annotation
where lower(gene_symbol) like upper('%mif%');

                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=29576.44..29591.44 rows=86 width=265)
   ->  Sort  (cost=29576.44..29578.59 rows=857 width=265)
         Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam
         ->  Seq Scan on affy_array_annotation  (cost=0.00..29534.70 rows=857 width=265)
               Filter: (lower((gene_symbol)::text) ~~ '%MIF%'::text)
(5 rows)


PGA=> explain select distinct probeset_id, chip, gene_symbol, title,
sequence_description, pfam from affy_array_annotation
where lower(gene_symbol) = upper('%mif%');

                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=3433.44..3448.44 rows=86 width=265)
   ->  Sort  (cost=3433.44..3435.58 rows=857 width=265)
         Sort Key: probeset_id, chip, gene_symbol, title, sequence_description, pfam
         ->  Index Scan using gene_symbol_idx_fun1 on affy_array_annotation  (cost=0.00..3391.70 rows=857 width=265)
               Index Cond: (lower((gene_symbol)::text) = '%MIF%'::text)
(5 rows)





Regards,
William




Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Dror Matalon

Hi,

Searches with LIKE or regexes often can't use the index. Think of the index as
a sorted list of your items. It's easy to find an item when you know what it
starts with (so 'mif%' should use the index), but when the LIKE pattern starts
with '%' the index is useless and the search needs to do a sequential scan.
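
To make the contrast concrete with your own table (keeping in mind that
on 7.x a plain btree index is only usable for LIKE at all if the database
was initdb'd in the C locale; 7.4 adds the text_pattern_ops operator
class for other locales):

-- anchored pattern: can use gene_symbol_idx_fun1
select ... from affy_array_annotation where lower(gene_symbol) like 'mif%';

-- leading wildcard: the planner has to fall back to a seq scan
select ... from affy_array_annotation where lower(gene_symbol) like '%mif%';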

Regards,

Dror

On Tue, Nov 25, 2003 at 07:48:49PM +0000, LIANHE SHAO wrote:
 [snip]

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com



Re: [PERFORM] [Fwd: Re: Optimize]

2003-11-25 Thread Gaetano Mendola
Torsten Schulz wrote:
  You can see, doing select * from pg_stat_activity, the
  queries that are currently running on your server, and
  do an explain analyze on them to see which one is the
  bottleneck. If you are running 7.4 you can see in the
  log the total amount of time for each query.

 With this query I see how many queries are running, but the
 current_query field is empty, so I can't see which queries
 are very slow.

You must perform that query with superuser permissions.
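
A minimal version of that check (assuming the 7.3/7.4 catalog; note that
even for a superuser, current_query stays empty or shows a placeholder
unless stats_command_string is enabled in postgresql.conf):

select procpid, usename, current_query from pg_stat_activity;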

Regards
Gaetano Mendola




Re: [PERFORM] why index scan not working when using 'like'?

2003-11-25 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 In regular text fields containing words, your problem is solvable with full 
 text indexing (FTI).   Unfortunately, FTI is not designed for arbitrary 
 non-language strings.  It could be adapted, but would require a lot of 
 hacking.

I'm not sure why you say that FTI isn't a usable solution.  As long as
the gene symbols are separated by whitespace or some other non-letters
(e.g., "foo mif bar", not "foomifbar"), I'd think FTI would work.

regards, tom lane



[PERFORM] Impossibly slow DELETEs

2003-11-25 Thread Stefan Champailler
Dear You all,

(please tell me if this has already been discussed, I was unable to find any 
convincing information)

I'm developing a small application, tied to a PG 7.4 beta 5 (I didn't 
upgrade). The DB I use is roughly 20 tables, each of them containing at 
most 30 records (I'm still in development). I can provide a whole dump 
if necessary. I access the DB through iODBC (SuSE Linux 8.1), through 
PHP. The machine everything runs on has 512 MB of RAM and a 2.5 GHz CPU, 
so I assume it should be blazingly fast.

So here's my trouble: some DELETE statements take up to 1 minute to 
complete (but not always; sometimes it's fast, sometimes it's that 
slow). Here's a typical one: DELETE FROM response_bool WHERE 
response_id = '125'. The response_bool table has no foreign key and no 
index on the response_id column. No foreign key references the 
response_bool table. There are 6 rows in the table (given that size, I 
assumed that an index was not necessary).

So 1 minute to complete looks like I did something REALLY bad.

It is my feeling that the same query works without problem through psql, 
but I can't be sure. The rest of my queries (inserts, updates) work just 
fine and pretty fast.

Can someone help me or point me to a place where I can find help? I 
haven't done any in-depth debugging, though.

thx,

stF





Re: [PERFORM] [Fwd: Re: Optimize]

2003-11-25 Thread Torsten Schulz
Gaetano Mendola wrote:
 Torsten Schulz wrote:
   You can see, doing select * from pg_stat_activity, the
   queries that are currently running on your server, and
   do an explain analyze on them to see which one is the
   bottleneck. If you are running 7.4 you can see in the
   log the total amount of time for each query.

  With this query I see how many queries are running, but the
  current_query field is empty, so I can't see which queries
  are very slow.

 You must perform that query with superuser permissions.

I've run it from the root account with psql -U postgres, but I still 
can't see the queries.

Regards
Torsten Schulz


Re: [PERFORM] Wierd context-switching issue on Xeon

2003-11-25 Thread Josh Berkus
Tom,

 Strictly a WAG ... but what this sounds like to me is disastrously bad
 behavior of the spinlock code under heavy contention.  We thought we'd
 fixed the spinlock code for SMP machines awhile ago, but maybe
 hyperthreading opens some new vistas for misbehavior ...

Yeah, I thought of that based on the discussion on -Hackers.  But we tried 
turning off hyperthreading, with no change in behavior.

 If you can't try 7.4, or want to gather more data first, it would be
 good to try to confirm or disprove the theory that the context switches
 are coming from spinlock delays.  If they are, they'd be coming from the
 select() calls in s_lock() in s_lock.c.  Can you strace or something to
 see what kernel calls the context switches occur on?

Might be worth it ... will suggest that.  Will also try 7.4.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Impossibly slow DELETEs

2003-11-25 Thread Neil Conway
Stefan Champailler [EMAIL PROTECTED] writes:
 So here's my trouble: some DELETE statements take up to 1 minute to
 complete (but not always; sometimes it's fast, sometimes it's that
 slow). Here's a typical one: DELETE FROM response_bool WHERE
 response_id = '125'. The response_bool table has no foreign key and
 no index on the response_id column. No foreign key references the
 response_bool table.

I'm skeptical that PostgreSQL is causing the performance problem
here -- 1 minute for a DELETE on a single-page table is absurdly
slow. If you enable the log_min_duration_statement configuration
variable, you should be able to get an idea of how long it actually
takes PostgreSQL to execute each query -- do you see some 60 second
queries in the log?
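
For example, in postgresql.conf (the threshold is in milliseconds; 1000
is just an illustration):

log_min_duration_statement = 1000   # log statements that run longer than 1s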

What is the system load like when the query takes a long time? For
example, `vmstat 1` output around this point in time would be
helpful.

Does PostgreSQL consume a lot of CPU time or do a lot of disk I/O?

Can you confirm this problem using psql?

 There are 6 rows in the table (given that size, I assumed that an
 index was not necessary).

That's a reasonable assumption.

-Neil




Re: [PERFORM] Impossibly slow DELETEs

2003-11-25 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 There are 6 rows in the table (given that size, I assumed that an
 index was not necessary).

 That's a reasonable assumption.

But if he's updated those rows a few hundred thousand times and never
VACUUMed, he could be having some problems ...
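
A quick way to check:

VACUUM VERBOSE response_bool;

If that reports thousands of pages for a 6-row table, a one-time
VACUUM FULL response_bool should shrink it back down.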

regards, tom lane



Re: [PERFORM] Optimize

2003-11-25 Thread Shridhar Daithankar
Torsten Schulz wrote:
 Chester Kustarz wrote:
  On Mon, 24 Nov 2003, Torsten Schulz wrote:
   shared_buffers = 5000    # 2*max_connections, min 16

  that looks pretty small. that would only be 40 MBytes
  (8k/page * 5000 pages).
  http://www.varlena.com/GeneralBits/Tidbits/perf.html

 Ok, that's it. I've set it to 51200, now it seems to be very fast.

Whoa.. that is too much. You can still get better performance at 
something low like 10,000 or even 5,000.

Bumping up shared_buffers stops being useful after a point, and beyond 
that it actually degrades performance.
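
For scale, at 8 kB per page: 5,000 pages is about 40 MB, 10,000 is about 
80 MB, and the 51200 you have set now is roughly 400 MB of shared buffers.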

 Shridhar
