Re: [PERFORM] Simple join in PostgreSQL takes too long

2004-04-27 Thread Nick Barr
Vitaly Belman wrote:
Hello pgsql-performance,
  I discussed this subject for some time on DevShed and didn't
  get very far (in terms of results). I wonder if any of you guys can help
  out:
  http://forums.devshed.com/t136202/s.html
So cutting and pasting:
- SCHEMA -
CREATE TABLE bv_bookgenres (
book_id INT NOT NULL,
genre_id INT NOT NULL
);
CREATE TABLE bv_genre_children (
genre_id INT,
genre_child_id INT
);
---
- QUERY -
select DISTINCT
  book_id
from
  bv_bookgenres,
  bv_genre_children
WHERE
 bv_bookgenres.genre_id = bv_genre_children.genre_child_id AND
 bv_genre_children.genre_id = 1
LIMIT 10
-
- EXPLAIN ANALYZE -
QUERY PLAN
Limit  (cost=6503.51..6503.70 rows=10 width=4) (actual time=703.000..703.000 rows=10 loops=1)
  ->  Unique  (cost=6503.51..6738.20 rows=12210 width=4) (actual time=703.000..703.000 rows=10 loops=1)
        ->  Sort  (cost=6503.51..6620.85 rows=46937 width=4) (actual time=703.000..703.000 rows=24 loops=1)
              Sort Key: bv_bookgenres.book_id
              ->  Merge Join  (cost=582.45..2861.57 rows=46937 width=4) (actual time=46.000..501.000 rows=45082 loops=1)
                    Merge Cond: ("outer".genre_id = "inner".genre_child_id)
                    ->  Index Scan using genre_id on bv_bookgenres  (cost=0.00..1462.84 rows=45082 width=8) (actual time=0.000..158.000 rows=45082 loops=1)
                    ->  Sort  (cost=582.45..598.09 rows=6256 width=2) (actual time=46.000..77.000 rows=49815 loops=1)
                          Sort Key: bv_genre_children.genre_child_id
                          ->  Index Scan using genre_id2 on bv_genre_children  (cost=0.00..187.98 rows=6256 width=2) (actual time=0.000..31.000 rows=6379 loops=1)
                                Index Cond: (genre_id = 1)
Total runtime: 703.000 ms
---

- CONF SETTINGS -
shared_buffers = 1000   # min 16, at least max_connections*2, 8KB each
sort_mem = 1
#work_mem = 1024# min 64, size in KB
#maintenance_work_mem = 16384   # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB
-
Have you VACUUM ANALYZEd recently? If not, do that and then rerun the EXPLAIN 
ANALYZE.

You might want to bump shared_buffers. You have 512MB RAM, right? You 
probably want to bump shared_buffers to 1, restart PG, then run a 
VACUUM ANALYZE. Then rerun the EXPLAIN ANALYZE.

If that doesn't help, try doing a:
ALTER TABLE bv_genre_children ALTER COLUMN genre_child_id SET STATISTICS 
100;

followed by a:
VACUUM ANALYZE bv_genre_children;
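
If you want to check what the analyzer has collected after bumping the
statistics target, a quick sketch using the standard pg_stats view (these
are just the columns I would look at first) is:

-- Run after the VACUUM ANALYZE; a higher statistics target should show
-- more entries in most_common_vals and a better n_distinct estimate.
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'bv_genre_children'
  AND attname = 'genre_child_id';
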
You might also want to tweak the effective_cache_size parameter in 
postgresql.conf, but I am unsure how this would work on Windows. Does 
Windows have a kernel disk cache, anyone?
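
As a rough sketch only (the 32768 below is just an assumption: roughly half
of a 512MB box expressed in 8KB pages), you could try the setting
per-session first and see whether the plan changes before editing
postgresql.conf:

-- Assumed value: 32768 pages * 8KB = about 256MB of OS disk cache.
SET effective_cache_size = 32768;
EXPLAIN ANALYZE
SELECT DISTINCT book_id
FROM bv_bookgenres, bv_genre_children
WHERE bv_bookgenres.genre_id = bv_genre_children.genre_child_id
  AND bv_genre_children.genre_id = 1
LIMIT 10;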


HTH
Nick


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] slow seqscan

2004-04-21 Thread Nick Barr
Edoardo Ceccarelli wrote:

My first post to this list :)

Scenario:
I have a database used only for search queries, with a single table that
holds about 450,000-500,000 records.
The table is well indexed, so most of the queries are executed with an
index scan, but since there is a big text field in the table (360 chars)
some search operations (with certain filters) end up with seq scans.
This table is not written to during normal operation: twice per week there
is a batch program that inserts about 35,000 records and updates another
40,000.
Last Friday morning, after that batch had been executed, the database 
started responding really slowly to queries (especially seq scans); 
after a VACUUM FULL ANALYZE things got somewhat better.
Yesterday the same: before the batch everything was perfect, afterwards 
every query was really slow. I vacuumed it again and now it's OK.
Until now the DB had been working fine; it's 4 months old, with two updates 
per week, and I vacuum about once per month.

I am using version 7.3; do I need to upgrade to 7.4? Also, I was thinking
about setting this table to a kind of read-only mode to improve
performance. Is this possible?
Thank you for your help
Edoardo Ceccarelli
---(end of broadcast)---
TIP 8: explain analyze is your friend
In general we are going to need more information, like what kind of 
search filters you are using on the text field, and an EXPLAIN ANALYZE. 
But can you try running the following, bearing in mind it will take a 
while to complete.

REINDEX TABLE table_name

From what I remember there were issues with index space not being 
reclaimed by a vacuum. I believe this was fixed in 7.4. By not 
reclaiming the space, the indexes grow larger and larger over time, 
causing PG to prefer a sequential scan over an index scan (I think).
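
To get a feel for how bloated an index has become, one rough check is to
compare its size in pages before and after the REINDEX; this is only a
sketch against the standard pg_class catalogue, and the index name is a
placeholder:

-- relpages is measured in 8KB blocks; run this before and after REINDEX.
SELECT relname, relpages
FROM pg_class
WHERE relname = 'your_index_name';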

Hope that helps

Nick



---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] MySQL vs PG TPC-H benchmarks

2004-04-21 Thread Nick Barr
Hi,

Has anyone had a look at:

http://people.ac.upc.es/zgomez/

I realize that MySQL and PG cannot really be compared (especially when you 
consider the issues that MySQL has with things like data integrity), but 
still, surely PG would perform better than the stats show (e.g. query #7: 
31.28 seconds versus 42 minutes!).

On a side note, it certainly looks like Linux kernel 2.6 is quite a bit 
faster in comparison to 2.4.

Nick



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Slow join using network address function

2004-02-24 Thread Nick Barr
Tom Lane wrote:

Eric Jain [EMAIL PROTECTED] writes:

http://word-to-the-wise.com/ipr.tgz is a datatype that contains 
a range of IPv4 addresses, and which has the various operators to 
make it GIST indexable.

Great, this looks very promising.

No cast operators between ipr and inet types.

Any way to work around this, short of dumping and reloading tables?

Wouldn't it be better to implement the GIST indexing operators of that
package on the standard datatypes?  It wasn't apparent to me what range
of IP addresses does for you that isn't covered by CIDR subnet for
real-world cases.
			regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend

We currently only allow access to one of our apps based on IP address. 
These IPs are stored one per row in a single table, and often represent 
a contiguous piece of IP space that does not correspond to a full subnet. 
The current CIDR type has the limitation that it only allows full 
subnets, i.e. every IP address in 192.168.1.0/24. For example:

192.168.1.15 - 192.168.1.31

This range cannot be represented by a single CIDR subnet (or maybe it can, 
but I really don't want to have to figure that out each time). This new 
type, however, allows us to store the range as one row. It allows an 
arbitrary range of IP addresses, not just those in a specific subnet. I 
would see this as a useful inclusion, whether in the main src tree or in 
contrib, and we will probably be using it when we get to mess with the 
database schema for this app in the next few months; in fact I have 
already inserted it into our PG source tree ;-).
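
In the meantime, a minimal sketch of the fallback we could use with only the
standard inet type (the table and column names are made up for illustration,
and the lookup would not be GiST-indexable the way ipr is):

-- One row per allowed range, using plain inet values (btree-comparable).
CREATE TABLE allowed_ranges (
    range_start inet NOT NULL,
    range_end   inet NOT NULL
);

INSERT INTO allowed_ranges VALUES ('192.168.1.15', '192.168.1.31');

-- Does a given client address fall inside any allowed range?
SELECT 1
FROM allowed_ranges
WHERE '192.168.1.20'::inet BETWEEN range_start AND range_end;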

Nick

P.S. We are not responsible for the IP address ranges, we just get told 
what they are.





---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Postgresql on SAN

2004-02-19 Thread Nick Barr
Josh Berkus wrote:

Anjan,

Has anyone designed/implemented postgresql server on storage networks?

Yes, Zapatec.com runs their stuff this way.  Probably others as well.

Are there any design considerations?

I don't know.   Probably.

Are there any benchmarks for storage products (HBAs, Switches, Storage
Arrays)?

Not specific to PostgreSQL.  I'm sure there are generic benchmarks.   Keep 
in mind that PostgreSQL needs lots of 2-way I/O, batch writes, and random 
reads.

Any recommendation on the design, resources, references, keeping PG in
mind?

See above.   Also keep in mind that PostgreSQL's use of I/O should improve 
100% in version 7.5.

We run PG on a SAN array. We currently have it set up so a single PG 
instance runs off a single LUN, and this includes the WAL logs. Apart 
from that we have made no other special considerations; we just treat it 
as a fast RAID array. We haven't got to the stage where the speed of the 
SAN is a problem, as load hasn't increased as expected. This will change; 
when it does, I am sure the performance list will be hearing from us ;-). 
Our current limitations, as I see it, are the amount of memory and then 
processing power. The only problem we have had was a dodgy set of kernel 
modules (drivers) for the fibre cards; this was because they were beta 
drivers and obviously still had a few bugs. It was solved by reverting 
to an older version. Everything has run smoothly since then (uptime is 
153 days :-)).

Nick



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Linux / Clariion

2004-01-28 Thread Nick Barr
Cott Lang wrote:

Anybody used Linux with EMC Clariions for PG databases?

Any good war stories, pros, cons, performance results ?

I'm wearing thin on my 6 disk 0+1 configuration and looking for
something beefy, possibly for clustering, and I'm wondering what the net
wisdom is. :)
thanks!



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match
 

Heya,

We are currently using a Dell-badged Clariion FC4500 running at 1Gb 
fibre channel. It has 20 HDDs in there at the moment, and another 5 disks 
are coming in the next couple of weeks. These disks are split into 
several RAID 5 arrays, each about 200Gb. We are using QLogic HBAs under 
RH Linux. We have had a couple of problems, one driver-based, which was 
resolved by a kernel upgrade. They also don't seem to like the servers' 
IP addresses changing; Navisphere wanted the servers to be 
re-registered before it started working properly.

In terms of performance, it's 1Gb fibre to the disks: U320 SCSI hot-swap, 
10k disks. They run very fast and, apart from the configuration issues 
above, have never given us any grief. The LUNs have been running 
flawlessly for over a year (touch wood). We just need some beefier boxes 
to take advantage of their speed. I am thinking of proposing one or more 
quad Opterons with 32Gb RAM ;-) That should do the trick, I reckon.

We can try and run some benchmarks on one of the spare machines if you 
want; if so, send through some examples of pgbench parameters that are 
appropriate. Anyone got any useful sets? We are currently running PG 7.3 
(7.3.1 or 7.3.2, I think) on that box.

The trouble with this model of Clariion is that EMC is apparently setting 
its EOL for sometime in 2005, which we aren't too pleased about. Never mind; 
hopefully we will have some more money by then.

Any other info just say and I will see what I can dig up.

Nick Barr



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Persistent Connections

2004-01-25 Thread Nick Barr
Hi,

[EMAIL PROTECTED] wrote:

Hi

I have a PHP script and I make a pg_pconnect.
If I want to make 4-10 pg_query calls in that script,
do I have to close the connection at the end of the script?
(I would say yes; is that right?)


If you want to make multiple pg_query calls in a page you can, and you can 
use the same connection. You don't have to use persistent connections for 
this. Just open the connection and fire off the different queries. The 
persistent connection remains open between different page loads, 
which is supposedly faster because you don't have the overhead of opening 
the connection each time.

If you want to use a persistent connection then definitely don't close it 
at the bottom of the page. If you want to use the other kind of connection 
(pg_connect, non-persistent) then you don't have to close it at the bottom 
of the page either, because PHP does it for you, although you can if you 
are feeling nice ;-).

Sorry, I'm a little bit confused about the persistent thing!!
Is it smart to use persistent connections at all if I expect 100K 
users to hit the script in an hour and the script calls up to 10-15 PG 
functions?
I have one function at the moment, but the server needs 500 ms; that's a 
little bit too much I think, and it crashed when I had 20K users.

Use the persistent connection but make sure the parameters in 
postgresql.conf match up with the Apache config. The specific settings 
are MaxClients in httpd.conf and max_connections in postgresql.conf. 
Make sure that max_connections is at least as big as MaxClients for 
every database that your PHP scripts connect to.
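
As a rough sanity check from the database side (a sketch only, and it
assumes the stats collector is enabled), you can compare the number of open
backends against the configured limit:

-- Backends currently connected; compare this against max_connections.
SELECT count(*) FROM pg_stat_activity;
SHOW max_connections;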

Thanks
Bye




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] 100 simultaneous connections, critical limit?

2004-01-14 Thread Nick Barr
 -Original Message-
 From: [EMAIL PROTECTED]
[mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Jón Ragnarsson
 Sent: 14 January 2004 13:44
 Cc: [EMAIL PROTECTED]
 Subject: Re: [PERFORM] 100 simultaneous connections, critical limit?
 
 Ok, connection pooling was the thing that I thought of first, but I
 haven't found any docs regarding pooling with PHP+Postgres.
 OTOH, I designed the application to be as independent from the DB as
 possible. (No stored procedures or other Postgres specific stuff)
 Thanks,
 J.

As far as I know PHP supports persistent connections to a PG database.
See pg_pconnect instead of pg_connect. Each of the db connections is
tied to a particular Apache process and will stay open for the life of
that process. So basically make sure your Apache config file
(httpd.conf) and PG config file (postgresql.conf) agree on the maximum
number of connections, otherwise some pages will not be able to connect
to your database.

This may not be a problem for small sites, but on large sites it is, with
heavy loads and large numbers of concurrent users. For example, consider
a site that must support 500 concurrent connections. If persistent
connections are used, at least 500 concurrent connections to PG would be
required, which I guess is probably not recommended.
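
To make that first point concrete, a purely illustrative sketch of keeping
the two limits in step (the numbers are examples, not recommendations):

-- httpd.conf (example):       MaxClients 150
-- postgresql.conf (example):  max_connections = 200   -- MaxClients plus headroom
-- Confirm what the running server actually allows:
SHOW max_connections;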

The way I would like Apache/PHP to work is to have a global pool of
connections to a postgres server, which can be shared among all Apache
processes. This pool could be limited to, say, 50 or 100 connections.
Problems occur under peak load when all 500 concurrent users are active,
but all that should happen then is a bit of a delay.

Hope that (almost) makes sense,


Kind Regards,

Nick Barr
WebBased Ltd.


 Christopher Browne wrote:
 
  Clinging to sanity, [EMAIL PROTECTED] (Jón Ragnarsson) mumbled
into
 her beard:
 
 I am writing a website that will probably have some traffic.
 Right now I wrap every .php page in pg_connect() and pg_close().
 Then I read somewhere that Postgres only supports 100 simultaneous
 connections (default). Is that a limitation? Should I use some other
 method when writing code for high-traffic website?
 
 
  I thought the out-of-the-box default was 32.
 
  If you honestly need a LOT of connections, you can configure the
  database to support more.  I upped the limit on one system to have
  512 the other week; certainly supportable, if you have the RAM for
it.
 
  It is, however, quite likely that the connect()/close() cuts down on
  the efficiency of your application.  If PHP supports some form of
  connection pooling, you should consider using that, as it will cut
  down _dramatically_ on the amount of work done establishing/closing
  connections, and should let your apps use somewhat fewer connections
  more effectively.
 
 
 ---(end of
broadcast)---
 TIP 8: explain analyze is your friend



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] IDE Hardware RAID Controller

2003-11-14 Thread Nick Barr
Heya,

FYI just spotted this and thought I would pass it on, for all those who are
looking at new boxes.

http://www.theinquirer.net/?article=12665
http://www.promise.com/product/product_detail_eng.asp?productId=112&familyId=2

Looks like a four-channel hot-swap IDE (SATA) hardware RAID controller with
up to 256Mb onboard RAM.


Nick





---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-10 Thread Nick Barr
Heya Guys n Gals,

Having been following the thread go for a script! / ex: PostgreSQL vs.
MySQL, I thought I would throw something together in Perl. My current issue
is that I only have access to a RH Linux box and so cannot make it
cross-platform on my own :-(. Anyhow, please find it attached. It runs fine
on my box. It doesn't actually write to postgresql.conf, because I didn't want
to mess that up; it does however write to postgresql.conf.new for the moment.
The diffs seem to be written correctly. There is a set of parameters at the
top which may need to be tweaked for your platform. I can also carry on
posting new versions to this list if people want. Clearly this lot is open
source, so please feel free to play with it and post patches/new features
back, either to the list or to my email directly. In case you can't see my
email address, it is nicky at the domain below.

I will also post it on my website, and as I develop it further new versions
will appear there:

http://www.chuckie.co.uk/postgresql/pg_autoconfig.pl

Is this a useful start?


Nick


pg_autoconfig.pl
Description: Binary data

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] go for a script! / ex: PostgreSQL vs. MySQL

2003-10-10 Thread Nick Barr
- Original Message -
From: Nick Barr [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Friday, October 10, 2003 1:35 PM
Subject: go for a script! / ex: PostgreSQL vs. MySQL


  I will also post it on me website and as I develop it further new
versions
 will appear there

 http://www.chuckie.co.uk/postgresql/pg_autoconfig.pl

Make that

http://www.chuckie.co.uk/postgresql/pg_autoconfig.txt


Nick





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Effective Cache Size

2003-09-17 Thread Nick Barr
Hi,

I have been following a thread on this list, Inconsistent performance,
and have a few questions, especially about the bits concerning
effective_cache_size. I have read some of the docs, and some other threads
on this setting, and it seems to be used by the planner to choose either a
sequential or an index scan. So it will not necessarily increase performance,
I suppose, but instead pick the most optimal plan. Is this correct?

Not that we are suffering massive performance issues at the
moment, but it is expected that our database is going to grow
considerably in the next couple of years, both in terms of load and
size.

Also what would an appropriate setting be? 

From what I read of Scott Marlowe's email, and from the information below,
I reckon it should be somewhere in the region of 240,000.

Danger: maths ahead. Beware!

  141816K  buff
+ 1781764K cached
-----------------
  1923580K total

effective_cache_size = 1923580 / 8 = 240447.5
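
If that figure is in the right ballpark, I assume applying it is just a
one-line change; a sketch using the estimate above (not a recommendation):

-- postgresql.conf:  effective_cache_size = 240000   -- in 8KB pages
-- or, to experiment per-session before editing the conf file:
SET effective_cache_size = 240000;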

Here is some information on the server in question. If any more
information is required then please say. It is a dedicated PG machine
with no other services being hosted off it. As you can see from the
uptime, its load average is 0.00, and it is currently so chilled it's almost
frozen! That will change though :-(


Hardware

Dual PIII 1.4GHz
2Gb RAM
1Tb SAN with hardware RAID 5 using 1Gbps Fibre channel.


OS
==
Linux webbasedth5 2.4.18-18.7.xsmp #1 SMP Wed Nov 13 19:01:42 EST 2002
i686
Red Hat Linux release 7.3 (Valhalla)


PG
==
PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC 2.96


Database

This includes all indexes and tables. I can provide more information on
how this is chopped up if needed.

Size   : 1,141.305 Mb
Tuples : 13,416,397


Uptime
==
11:15am  up 197 days, 16:50,  1 user,  load average: 0.00, 0.00, 0.00


Top
===
Mem:  2064836K av, 2018648K used,   46188K free,   0K shrd,  141816K buff
Swap: 2096472K av,    4656K used, 2091816K free             1781764K cached


Postgresql.conf (all defaults except)
=
max_connections = 1000
shared_buffers = 16000 (128 Mb)
max_fsm_relations = 5000
max_fsm_pages = 50
vacuum_mem = 65535



Kind Regards,

Nick Barr


This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.






---(end of broadcast)---
TIP 8: explain analyze is your friend