Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
> I have a dual processor system that can support over 150 concurrent 
> connections handling normal traffic and load.   Now suppose I setup 
> Apache to spawn all of its children instantly, what will 
...
> This will spawn 150 children in a short order of time and as this takes 

"Doctor, it hurts when I do this!"
"Well, don't do that then..."

Sorry, couldn't resist ;-)

Our Apache/PG driven website also needs to be able to deal with occasional
large peaks, so what we do is:

StartServers 15 # Don't create too many children initially
MinSpareServers 10  # Always have at least 10 spares lying around
MaxSpareServers 20  # But no more than 20
MaxClients 150  # Up to 150 - the default 256 is too much for our RAM


So on server restart 15 Apache children are created, then one new child
every second up to a maximum of 150.

Apache's 'ListenBacklog' is around 500 by default, so there's plenty of
scope for queuing inbound requests while we wait for sufficient children to
be spawned.

In addition we (as _every_ high load site should) run Squid as an
accelerator, which dramatically increases the number of client connections
that can be handled.  Across 2 webservers at peak times we've had 50,000
concurrently open http & https client connections to Squid, with 150 Apache
children doing the work that squid can't (i.e. all the dynamic stuff), and
PG (on a separate box of course) whipping through nearly 800 mixed selects,
inserts and updates per second - and then had to restart Apache on one of
the servers for a config change...  Not a problem :-)

One little tip - if you run squid on the same machine as apache, and use a
dual-proc box, then because squid is single-threaded it will _never_ take
more than half the CPU - nicely self balancing in a way.

M



Re: [PERFORM] index not used if using IN or OR

2004-11-04 Thread Richard Huxton
Mario Ivankovits wrote:
> Hello!
> Sorry if this has been discussed before; it is just hard to find in the 
> archives using the words "or" or "in" :-o
> 
> I use postgres-8.0 beta4 for Windows.
> I broke down my problem to a very simple table - two columns, 
> "primary_key" and "secondary_key". Creates and inserts you will find below.
> 
> If I query the _empty_ freshly created table I get the following explain 
> result:
> 
> select * from tt where seckey = 1;
> Index Scan using seckey_key on tt  (cost=0.00..17.07 rows=5 width=12)
>   Index Cond: (seckey = 1)
> 
> If I use "OR" (or IN) things get worse:
> 
> select * from tt where seckey = 1 or seckey = 2
> Seq Scan on tt  (cost=0.00..0.00 rows=1 width=12)
>   Filter: ((seckey = 1) OR (seckey = 2))
> 
> Note the "Seq Scan" instead of using the index.

But as you said, your table is *empty* - why would an index be faster? 
Try running EXPLAIN ANALYSE on these queries and look at the actual times.

> After populating the table with 8920 records and running "analyze", the 
> scenario gets even worse:
> 
> select * from tt where seckey = 1;
> Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual 
> time=0.000..15.000 rows=1784 loops=1)
>   Filter: (seckey = 1)
> Total runtime: 31.000 ms
> 
> Now this simple query also uses a "Seq Scan".

Well, it thinks it's going to be returning 1669 rows. If that's roughly 
right, then scanning the table probably is faster.

Run the queries again with EXPLAIN ANALYSE. Also try issuing
  set enable_seqscan=false;
This will force the planner to use any indexes it finds. Compare the 
times with and without, and don't forget to account for the effects of 
caching.
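
For example (a sketch only - the full table definition wasn't quoted, so 
the column names here are guessed from the EXPLAIN output above):

  CREATE TABLE tt (pkey integer PRIMARY KEY, seckey integer);
  CREATE INDEX seckey_key ON tt (seckey);
  -- load the 8920 rows, then:
  ANALYZE tt;
  EXPLAIN ANALYSE SELECT * FROM tt WHERE seckey = 1;  -- planner's choice
  SET enable_seqscan = false;
  EXPLAIN ANALYSE SELECT * FROM tt WHERE seckey = 1;  -- index scan forced
  SET enable_seqscan = true;

If the forced index scan's actual time is no better, the planner was 
right to prefer the seq scan.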

--
  Richard Huxton
  Archonet Ltd


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Markus Schaber
Hi, Leeuw,

On Thu, 21 Oct 2004 12:44:10 +0200
"Leeuw van der, Tim" <[EMAIL PROTECTED]> wrote:

> (I'm not sure if it's a good idea to create a PG-specific FS in your
> OS of choice, but it's certainly gonna be easier than getting FS code
> inside of PG)

I don't think PG really needs a specific FS. I rather think that PG
could profit from some functionality that's missing in traditional UN*X
file systems.

posix_fadvise(2) may be a candidate. Read/write barriers are another one, as
well as syncing a bunch of data in different files with a single call
(so that the OS can determine the best write order). I can also imagine
some interaction with the FS journalling system (to avoid duplicate
efforts).

We should create a list of those needs, and then communicate those to
the kernel/fs developers. Then we (as well as other apps) can make use
of those features where they are available, and use the old way
everywhere else.

Maybe Reiser4 is a step in the right direction, and maybe even a postgres
plugin for Reiser4 would be worth the effort. Maybe XFS/JFS etc. already
have such capabilities. Maybe that's completely wrong.

cheers,
Markus

-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Simon Riggs
On Wed, 2004-11-03 at 21:25, Martin Foster wrote:
> Simon Riggs wrote:
> > On Tue, 2004-11-02 at 23:52, Martin Foster wrote:
> > 
> >>Is there a way to restrict how much load a PostgreSQL server can take 
> >>before dropping queries in order to safeguard the server?I was 
> >>looking at the login.conf (5) man page and while it allows me to limit 
> >>by processor time this seems to not fit my specific needs.
> >>
> >>Essentially, I am looking for a sort of functionality similar to what 
> >>Sendmail and Apache have.   Once the load of the system reaches a 
> >>certain defined limit the daemon drops tasks until such a time that it 
> >>can resume normal operation.
> > 
> > 
> > Sounds great... could you give more shape to the idea, so people can
> > comment on it?
> > 
> > What limit? Measured how? Normal operation is what?
> > 
> > Drop what? How to tell?
> > 
> > 
> 
> Let's use the example in Apache, there is the Apache::LoadAvgLimit 
> mod_perl module which allows one to limit based on the system load 
> averages.   Here is an example of the configuration one would find:
> 
>
>  PerlInitHandler Apache::LoadAvgLimit
>  PerlSetVar LoadAvgLimit_1 3.00
>  PerlSetVar LoadAvgLimit_5 2.00
>  PerlSetVar LoadAvgLimit_15 1.50
>  PerlSetVar LoadAvgRetryAfter 120
>
> 
> The end state is simple, once the load average moves above 3.00 for the 
> 1 minute average the web server will not process the CGI scripts or 
> mod_perl applications under that directory.  Instead it will return a 
> 503 error and save the system from being crushed by ever increasing load 
> averages.
> 
> Only once the load average is below the defined limits will the server 
> process requests as normal.   This is not necessarily the nicest or 
> cleanest way of doing things, but it does allow the Apache web server to 
> prevent a collapse.
> 
> There are ways of restricting the size of files, number of concurrent 
> processes and even memory being used by a daemon.  This can be done 
> through ulimit or the login.conf file if your system supports it. 
> However, there is no way to restrict based on load averages, only 
> processor time, which is ineffective for a perpetually running daemon 
> like PostgreSQL.
> 

All workloads are not created equally, so mixing them can be tricky.
This will be better in 8.0 because seq scans don't spoil the cache.

Apache is effectively able to segregate the workloads because each
workload is "in a directory". SQL isn't stored anywhere for PostgreSQL
to say "just those ones please", so defining which statements are in
which workload is the tricky part.

PostgreSQL workload management could look at userid, tables, processor
load (?) and estimated cost to decide what to do.

There is a TODO item on limiting numbers of connections per
userid/group, in addition to the max number of sessions per server.

Perhaps the easiest way would be to have the Apache workloads segregated
by PostgreSQL userid, then limit connections to each.
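
As a sketch of the segregation half (the user and table names here are 
invented, and the per-user connection limit itself is the TODO item 
above, so only the identification is shown):

  CREATE USER web_interactive PASSWORD 'xxx';
  CREATE USER web_reporting   PASSWORD 'xxx';
  -- each Apache workload connects as its own user
  GRANT SELECT, INSERT, UPDATE ON sessions TO web_interactive;
  GRANT SELECT ON sales_summary TO web_reporting;
  -- sessions can then at least be counted per workload:
  SELECT usename, count(*) FROM pg_stat_activity GROUP BY usename;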

> For example using Apache::DBI or pgpool the DBMS may be required to 
> spawn a great deal of child processes in a short order of time.   This 
> in turn can cause a major spike in processor load and, if unchecked while 
> running high demand queries, the system can literally increase in load 
> until the server buckles.
> 

That's been nicely covered off by John and Matt on the other threads, so
you're sorted out for now, and it doesn't look like a bug in PostgreSQL.

> Of course, I am not blaming PostgreSQL, there are probably some 
> instabilities in the AMD64 port of FreeBSD 5.2.1 for dual processor 
> systems that lead to an increased chance of failure instead of recovery. 

Good!

-- 
Best Regards, Simon Riggs




Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Pierre-Frédéric Caillaud

> posix_fadvise(2) may be a candidate. Read/write barriers are another one, as
> well as syncing a bunch of data in different files with a single call
> (so that the OS can determine the best write order). I can also imagine
> some interaction with the FS journalling system (to avoid duplicate
> efforts).
	There is also the fact that syncing after every transaction could be  
changed to syncing every N transactions (N fixed or depending on the data  
size written by the transactions) which would be more efficient than the  
current behaviour with a sleep. HOWEVER suppressing the sleep() would lead  
to postgres returning from the COMMIT while it is in fact not synced,  
which somehow rings a huge alarm bell somewhere.

	What about read order ?
	This could be very useful for SELECT queries involving indexes, which in  
case of a non-clustered table lead to random seeks in the table.
	There's fadvise to tell the OS to readahead on a seq scan (I think the OS  
detects it anyway), but if there was a system call telling the OS "in the  
next seconds I'm going to read these chunks of data from this file (gives  
a list of offsets and lengths), could you put them in your cache in the  
most efficient order without seeking too much, so that when I read() them  
in random order, they will be in the cache already ?". This would be an  
asynchronous call which would return immediately, just queuing up the data  
somewhere in the kernel, and maybe sending a signal to the application  
when a certain percentage of the data has been cached.
	PG could take advantage of this with not much code changes, simply by  
putting a fifo between the index scan and the tuple fetches, to wait the  
time necessary for the OS to have enough reads to cluster them efficiently.
	On very large tables this would maybe not gain much, but on tables which  
are explicitly clustered, or naturally clustered like accessing an index  
on a serial primary key in order, it could be interesting.

Just a thought.


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
> In addition we (as _every_ high load site should) run Squid as an
> accelerator, which dramatically increases the number of client connections
> that can be handled. [...]
> One little tip - if you run squid on the same machine as apache, and use a
> dual-proc box, then because squid is single-threaded it will _never_ take
> more than half the CPU - nicely self balancing in a way.

I've heard of the merits of Squid used as a reverse proxy. 
However, well over 99% of my traffic is dynamic, hence I may be 
experiencing behavior that people normally do not expect.

As I have said before in previous threads, the scripts are completely 
database driven and at the time the database averaged 65 queries per 
second under MySQL before a migration, while the webserver was averaging 
2 to 4.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Simon Riggs wrote:
> Apache is effectively able to segregate the workloads because each
> workload is "in a directory". SQL isn't stored anywhere for PostgreSQL
> to say "just those ones please", so defining which statements are in
> which workload is the tricky part. [...]
> Perhaps the easiest way would be to have the Apache workloads segregated
> by PostgreSQL userid, then limit connections to each.

Apache has a global setting for load average limits; the above was just 
a module which extended the capability.  It might also make sense to 
have limitations set on schemas, which could be used in a similar way to 
Apache directories.

While for most people a database protecting itself against a sudden 
surge of high traffic would be undesirable, it can help those who run 
dynamically driven sites and get slammed by Slashdot, for example.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] preloading indexes

2004-11-04 Thread Andrew Sullivan
On Wed, Nov 03, 2004 at 03:53:16PM -0500, Andrew Sullivan wrote:
> and may bust your query out of the cache.  Also, we'd need some more

Uh, the data you're querying, of course.  Queries themselves aren't
cached.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
--J.D. Baldwin



Re: [PERFORM] index not used if using IN or OR

2004-11-04 Thread Tom Lane
Mario Ivankovits <[EMAIL PROTECTED]> writes:
> After populating the table with 8920 records and running "analyze", the 
> scenario gets even worse:

> select * from tt where seckey = 1;
> Seq Scan on tt  (cost=0.00..168.50 rows=1669 width=12) (actual 
> time=0.000..15.000 rows=1784 loops=1)
>   Filter: (seckey = 1)
> Total runtime: 31.000 ms

> Now also this simple query uses a "Seq Scan".

Which is exactly what it *should* do, considering that it is selecting
1784 out of 8920 records.  Indexscans only win for small selectivities
--- the rule of thumb is that retrieving more than about 1% of the
records should use a seqscan.
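
For concreteness, the selectivity of the query above is

  SELECT round(100.0 * 1784 / 8920, 1);  -- = 20.0 percent of the table

some twenty times past that threshold.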

regards, tom lane



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Kevin Barnard wrote:
> I am generally interested in a good solution for this.  So far our
> solution has been to increase the hardware to the point of allowing
> 800 connections to the DB.
> I don't have the mod loaded for Apache, but we haven't had too many
> problems there.  The site is split pretty well between dynamic and
> non-dynamic; it's largely Flash with several plugins to the DB.
> However we still can and have been slammed up to the point of the 800
> connections.
>
> What I don't get is why not use pgpool?  This should eliminate the
> rapid-fire forking of postgres instances in the DB server.  I'm
> assuming your app can safely handle a failure to connect to the DB
> (i.e. exceeding the number of DB connections).  If not it should be fairly
> simple to send a 503 header when it's unable to get the connection.
Note that I am not necessarily looking for a PostgreSQL solution to the 
matter.  Just a way to prevent the database from killing off the server 
it sits on, by looking at the load averages.

I have attempted to make use of pgpool and have had some very poor 
performance.   There were constant error messages, load 
averages on that machine seemed to skyrocket, and it just seemed not to 
be suited to my needs.

Apache::DBI overall works better for what I require, even if it is not a 
pool per se.   Now if pgpool supported variable rate pooling like 
Apache does with its children, it might help to even things out.  That 
and you'd still get the spike if you have to start the webserver and 
database server at or around the same time.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Chris Browne
[EMAIL PROTECTED] (Pierre-Frédéric Caillaud) writes:
>> posix_fadvise(2) may be a candidate. Read/write barriers are another one, as
>> well as syncing a bunch of data in different files with a single call
>> (so that the OS can determine the best write order). I can also imagine
>> some interaction with the FS journalling system (to avoid duplicate
>> efforts).
>
>   There is also the fact that syncing after every transaction
> could be  changed to syncing every N transactions (N fixed or
> depending on the data  size written by the transactions) which would
> be more efficient than the  current behaviour with a sleep. HOWEVER
> suppressing the sleep() would lead  to postgres returning from the
> COMMIT while it is in fact not synced,  which somehow rings a huge
> alarm bell somewhere.
>
>   What about read order ?
>   This could be very useful for SELECT queries involving
> indexes, which in  case of a non-clustered table lead to random seeks
> in the table.

Another thing that would be valuable would be to have some way to say:

  "Read this data; don't bother throwing other data out of the cache
   to stuff this in."

Something like a "read_uncached()" call...

That would mean that a seq scan or a vacuum wouldn't force useful data
out of cache.
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/linuxxian.html
A VAX is virtually a computer, but not quite.



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
> Apache::DBI overall works better for what I require, even if it is not a 
> pool per se.   Now if pgpool supported variable rate pooling like 
> Apache does with its children, it might help to even things out.  That 
> and you'd still get the spike if you have to start the webserver and 
> database server at or around the same time.

I still don't quite get it though - you shouldn't be getting more than one
child per second being launched by Apache, so that's only one PG postmaster
per second, which is really a trivial load.  That is unless you have
'StartServers' set high, in which case the 'obvious' answer is to lower it.
Are you launching multiple DB connections per Apache process as well?




Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
> Case in point: A first time visitor hits your home page.  A 
> dynamic page is generated (in about 1 second) and served 
> (taking 2 more seconds) which contains links to 20 additional [...]

The gain from an accelerator is actually even more than that, as it takes
essentially zero seconds for Apache to return the generated content (which
in the case of a message board could be quite large) to Squid, which can
then feed it slowly to the user, leaving Apache free again to generate
another page.  When serving dialup users large dynamic pages this can be a
_huge_ gain.

I think Martin's pages (dimly recalling another thread) take a pretty long
time to generate though, so he may not see quite such a significant gain.





Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Pierre-Frédéric Caillaud
	Myself, I like a small Apache with few modules serving static files (no  
dynamic content, no db connections), and with a mod_proxy on a special  
path directed to another Apache which generates the dynamic pages (few  
processes, persistent connections...)
	You get the best of both worlds: static files do not hog DB connections, 
and the second apache sends generated pages very fast to the first, which 
then trickles them down to the clients.


Matt Clark wrote:
> The gain from an accelerator is actually even more than that, as it takes
> essentially zero seconds for Apache to return the generated content (which
> in the case of a message board could be quite large) to Squid, which can
> then feed it slowly to the user, leaving Apache free again to generate
> another page.  When serving dialup users large dynamic pages this can be a
> _huge_ gain.
>
> I think Martin's pages (dimly recalling another thread) take a pretty long
> time to generate though, so he may not see quite such a significant gain.


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
> The gain from an accelerator is actually even more than that, as it takes
> essentially zero seconds for Apache to return the generated content (which
> in the case of a message board could be quite large) to Squid, which can
> then feed it slowly to the user, leaving Apache free again to generate
> another page. [...]
> I think Martin's pages (dimly recalling another thread) take a pretty long
> time to generate though, so he may not see quite such a significant gain.

Correct. 75% of all hits are on a script that can take anywhere from 
a few seconds to half an hour to complete.   The script essentially 
auto-flushes to the browser so they get new information as it arrives, 
creating the illusion of on-demand generation.

A squid proxy would probably cause severe problems when dealing with a 
script that does not complete its output for a variable length of time.

As for images, CSS, javascript and such, the site makes use of them, but in 
the grand scheme of things the amount of traffic they tie up is 
literally inconsequential.   Though I will probably move all of that 
onto another server, just to let the main server deal almost exclusively 
with dynamic content.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
> I still don't quite get it though - you shouldn't be getting more than one
> child per second being launched by Apache, so that's only one PG postmaster
> per second, which is really a trivial load.  That is unless you have
> 'StartServers' set high, in which case the 'obvious' answer is to lower it.
> Are you launching multiple DB connections per Apache process as well?

I have StartServers set to a fairly high limit.   However this would 
make little difference overall if I restarted the webservers to load in 
new modules during a high load time.    When I am averaging 145 
concurrent connections before a restart, I can expect that many requests 
to hit the server once Apache begins to respond.

As a result, it will literally cause a spike on both machines as new 
connections are initiated at a high rate.   In my case I don't always 
have the luxury of waiting till 0300 just to test a change.

Again, not necessarily looking for a PostgreSQL solution.  I am looking 
for a method that would allow the database or the OS itself to protect 
the system it's hosted on.If both the database and the apache server 
were on the same machine this type of scenario would be unstable to say 
the least.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark
> Correct. 75% of all hits are on a script that can take anywhere from 
> a few seconds to half an hour to complete.   The script essentially 
> auto-flushes to the browser so they get new information as it arrives, 
> creating the illusion of on-demand generation.

This is more like a streaming data server, which is a very different beast
from a webserver, and probably better suited to the job.  Usually either
multithreaded or single-process using select() (just like Squid).  You could
probably build one pretty easily.  Using a 30MB Apache process to serve one
client for half an hour seems like a hell of a waste of RAM.

> A squid proxy would probably cause severe problems when dealing with a 
> script that does not complete its output for a variable length of time.

No, it's fine, squid gives it to the client as it gets it, but can receive
from the server faster.




Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Simon Riggs
On Thu, 2004-11-04 at 15:47, Chris Browne wrote:

> Another thing that would be valuable would be to have some way to say:
> 
>   "Read this data; don't bother throwing other data out of the cache
>to stuff this in."
> 
> Something like a "read_uncached()" call...
> 
> That would mean that a seq scan or a vacuum wouldn't force useful data
> out of cache.

ARC does almost exactly those two things in 8.0.

Seq scans do get put in cache, but in a way that means they don't spoil
the main bulk of the cache.

-- 
Best Regards, Simon Riggs




Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Steinar H. Gunderson
On Thu, Nov 04, 2004 at 10:47:31AM -0500, Chris Browne wrote:
> Another thing that would be valuable would be to have some way to say:
> 
>   "Read this data; don't bother throwing other data out of the cache
>to stuff this in."
> 
> Something like a "read_uncached()" call...

You mean, like, open(filename, O_DIRECT)? :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2004-11-04 at 15:47, Chris Browne wrote:
>> Something like a "read_uncached()" call...
>> 
>> That would mean that a seq scan or a vacuum wouldn't force useful data
>> out of cache.

> ARC does almost exactly those two things in 8.0.

But only for Postgres' own shared buffers.  The kernel cache still gets
trashed, because we have no way to suggest to the kernel that it not
hang onto the data read in.

regards, tom lane



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Pierre-Frédéric Caillaud
On Thu, 4 Nov 2004 18:20:18 -, Matt Clark <[EMAIL PROTECTED]> wrote:
> Correct. 75% of all hits are on a script that can take anywhere from
> a few seconds to half an hour to complete.   The script essentially
> auto-flushes to the browser so they get new information as it arrives,
> creating the illusion of on-demand generation.
Er, do you mean that :
	1- You have a query that runs for half an hour and you spoon feed the  
results to the client ?
	(argh)

	2- Your script looks for new data every few seconds, sends a packet, then  
sleeps, and loops ?

	If it's 2 I have a ready-made solution for you, just ask.


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:
> This is more like a streaming data server, which is a very different beast
> from a webserver, and probably better suited to the job.  Usually either
> multithreaded or single-process using select() (just like Squid).  You could
> probably build one pretty easily.  Using a 30MB Apache process to serve one
> client for half an hour seems like a hell of a waste of RAM.

These are CGI scripts at the lowest level, nothing more and nothing 
less.  While I could probably embed a small webserver directly into the 
perl scripts and run that as a daemon, it would take away the 
portability that the scripts currently offer.

This should be my last question on the matter: does squid report the 
proper IP address of the client itself?    That's a critical 
requirement for the scripts.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]



Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Simon Riggs
On Thu, 2004-11-04 at 19:34, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2004-11-04 at 15:47, Chris Browne wrote:
> >> Something like a "read_uncached()" call...
> >> 
> >> That would mean that a seq scan or a vacuum wouldn't force useful data
> >> out of cache.
> 
> > ARC does almost exactly those two things in 8.0.
> 
> But only for Postgres' own shared buffers.  The kernel cache still gets
> trashed, because we have no way to suggest to the kernel that it not
> hang onto the data read in.

I guess a difference in viewpoints. I'm inclined to give most of the RAM
to PostgreSQL, since as you point out, the kernel is out of our control.
That way, we can do what we like with it - keep it or not, as we choose.

-- 
Best Regards, Simon Riggs




Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2004-11-04 at 19:34, Tom Lane wrote:
>> But only for Postgres' own shared buffers.  The kernel cache still gets
>> trashed, because we have no way to suggest to the kernel that it not
>> hang onto the data read in.

> I guess a difference in viewpoints. I'm inclined to give most of the RAM
> to PostgreSQL, since as you point out, the kernel is out of our control.
> That way, we can do what we like with it - keep it or not, as we choose.

That's always been a Bad Idea for three or four different reasons, of
which ARC will eliminate no more than one.

regards, tom lane



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Steinar H. Gunderson
On Thu, Nov 04, 2004 at 03:30:19PM -0500, Martin Foster wrote:
> This should be my last question on the matter: does squid report the 
> proper IP address of the client itself?    That's a critical 
> requirement for the scripts.

AFAIK it's in some header; I believe they're called "X-Forwarded-For". If
you're using caching, your script will obviously be called fewer times than
usual, though, so be careful about relying too much on side effects. :-)
(This is, of course, exactly the same if the client side uses a caching
proxy. Saying anything more is impossible without knowing exactly what you
are doing, though :-) )

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Pierre-Frédéric Caillaud wrote:
> 1- You have a query that runs for half an hour and you spoon-feed 
> the results to the client ? (argh)
>
> 2- Your script looks for new data every few seconds, sends a 
> packet, then sleeps, and loops ?
>
> If it's 2 I have a ready-made solution for you, just ask.

I'm guessing (2) - PG doesn't give the results of a query in a stream. 



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Martin Foster wrote:
> These are CGI scripts at the lowest level, nothing more and nothing 
> less.  While I could probably embed a small webserver directly into 
> the perl scripts and run that as a daemon, it would take away the 
> portability that the scripts currently offer.
If they're CGI *scripts* then they just use the CGI environment, not 
Apache, so a daemon that accepts the inbound connections, then compiles 
the scripts a-la Apache::Registry, but puts each in a separate thread 
would be, er, relatively easy for someone better at multithreaded stuff 
than me.

> This should be my last question on the matter: does squid report the 
> proper IP address of the client itself?    That's a critical 
> requirement for the scripts.

In the X-Forwarded-For header.  Not that you can be sure you're seeing 
the true client IP anyway if they've gone through an ISP proxy beforehand.




[PERFORM] Better Hardware, worst Results

2004-11-04 Thread Alvaro Nunes Melo
Hi,

I have a very tricky situation here. A client bought a dual-processor Dell
machine to be used as a database server, and we have a cheaper machine used
in development. With identical databases, configuration parameters and
running the same query, our machine is almost 3x faster.

I tried to increase shared_buffers and other parameters, but the
result is still the same. I would like to know what I can do to check
what may be "holding back" the Dell server (HD, memory, etc). Both machines
run Debian Linux.

I'll post configuration details below, so you can figure out my scenario
better.

==> Dell PowerEdge:
HD: SCSI
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 9
cpu MHz : 2791.292
cache size  : 512 KB

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Xeon(TM) CPU 2.80GHz
stepping: 9
cpu MHz : 2791.292
cache size  : 512 KB

# free -m
             total       used       free     shared    buffers     cached
Mem:          1010        996         14          0         98        506

==> Other machine:
HD: IDE
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 15
model   : 2
model name  : Intel(R) Pentium(R) 4 CPU 2.26GHz
stepping: 5
cpu MHz : 2262.166
cache size  : 512 KB

# free -m
             total       used       free     shared    buffers     cached
Mem:           439        434          4          0         16        395


-- 
+---+
|  Alvaro Nunes MeloAtua Sistemas de Informacao |
| [EMAIL PROTECTED]www.atua.com.br   |
|UIN - 42722678(54) 327-1044|
+---+




Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Rod Taylor
On Thu, 2004-11-04 at 16:06, Alvaro Nunes Melo wrote:
> Hi,
> 
> I have a very tricky situation here. A client bought a dual-processor Dell
> machine to be used as a database server, and we have a cheaper machine used
> in development. With identical databases, configuration parameters and
> running the same query, our machine is almost 3x faster.

Please send an explain analyze from both.





Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Tom Lane
Alvaro Nunes Melo <[EMAIL PROTECTED]> writes:
> I have a very tricky situation here. A client bought a dual-processor Dell
> machine to be used as a database server, and we have a cheaper machine used
> in development. With identical databases, configuration parameters and
> running the same query, our machine is almost 3x faster.

> ==> Dell PowerEdge:
> HD: SCSI

> ==> Other machine:
> HD: IDE

I'll bet a nickel that the IDE drive is lying about write completion,
thereby gaining a significant performance boost at the cost of probable
data corruption during a power failure.  SCSI drives generally tell the
truth about this, but consumer-grade IDE gear is usually configured to
lie.

regards, tom lane



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Pierre-Frédéric Caillaud

> I'm guessing (2) - PG doesn't give the results of a query in a stream.
In 1- I was thinking about a cursor...
but I think his problem is more like 2-
	In that case one can either code a special purpose server or use the  
following hack :

	In your webpage include an iframe with a Javascript to refresh it every  
five seconds. The iframe fetches a page from the server which brings in  
the new data in form of generated JavaScript which writes in the parent  
window. Thus, you get a very short request every 5 seconds to fetch new  
data, and it is displayed in the client's window very naturally.

	I've used this technique for another application and find it very cool.  
It's for selection lists, often you'll see a list of things to be checked  
or not, which makes a big form that people forget to submit. Thus I've  
replaced the checkboxes with clickable zones which trigger the loading of  
a page in a hidden iframe, which makes the appropriate modifications in the  
database, and updates the HTML in the parent page, changing texts here and  
there... it feels a bit like it's not a webpage but rather a standard GUI.  
Very neat. Changes are recorded without needing a submit button... I  
should write a framework for making that easy to do.

	I did not use a frame because frames suck, but iframes are convenient.  
Yeah, it does not work with Lynx... it needs JavaScript... but it works  
well.



[PERFORM] appropriate indexing

2004-11-04 Thread T E Schmitz
Hello,
I am seeking some advice on appropriate indexing. I think I have a rough 
idea where to place my indices but would be grateful for some tips from 
more experienced people.
The following example shows what is probably the most complex query of 
the application.

A few points to give you a rough indicator about the DB:
- application is more query than update intensive
- each table has a surrogate PK (serial)
- access of tables ITEM and PRODUCT always involves join on BRAND, 
MODEL, TYPE
- CATEGORY,SECTION,CONDITION are pretty much static and have no more 
than 30 rows
- PRODUCT table will eventually contain a few thousand records
- ITEM table will grow, grow, grow (sold items are not deleted)
- PRODUCT_FK, TYPE_FK, MODEL_FK, BRAND_FK are never NULL
- PRODUCT_LENS... columns are only NOT NULL where  CATEGORY_PK=2
- ITEM.STATUS = available, sold, reserved ..., never NULL
- ITEM.KIND = secondhand, commission, new, never NULL

=
My understanding is:
- index the FK columns used for joins
- index columns typically used in WHERE clause
- index on e.g. PRODUCT.CATEGORY_FK prevents seq scan of CATEGORY
- as CATEGORY contains few rows it's not worth indexing CATEGORY_FK
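
(In DDL terms, that understanding amounts to something like the sketch 
below - the index names are invented, the columns come from the query 
that follows:)

  CREATE INDEX item_product_fk_idx   ON item (product_fk);
  CREATE INDEX item_condition_fk_idx ON item (condition_fk);
  CREATE INDEX item_status_idx       ON item (status);
  CREATE INDEX product_type_fk_idx   ON product (type_fk);
  CREATE INDEX type_model_fk_idx     ON type (model_fk);
  CREATE INDEX model_brand_fk_idx    ON model (brand_fk);
  -- no index on product.category_fk etc., since CATEGORY, SECTION and
  -- CONDITION are tiny and static
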
Questions:
- Does the order of the JOIN clauses make a difference?
- Does the order of the WHERE clauses make a difference?
=
SELECT
BRAND.BRAND_NAME,
MODEL.MODEL_NAME,
TYPE.TYPE_NAME,
ITEM.RETAIL_PRICE,
CONDITION.ABBREVIATION
FROM ITEM
LEFT JOIN PRODUCT ON ITEM.PRODUCT_FK=PRODUCT.PRODUCT_PK
LEFT JOIN TYPE ON PRODUCT.TYPE_FK=TYPE.TYPE_PK
LEFT JOIN MODEL ON TYPE.MODEL_FK=MODEL.MODEL_PK
LEFT JOIN BRAND ON MODEL.BRAND_FK=BRAND.BRAND_PK
LEFT JOIN CATEGORY ON PRODUCT.CATEGORY_FK=CATEGORY.CATEGORY_PK
LEFT JOIN SECTION SECTION ON PRODUCT.SECTION_USED_FK=SECTION.SECTION_PK
LEFT JOIN CONDITION ON ITEM.CONDITION_FK=CONDITION.CONDITION_PK
WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and 
(ITEM.KIND=2 or ITEM.KIND=3)

ORDER BY SECTION.POSITION, CATEGORY.POSITION,
PRODUCT.LENS_FOCAL_LEN_FROM,PRODUCT.LENS_FOCAL_LEN_TO IS NOT NULL,
PRODUCT.LENS_FOCAL_LEN_TO,
PRODUCT.LENS_SPEED_FROM,PRODUCT.LENS_SPEED_TO,
TYPE.TYPE_NAME, CONDITION.POSITION
I'd appreciate a few pointers based on this example. Thanks in advance.
--
Regards/Gruß,
Tarlika Elisabeth Schmitz


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Pierre-Frédéric Caillaud wrote:
> In your webpage include an iframe with a Javascript to refresh it 
> every five seconds. The iframe fetches a page from the server which 
> brings in the new data in form of generated JavaScript which writes 
> in the parent window. Thus, you get a very short request every 5 
> seconds to fetch new data, and it is displayed in the client's window 
> very naturally. [...]

Yup.  If you go the JS route then you can do even better by using JS to 
load data into JS objects in the background and manipulate the page 
content directly, no need for even an Iframe.  Ignore the dullards who 
have JS turned off - it's essential for modern web apps, and refusing JS 
conflicts absolutely with proper semantic markup.

http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good 
starting point.

It's clear that this discussion has moved way away from PG!  Although in 
the context of DB backed web apps I guess it remains a bit on-topic...

M


Re: [PERFORM] appropriate indexing

2004-11-04 Thread Matt Clark

T E Schmitz wrote:
> - ITEM table will grow, grow, grow (sold items are not deleted)
>
> WHERE PRODUCT.SECTION_USED_FK IS NOT NULL AND ITEM.STATUS=1 and 
> (ITEM.KIND=2 or ITEM.KIND=3)

Partial index on item.status ?
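
Something like this, say (a sketch using the column names from the query 
above; the index name is made up):

  CREATE INDEX item_live_idx ON item (kind) WHERE status = 1;

Since sold items are never deleted, the status = 1 rows become an 
ever-smaller fraction of the table, and a partial index stays small 
while covering exactly the rows this query touches.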


Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread al_nunes
Citando Rod Taylor <[EMAIL PROTECTED]>:
> Please send an explain analyze from both.
I'm sending three explains. In the first the Dell machine didn't use existing
indexes, so I turned enable_seqscan off (this is the second explain). The total
cost decreased, but the total time did not. The third explain refers to the
cheaper (and faster) machine. The last thing is the query itself.


 Nested Loop  (cost=9008.68..13596.97 rows=1 width=317) (actual time=9272.803..65287.304 rows=2604 loops=1)
   ->  Hash Join  (cost=9008.68..13590.91 rows=1 width=319) (actual time=9243.294..10560.330 rows=2604 loops=1)
         Hash Cond: ("outer".cd_tipo_pagamento = "inner".cd_tipo_pagamento)
         ->  Hash Join  (cost=9007.59..13589.81 rows=1 width=317) (actual time=9243.149..10529.765 rows=2604 loops=1)
               Hash Cond: ("outer".cd_condicao = "inner".cd_condicao)
               ->  Nested Loop  (cost=9006.46..13588.62 rows=8 width=315) (actual time=9243.083..10497.385 rows=2604 loops=1)
                     ->  Merge Join  (cost=9006.46..13540.44 rows=8 width=290) (actual time=9242.962..10405.245 rows=2604 loops=1)
                           Merge Cond: ("outer".cd_pessoa = "inner".cd_pessoa)
                           ->  Nested Loop Left Join  (cost=4658.37..9183.72 rows=375 width=286) (actual time=9210.101..10327.003 rows=23392 loops=1)
                                 ->  Merge Left Join  (cost=4658.37..6924.15 rows=375 width=274) (actual time=9209.952..9981.475 rows=23392 loops=1)
                                       Merge Cond: ("outer".cd_pessoa = "inner".cd_pessoa)
                                       ->  Merge Left Join  (cost=3366.00..5629.19 rows=375 width=255) (actual time=9158.705..9832.781 rows=23392 loops=1)
                                             Merge Cond: ("outer".cd_pessoa = "inner".cd_pessoa)
                                             ->  Nested Loop Left Join  (cost=2073.63..4334.24 rows=375 width=236) (actual time=8679.698..9152.213 rows=23392 loops=1)
                                                   ->  Merge Left Join  (cost=2073.63..2075.94 rows=375 width=44) (actual time=8679.557..8826.898 rows=23392 loops=1)
                                                         Merge Cond: ("outer".cd_pessoa = "inner".cd_pessoa)
                                                         ->  Sort  (cost=1727.15..1728.09 rows=375 width=40) (actual time=8580.391..8611.842 rows=23392 loops=1)
                                                               Sort Key: p.cd_pessoa
                                                               ->  Seq Scan on pessoa p  (cost=0.00..1711.12 rows=375 width=40) (actual time=0.371..8247.028 rows=50412 loops=1)
                                                                     Filter: (cliente_liberado(cd_pessoa) = 1)
                                                         ->  Sort  (cost=346.47..346.69 rows=85 width=8) (actual time=99.121..120.706 rows=16470 loops=1)
                                                               Sort Key: e.cd_pessoa
                                                               ->  Seq Scan on endereco e  (cost=0.00..343.75 rows=85 width=8) (actual time=0.070..30.558 rows=16858 loops=1)
                                                                     Filter: (id_tipo_endereco = 2)
                                                   ->  Index Scan using pk_pessoa_juridica on pessoa_juridica pj  (cost=0.00..6.01 rows=1 width=196) (actual time=0.007..0.008 rows=1 loops=23392)
                                                         Index Cond: (pj.cd_pessoa = "outer".cd_pessoa)
                                             ->  Sort  (cost=1292.37..1293.18 rows=325 width=23) (actual time=478.963..522.701 rows=33659 loops=1)
                                                   Sort Key: t.cd_pessoa
                                                   ->  Seq Scan on telefone t  (cost=0.00..1278.81 rows=325 width=23) (actual time=0.039..120.256 rows=59572 loops=1)
                                                         Filter: (id_principal = 1::smallint)
                                       ->  Sort  (cost=1292.37..1293.18 rows=325 width=23) (actual time=51.205..53.662 rows=3422 loops=1)
                                             Sort Key: tf.cd_pessoa
                                             ->  Seq Scan on telefone tf  (cost=0.00..1278.81 rows=325 width=23) (actual time=0.024..43.192 rows=3885 loops=1)
                                                   Filter: (id_tipo = 4::smallint)
                                 ->  Index Scan using pk_cep on cep c  (cost=0.00..6.01 rows=1 width=20) (actual time=0.007..0.009 rows=1 loops=23392)
                                       Index Cond: (c.cd_cep = "outer".cd_cep)
                           ->  Sort  (cost=4348.08..4351.89 rows=1524 width=4) (actual time=13.182..18.069 rows=2619 loops=1)
                                 Sort Key: cgv.cd_pessoa

Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Matt Clark

[EMAIL PROTECTED] wrote:
> Citando Rod Taylor <[EMAIL PROTECTED]>:
>> Please send an explain analyze from both.
> I'm sending three explains. In the first the Dell machine didn't use existing
> indexes, so I turned enable_seqscan off (this is the second explain). The total
> cost decreased, but the total time did not. The third explain refers to the
> cheaper (and faster) machine. The last thing is the query itself.
>
> Nested Loop  (cost=9008.68..13596.97 rows=1 width=317) (actual time=9272.803..65287.304 rows=2604 loops=1)
> Nested Loop  (cost=5155.51..19320.20 rows=1 width=317) (actual time=480.311..62530.121 rows=2604 loops=1)
> Hash Join  (cost=2.23..11191.77 rows=9 width=134) (actual time=341.708..21868.167 rows=2604 loops=1)

Well, the plan is completely different on the dev machine.  Therefore
either the PG version or the postgresql.conf is different.  No other
possible answer.

M




Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Rod Taylor
On Thu, 2004-11-04 at 17:42, [EMAIL PROTECTED] wrote:
> Citando Rod Taylor <[EMAIL PROTECTED]>:
> > Please send an explain analyze from both.
> I'm sendin three explains. In the first the Dell machine didn't use existing
> indexes, so I turn enable_seqscan off (this is the second explain). The total
> cost decreased, but the total time not. The third explain refers to the cheaper
> (and faster) machine. The last thing is the query itself.

All 3 plans have crappy estimates.

Run ANALYZE in production, then send another explain analyze (as an
attachment please, to avoid linewrap).





Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Bruno Wolff III
On Thu, Nov 04, 2004 at 22:37:06 +,
  Matt Clark <[EMAIL PROTECTED]> wrote:
> >...
> 
> Yup.  If you go the JS route then you can do even better by using JS to 
> load data into JS objects in the background and manipulate the page 
> content directly, no need for even an Iframe.  Ignore the dullards who 
> have JS turned off - it's essential for modern web apps, and refusing JS 
> conflicts absolutely with proper semantic markup.

Javascript is too powerful to turn on for any random web page. It is only
essential for web pages because people write their web pages to only
work with javascript.



Re: [PERFORM] Better Hardware, worst Results

2004-11-04 Thread Matt Clark

Rod Taylor wrote:
> All 3 plans have crappy estimates.
> Run ANALYZE in production, then send another explain analyze (as an
> attachment please, to avoid linewrap).

Er, no other possible answer except Rod's :-)


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Bruno Wolff III wrote:
> Javascript is too powerful to turn on for any random web page. It is only
> essential for web pages because people write their web pages to only
> work with javascript.


Hmm... I respectfully disagree.  It is so powerful that it is impossible 
to ignore when implementing a sophisticated app.  And it is not 
dangerous to the user so long as they have a popup blocker.  
Commercially, I can ignore the people who turn it off, and I can gain a 
huge benefit from knowing that 95% of people have it turned on, because 
it gives my users a hugely better experience than the equivalent XHTML 
only page (which I deliver, and which works, but which is a fairly 
depressing experience compared to the JS enabled version).

It is _amazing_ how much crud you can take out of a page if you let JS 
do the dynamic stuff (with CSS still in full control of the styling).  
Nice, clean, semantically sensible XHTML, that can be transformed for 
multiple devices - it's great.

An example: a plain link to /previews/foo.wmv.  But we want it to appear 
in a popup when viewed in certain devices.  Easy - attach an 'onclick' 
event handler (or just set the target attribute) when the device has a 
suitable screen & media player, but leave the markup clean for the rest 
of the world.





Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Pierre-Frédéric Caillaud
check this marvellous piece of 5 minutes of work :
http://boutiquenumerique.com/test/iframe_feed.html

Matt Clark wrote:
> Yup.  If you go the JS route then you can do even better by using JS to 
> load data into JS objects in the background and manipulate the page 
> content directly, no need for even an Iframe.  Ignore the dullards who 
> have JS turned off - it's essential for modern web apps, and refusing JS 
> conflicts absolutely with proper semantic markup.
>
> http://developer.apple.com/internet/webcontent/xmlhttpreq.html is a good 
> starting point.

Didn't know this existed ! Very, very cool.
I have to check this out more in depth.
	A note though : you'll have to turn off HTTP persistent connections in  
your server (not in your proxy) or you're back to square one.

> It's clear that this discussion has moved way away from PG!  Although in
> the context of DB backed web apps I guess it remains a bit on-topic...

	I find it very on-topic as
	- it's a way to help this guy solve his "pg problem" which was in fact a  
design problem
	- it's the future of database driven web apps (no more reloading the  
whole page !)

	I think in the future there will be a good bit of presentation logic in  
the client...



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

> A note though : you'll have to turn off HTTP persistent connections in
> your server (not in your proxy) or you're back to square one.

I hadn't considered that.  On the client side it would seem to be up to 
the client whether to use a persistent connection or not.  If it does, 
then yeah, a request every 5 seconds would still just hold open a 
server.  One more reason to use a proxy I s'pose.

> It's clear that this discussion has moved way away from PG!  Although 
> in the context of DB backed web apps I guess it remains a bit 
> on-topic...
>
> I find it very on-topic as
> - it's a way to help this guy solve his "pg problem" which was in 
> fact a design problem
> - it's the future of database driven web apps (no more reloading 
> the whole page !)
>
> I think in the future there will be a good bit of presentation 
> logic in the client...

Not if Bruno has his way ;-)



Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Matt Clark

Pierre-Frédéric Caillaud wrote:
> check this marvellous piece of 5 minutes of work :
> http://boutiquenumerique.com/test/iframe_feed.html

cela m'a fait le sourire :-)  [that made me smile]
(apologies for bad french)

M



Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Christopher Browne
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Markus 
Schaber) transmitted:
> We should create a list of those needs, and then communicate those
> to the kernel/fs developers. Then we (as well as other apps) can
> make use of those features where they are available, and use the old
> way everywhere else.

Which kernel/fs developers did you have in mind?  The ones working on
Linux?  Or FreeBSD?  Or DragonflyBSD?  Or Solaris?  Or AIX?

Please keep in mind that many of the PostgreSQL developers are BSD
folk that aren't particularly interested in creating bleeding edge
Linux capabilities.

Furthermore, I'd think long and hard before jumping into such a
_spectacularly_ bleeding-edge project.  The reason to want this is to
squeeze out some extra margin of performance, and I can't see wanting
that without also wanting some _assurance_ of system reliability - at
which point I also want things like vendor support.

If you've ever contacted Red Hat Software, you'd know that they very
nearly refuse to provide support for any filesystem other than ext3.
Use anything else and they'll make noises about not being able to
assure you of anything at all.

If you need high performance, you'd also want to use interesting sorts
of hardware.  Disk arrays, RAID controllers, that sort of thing.
Vendors of such things don't particularly want to talk to you unless
you're using a "supported" Linux distribution and a "supported"
filesystem.

Jumping into a customized filesystem that neither hardware nor
software vendors would remotely consider supporting just doesn't look
like a viable strategy to me.

> Maybe Reiser4 is a step into the right way, and maybe even a
> postgres plugin for Reiser4 will be worth the effort. Maybe XFS/JFS
> etc. already have such capabilities. Maybe that's completely wrong.

The capabilities tend to be redundant: journalling filesystems
implement transactional machinery vaguely similar to what databases
must implement for themselves, but the similarities are not close
enough to let either variety of "commit" be eliminated as redundant.
-- 
"cbbrowne","@","linuxfinances.info"
http://linuxfinances.info/info/linux.html
Rules of the  Evil Overlord #128. "I will not  employ robots as agents
of  destruction  if  there  is  any  possible way  that  they  can  be
re-programmed  or if their  battery packs  are externally  mounted and
easily removable." 

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] (Simon Riggs), an earthling, 
wrote:
> On Thu, 2004-11-04 at 15:47, Chris Browne wrote:
>
>> Another thing that would be valuable would be to have some way to say:
>> 
>>   "Read this data; don't bother throwing other data out of the cache
>>to stuff this in."
>> 
>> Something like a "read_uncached()" call...
>> 
>> That would mean that a seq scan or a vacuum wouldn't force useful
>> data out of cache.
>
> ARC does almost exactly those two things in 8.0.
>
> Seq scans do get put in cache, but in a way that means they don't
> spoil the main bulk of the cache.

We're not talking about the same cache.

ARC does these exact things for the _shared memory_ cache, and is the
obvious inspiration.

But it does more or less nothing about the way the OS file buffer cache
is managed, and the handling of _that_ would be the point of modifying
OS filesystem semantics.
-- 
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://www3.sympatico.ca/cbbrowne/oses.html
Have you ever considered beating yourself with a cluestick?

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Martin Foster
Matt Clark wrote:

Pierre-Frédéric Caillaud wrote:
check this marvellous piece of 5 minutes' work:
http://boutiquenumerique.com/test/iframe_feed.html
cela m'a fait le sourire :-) ("that made me smile")
(apologies for bad french)
M

Javascript is not an option for these scripts: one of the mandates of
the project is to support as many different client setups as possible,
and we have encountered everything from WebTV to the latest Firefox
release.  It's a chat/roleplay community and not everyone will invest
in new equipment.

Now, it would seem to me that there is a trade-off between a JS push
system and a constant, ever-present process.  With the traditional
method as I use it, a client incurs the initial penalty of
authenticating and pulling the look and feel of the realms and sites
once, and simply polls one table from that point on.

With JS, on the other hand, you have each user making a call for new
posts every x seconds.  This means that every x seconds the penalty for
authentication and page design kicks in, increasing the overall load.

The current scripts can also be dynamically adapted to slow things down
under heavy load or in quiet realms that bring few posts in.  It's much
harder to expect Javascript solutions to work perfectly every time and
not be modified by some proxy.

Unfortunately, we are getting way off track.  I'm looking for a way to
protect the PostgreSQL server, either from within PostgreSQL or through
some sort of external script that polls the load average once in a
while to make that determination.

Now, is there an administrative command in PostgreSQL that will cause
it to move into some sort of maintenance mode?  For me that could be
exceedingly useful, as it would still allow an admin connection to be
made to run a VACUUM FULL and such.

Martin Foster
Creator/Designer Ethereal Realms
[EMAIL PROTECTED]
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
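
(There is no dedicated maintenance mode; the nearest stock knobs are the
connection-limit settings - a sketch, assuming an 8.0-era postgresql.conf:)

    # Cap ordinary connections so load can't run away...
    max_connections = 100
    # ...but keep slots reserved so a superuser can always still get in
    # to run VACUUM FULL and the like.
    superuser_reserved_connections = 2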


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Neil Conway
On Fri, 2004-11-05 at 06:20, Steinar H. Gunderson wrote:
> You mean, like, open(filename, O_DIRECT)? :-)

This disables readahead (at least on Linux), which is certainly not what
we want: for the very case where we don't want to keep the data in cache
for long (sequential scans, VACUUM), we also want aggressive readahead.

-Neil



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Neil Conway
On Thu, 2004-11-04 at 23:29, Pierre-Frédéric Caillaud wrote:
>   There is also the fact that syncing after every transaction could be  
> changed to syncing every N transactions (N fixed or depending on the data  
> size written by the transactions) which would be more efficient than the  
> current behaviour with a sleep.

Uh, which "sleep" are you referring to?

Also, how would interacting with the filesystem's journal affect how
often we need to force-write the WAL to disk? (ISTM we need to sync
_something_ to disk when a transaction commits in order to maintain the
WAL invariant.)

>   There's fadvise to tell the OS to readahead on a seq scan (I think the OS  
> detects it anyway)

Not perfectly, though; also, Linux will do a more aggressive readahead
if you tell it to do so via posix_fadvise().

> if there was a system call telling the OS "in the  
> next seconds I'm going to read these chunks of data from this file (gives  
> a list of offsets and lengths), could you put them in your cache in the  
> most efficient order without seeking too much, so that when I read() them  
> in random order, they will be in the cache already ?".

http://www.opengroup.org/onlinepubs/009695399/functions/posix_fadvise.html

POSIX_FADV_WILLNEED 
Specifies that the application expects to access the specified
data in the near future.
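
(A minimal sketch of using this hint from C - the file name and sizes are
illustrative only, not anything PostgreSQL actually does:)

    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/tmp/bigfile", O_RDONLY);  /* stand-in for a heap file */
        if (fd < 0)
            return 1;

        /* Ask the kernel to start pulling the first 8 MB into cache now,
           so the read() loop below mostly hits the page cache. */
        posix_fadvise(fd, 0, 8 * 1024 * 1024, POSIX_FADV_WILLNEED);

        char buf[8192];
        while (read(fd, buf, sizeof buf) > 0)
            ;  /* ... process each block ... */

        close(fd);
        return 0;
    }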

-Neil



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-11-04 Thread Neil Conway
On Fri, 2004-11-05 at 02:47, Chris Browne wrote:
> Another thing that would be valuable would be to have some way to say:
> 
>   "Read this data; don't bother throwing other data out of the cache
>to stuff this in."

This is similar, although not exactly the same thing:

http://www.opengroup.org/onlinepubs/009695399/functions/posix_fadvise.html

POSIX_FADV_NOREUSE 
Specifies that the application expects to access the specified
data once and then not reuse it thereafter.
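
(So a VACUUM-style single pass could, as a sketch, make its intent explicit
like this - though whether the kernel honours NOREUSE varies; Linux long
treated it as a no-op:)

    #include <fcntl.h>

    /* Hint that we'll read this already-open file exactly once;
       len = 0 means "through to end of file". */
    static void hint_single_pass(int fd)
    {
        posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    }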

-Neil



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Restricting Postgres

2004-11-04 Thread Bruno Wolff III
On Thu, Nov 04, 2004 at 23:32:57 +,
  Matt Clark <[EMAIL PROTECTED]> wrote:
> >
> >I think in the future there will be a good bit of presentation 
> >login in  the client...
> 
> Not if Bruno has his way ;-)

Sure there will, but it will be controlled by the client, perhaps taking
suggestions from the style sheet pointed to by the document.

Running foreign code from random or even semi-random places is a recipe
for becoming a spam server. See examples from Microsoft such as their
spreadsheet and office software. Documents really need to be passive
data, not active code.

If the client and the server have a special trust relationship, then
running code supplied by the server makes sense. So you might use javascript
within a business where the IT department runs the server and the employees
run clients. However, encouraging people to browse the internet with
javascript enabled is a bad idea.

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Checking = with timestamp field is slow

2004-11-04 Thread Antony Paul
Hi all,
   I have a table which has more than 20 records. I need to get
the records that match this condition:

where today::date = '2004-11-05';

This is the only condition in the query. There is a btree index on the
column today.
Is there any way to optimise it?

rgds
Antony Paul

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly
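
(The cast in the WHERE clause is what keeps the planner from using a plain
btree index on "today". Two usual fixes, as a sketch - the table name "tbl"
and the assumption that "today" is a timestamp column come from the post:)

    -- Rewrite the predicate as a range so the existing index is usable:
    SELECT * FROM tbl
    WHERE today >= '2004-11-05' AND today < '2004-11-06';

    -- Or index the expression itself (7.4+; the cast must be immutable,
    -- i.e. "today" should be timestamp WITHOUT time zone):
    CREATE INDEX tbl_today_date_idx ON tbl ((today::date));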